VMware Cloud Community
NuggetGTR
VMware Employee

Venting some anger over lost storage

Now I'm sure we VMware admins get it all the time, but it's "always" a virtual issue these days. I remember back in the day when I was a Wintel engineer and it was always a network issue... haha, how things come around.

Anyway, I've had a hell of a day trying to convince people that an issue we experienced was not an ESXi or hardware issue on my side.

Just a little background info.

The environment I look after is quite large: 200 hosts cut up into different clusters, generally by environment, though one environment in particular has 3 clusters (24 hosts).

So I see my fair share of issues. The environment is basically identical: all the servers are the same model and specs, all have the same firmware on all components, and all are running ESXi 5. The lower levels use CLARiiON FC storage; the higher levels use Symmetrix FC storage.

We had an issue I got called in for on a relaxing weekend off. There are 7 hosts in this cluster, and all 7 are presented with the same 54 Symmetrix LUNs, which make up 54 datastores. One datastore dropped off all 7 hosts (the same datastore on each); everything else was fine. I can't get it through to the people that manage our storage that it had to have been something on the storage end, either the SAN or the fabric, as I can't see any other way for one datastore to go down on 7 hosts. The vmkernel logs even go ballistic with APD (All Paths Down) for this datastore on all 7 hosts.
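One way to argue the case from the logs: pull the first APD timestamp for the affected device out of each host's vmkernel log and see how tightly they cluster. If all 7 hosts flag APD within a few seconds of each other, the trigger almost certainly sits on the fabric or array side, not on any individual host. A minimal sketch of the comparison (the hostnames and timestamps below are made-up samples, not data from this incident):

```python
from datetime import datetime

# Made-up sample: the first APD log timestamp seen on each of the 7 hosts
# for the one affected device (values are illustrative only).
first_apd = {
    "esx01": "2012-06-02T03:14:07Z",
    "esx02": "2012-06-02T03:14:08Z",
    "esx03": "2012-06-02T03:14:07Z",
    "esx04": "2012-06-02T03:14:09Z",
    "esx05": "2012-06-02T03:14:07Z",
    "esx06": "2012-06-02T03:14:08Z",
    "esx07": "2012-06-02T03:14:10Z",
}

times = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in first_apd.values()]
spread = (max(times) - min(times)).total_seconds()

# A spread of only a few seconds across all hosts points at a single
# external event (fabric/array) rather than independent host failures.
print("APD onset spread across hosts: %.0f seconds" % spread)
```

With a spread that tight across every host in the cluster, a host-side root cause would require 7 independent failures at the same instant, which is the argument to take back to the storage team.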

They are adamant that their logs are clean, and I can't think of any way this could have been caused at my end. Does anyone know if something on the ESXi side could cause a situation like this?

Cheers

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
beckham007fifa

Okay, how did the problem resolve? Could you post your vmkernel and summary logs here from when the issue occurred?

There is a possibility that it's a host issue as well as a storage issue. What's the speed of your FC, and how many paths do you have?

Regards, ABFS
NuggetGTR
VMware Employee

AB wrote:

Okay, how did the problem resolve? Could you post your vmkernel and summary logs here from when the issue occurred?

There is a possibility that it's a host issue as well as a storage issue. What's the speed of your FC, and how many paths do you have?

It just started working 3 days later.

Unfortunately I can't post logs, being on a secure site and environment. The logs were throwing up sense data saying the LUN was data protected and write protected.
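For what it's worth, "data protected / write protected" maps directly onto standard SCSI sense codes: sense key 0x7 is DATA PROTECT and ASC/ASCQ 0x27/0x00 is WRITE PROTECTED, which is what a vmkernel line like `Valid sense data: 0x7 0x27 0x0` is saying. A minimal sketch of decoding that triple (the mapping tables below cover only a couple of codes; see the SPC standard for the full lists):

```python
# Decode the sense triple (key, ASC, ASCQ) from an ESXi "Valid sense data"
# log entry. Only a few codes are mapped here, for illustration.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x7: "DATA PROTECT",
}

ASC_ASCQ = {
    (0x27, 0x00): "WRITE PROTECTED",
}

def decode_sense(key, asc, ascq):
    """Return a human-readable description of a SCSI sense triple."""
    key_name = SENSE_KEYS.get(key, "UNKNOWN KEY 0x%x" % key)
    detail = ASC_ASCQ.get((asc, ascq), "ASC/ASCQ 0x%x/0x%x" % (asc, ascq))
    return "%s - %s" % (key_name, detail)

# The triple described in this thread: 0x7 0x27 0x0
print(decode_sense(0x7, 0x27, 0x0))  # DATA PROTECT - WRITE PROTECTED
```

Sense data is generated by the target, i.e. the array, not by the initiator, so a WRITE PROTECTED response would be consistent with the device having been set read-only (or a replication state change) on the Symmetrix side rather than anything the hosts did.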

We have dual 4Gb FC HBAs running in round robin.

I've already got VMware support involved, because they never trust us and need the comfort of something official, and they have said exactly what I was saying.

But just curious: what on the host side, if anything, could stop 7 hosts accessing the one VMFS volume? Is it possible for a host to lock an entire VMFS volume? Even if that were the case, at least one host (the one holding the lock) would have been fine. Meh...

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
marcelo_soares
Champion

If you had APDs, this was generated by a storage connectivity/presentation issue (maybe accidental, maybe deliberate). If you have an old version of ESX (i.e. 4.0 U1 or lower, or 4.1) you will experience problems even after an APD condition is cleared (paths restored). Your SAN team needs to investigate further. It is very unusual for an action on the ESX side to generate such an issue.

Marcelo Soares
beckham007fifa

Yeah, absolutely, APD does not come from the host, and with vSphere 5.0 these error messages in the logs are well and precisely defined for finding the exact cause. Check if there was some activity going on from the storage team at their end; maybe they hadn't planned anything for your storage, but by some means unknowingly caused APD for 7 of your hosts.

Also, there are status codes with the errors. From what I know, if the host status in the code (the H:0x... field) is 0x0, then there is no issue from the host side.
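To expand on that: a vmkernel SCSI failure line carries a status triple like `H:0x0 D:0x2 P:0x0`, where H is the host (HBA/driver) status, D is the device (target) status, and P is the plugin status. H:0x0 means the HBA itself completed the command without error, so the failure came from further down the path; D:0x2 is SCSI CHECK CONDITION from the target. A small sketch of pulling those fields out of a log line (the sample line is illustrative):

```python
import re

# Field meanings in an ESXi status triple "H:0x0 D:0x2 P:0x0":
#   H = host (HBA/driver) status, D = device (target) status, P = plugin status.

def parse_status(line):
    """Extract the H/D/P status bytes from a vmkernel SCSI failure line."""
    m = re.search(r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)", line)
    if not m:
        return None
    return {k: int(v, 16) for k, v in zip("HDP", m.groups())}

# Illustrative sample line, not from the actual incident logs
line = 'Cmd 0x2a failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x7 0x27 0x0'
status = parse_status(line)
print(status)  # {'H': 0, 'D': 2, 'P': 0}
print("host side clean" if status["H"] == 0 else "host side reported an error")
```

So an H:0x0 with a non-zero D status is exactly the "not my host" pattern: the complaint originated at the target end of the path.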

Regards, ABFS