FredPeterson
Expert

SAN disk stripped away - ESX 4.1 hosts unresponsive

So we had a boo-boo in our DR datacenter last night: one of the power supplies in the single FC switch was bad, so when power work was done the FC switch went kaput.

Shouldn't have been a big deal: all my VMs go down, etc., blah blah.

Funny thing, though, is that technically none of the VMs went down - and I'm talking about TWO HOURS with no disk connected. Pretty cool that things stayed alive, but very strange. Doubly weird that sometimes the guests would even respond to pings....

ANYWAY - as a result of this, for some reason my two hosts went unresponsive. Fortunately vCenter was running on local disk on one of the hosts, so access to vCenter itself was never affected, but the hosts dropped off otherwise. The second the SAN disk came back online, both hosts rejoined vCenter, happy as clams, as if nothing had happened.

Any ideas? To be honest, I haven't tried to dig through the logs yet to figure out why this happened. I'm guessing it's related to the fact that the VMs never actually went down, so the VMkernel was hopelessly trying to talk to the disk. I was watching top while this was going on and nothing stood out....

3 Replies
DSTAVERT
Immortal

Most OSes can tolerate an outage if there isn't much disk activity. Some disk writes will be cached in RAM until the disk becomes available. I have had similar experiences, and some guests rode through without issue. I would run chkdsk, fsck, or the equivalent to make sure the disks are really OK.

-- David -- VMware Communities Moderator
idle-jam
Immortal

I've had the same experience as above. I would also look at the VM OS end and check that there is nothing funny in the event log, do a reboot, and make sure the application can start; after that I would consider the case closed.

AndreTheGiant
Immortal

After a storage issue, I suggest checking your OS to see if the disks are fine.

In similar cases I have usually had problems with Windows Server 2008 and 2008 R2, where disks go offline and you must manually bring them back online.

I have also seen some problems with Linux VMs, but in that case a reboot restores disk operations.
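
For the Linux guests, a quick check before deciding on a reboot could look something like the sketch below. It assumes the problem is the common one where the guest remounts a filesystem read-only after I/O errors (that is an assumption, the exact failure isn't described here). The function name read_only_mounts and the list of filesystem types are made up for illustration; the input is text in the /proc/mounts format.

```python
def read_only_mounts(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Return (device, mount_point) pairs that are mounted read-only.

    mounts_text is expected to be in /proc/mounts format:
        device mount_point fs_type options dump pass
    fs_types is an illustrative default, not an exhaustive list.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # "ro" appears as its own comma-separated option when read-only
        if fs_type in fs_types and "ro" in options.split(","):
            hits.append((device, mount_point))
    return hits
```

Inside a guest you would feed it the contents of /proc/mounts, e.g. read_only_mounts(open("/proc/mounts").read()). An empty list only means nothing is currently mounted read-only; it doesn't replace the fsck suggested above.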

Andre

Andre | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro