VMware Cloud Community
ftoppi
Contributor

Recover from all path down / permanent device loss in 5.0 cluster

Hello,

I am facing an unplanned permanent device loss in an ESXi 5.0 cluster.

As a result of this outage, all the ESXi servers in the cluster are now disconnected from vCenter, so I cannot manage them or make backups.

I also cannot connect to the ESXi servers with the vSphere Client or over HTTPS.

Basically, a shared iSCSI storage device suddenly died. Fortunately, it was not used for virtual machines. Unfortunately, it was still mounted and used for datastore heartbeats.

Here is the output of esxcfg-mpath -b; it shows that the only path is dead:

eui.68786f5457513978 : SCST_BIO iSCSI Disk (eui.68786f5457513978)
   vmhba39:C0:T1:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
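
In case it helps anyone checking several hosts over SSH, here is a minimal Python sketch that picks the dead paths out of saved `esxcfg-mpath -b` output. It only assumes the two-line layout shown above (an unindented device line followed by indented path lines containing `state:<value>`); the function name `find_dead_paths` and the `SAMPLE` constant are made up for illustration, and other output layouts are not handled.

```python
import re

# Matches an indented path line such as:
#    vmhba39:C0:T1:L0 LUN:0 state:dead iscsi Adapter: ...
# Assumption: the layout is the one shown in the post above.
PATH_RE = re.compile(
    r"^\s+(?P<path>vmhba\d+:C\d+:T\d+:L\d+)\s+LUN:(?P<lun>\d+)\s+state:(?P<state>\S+)"
)


def find_dead_paths(mpath_output):
    """Return a list of (device_id, path_name) tuples for paths in state 'dead'."""
    dead = []
    device = None
    for line in mpath_output.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            # Unindented line: "<device id> : <description>"
            device = line.split(" : ", 1)[0].strip()
            continue
        match = PATH_RE.match(line)
        if match and match.group("state") == "dead":
            dead.append((device, match.group("path")))
    return dead


# Sample taken verbatim from the output quoted in this post.
SAMPLE = """\
eui.68786f5457513978 : SCST_BIO iSCSI Disk (eui.68786f5457513978)
   vmhba39:C0:T1:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
"""

if __name__ == "__main__":
    for device_id, path_name in find_dead_paths(SAMPLE):
        print(device_id, path_name)
```

The script only reads text, so it can be run on any machine against output copied from the host; it does not change anything on the ESXi side.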

The virtual machines are still running fine on another datastore.

There are 2 other datastores used for heartbeating.

I am still able to log in with SSH on the ESXi servers.

I found this article on recovering from this situation, but it requires access to the vSphere Client, which is not working.

I also found a blog post saying a reboot is required, which I am reluctant to do because it would affect the virtual machines.

Is there a way to recover without affecting the virtual machines?

Any help will be appreciated.


Accepted Solutions
VirtuallyMikeB

On the storage system, unmask the dead LUNs. Then follow the CLI steps in this article to remove the dead paths. From there, restart your management services and try to access the hosts with the vSphere Client or Web Client.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100398...

Mike Brown
VMware, Cisco Data Center, and NetApp dude
Sr. Systems Engineer
michael.b.brown3@gmail.com
Twitter: @VirtuallyMikeB
Blog: http://VirtuallyMikeBrown.com
LinkedIn: http://LinkedIn.com/in/michaelbbrown

ftoppi
Contributor

Hello.

OK, reviving the dead storage did it. It took me quite some time, as it was in pretty bad condition.

Restarting the management services was not even required: the hosts became reachable shortly after the iSCSI target was revived.

Thanks!
