Longford
Contributor

Recover VM from dead host


I am running ESX 3.02 with VirtualCenter managing three hosts. HA/DRS and VMotion are enabled, and I am able to migrate VMs from host to host.

Today I lost a host. One virtual machine migrated, but another did not. VC reports that this VM is powered on but disconnected, and it is still associated with the failed host.

Can someone advise how I can recover my VM?


Accepted Solutions
ExCon
Enthusiast

Is the host you lost totally dead? If it is, you should be able to right-click the VM and remove it from inventory. Then browse the datastore where the VM lives, right-click its .vmx file, and add it to inventory. I'm not sure whether that last part is only available in 3.5, though. If you can't add to inventory from the datastore browser, log on to one of your surviving ESX boxes and register the VM using vmware-cmd. I'm not sure of the syntax off the top of my head, but I think it's something like "vmware-cmd -s register /vmfs/volumes/datastore/yourvm.vmx".
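For the vmware-cmd route, a minimal sketch of the steps on a surviving ESX 3.x host's service console might look like this. The datastore name (datastore1) and VM path (yourvm/yourvm.vmx) are hypothetical placeholders, and the DRY_RUN switch is an assumption added for safety: by default the script only prints the commands. Before powering the VM on, make sure the original host really is down, because a VM still running there holds locks on its files.

```shell
#!/bin/sh
# Sketch: re-register a VM whose host died, from a surviving ESX 3.x host.
# DATASTORE and VMX are hypothetical placeholders; set them to your real
# datastore name and the .vmx path relative to that datastore.
DATASTORE="${DATASTORE:-datastore1}"
VMX="${VMX:-yourvm/yourvm.vmx}"
VMX_PATH="/vmfs/volumes/$DATASTORE/$VMX"

# Safety default (an assumption of this sketch): only print the commands.
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. List the VMs this host already has registered.
run vmware-cmd -l
# 2. Register the orphaned VM's configuration file with this host.
run vmware-cmd -s register "$VMX_PATH"
# 3. Power the VM on.
run vmware-cmd "$VMX_PATH" start
```

Run it once as-is to review the commands, then rerun it with DRY_RUN=0 and your own DATASTORE and VMX values.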

5 Replies
depping
Leadership

Restarting the VirtualCenter service will probably solve this. Remove the VM from the inventory, add it back via "Browse Datastore", and start it.

Duncan

My virtualisation blog:

http://www.yellow-bricks.com

Leifster
Contributor

Just ran into a similar situation, and this was helpful information. Thanks!

The only other thing I had to do was remove the host from VirtualCenter completely.

kcollo
Contributor

I agree with the option of restarting VirtualCenter. I have had this issue before, though not with a dead ESX host, just a hung virtual machine. The link below goes to a blog post showing how to stop and restart a hung VM from the command line, provided you still have access to the host that was running it. I'm not sure it helps in this scenario; I guess that depends on how "dead" the ESX host you lost really is.

http://blog.colovirt.com/2009/02/09/vmware-esx-restart-a-hung-virtual-machine/
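As a rough illustration only (not necessarily the method in the linked post), to my understanding the ESX 3.x service console's vmware-cmd can query a VM's power state and force a hard power-off on the host that is running it. VMX_PATH is a hypothetical placeholder, and the DRY_RUN switch is an assumption of this sketch: by default it only prints the commands.

```shell
#!/bin/sh
# Sketch: hard-stop and restart a hung VM from the ESX 3.x service console
# of the host running it. VMX_PATH is a hypothetical placeholder.
VMX_PATH="${VMX_PATH:-/vmfs/volumes/datastore1/hungvm/hungvm.vmx}"

# Print the commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Check what the host thinks the VM's power state is.
run vmware-cmd "$VMX_PATH" getstate
# Force a hard power-off (no guest OS shutdown), then power it back on.
run vmware-cmd "$VMX_PATH" stop hard
run vmware-cmd "$VMX_PATH" start
```

A hard stop is the equivalent of pulling the power cord, so try a normal shutdown from inside the guest first if the VM is still reachable.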

--

Kevin Goodman

Linux / SAN / Virtualization

kevin@colovirt.com

http://blog.colovirt.com

Leifster
Contributor

Great resource, thanks!

In our case, a host appeared disconnected and was generating a host connection state alarm because the HA agent in the cluster had an error. The host and all of its VMs showed as disconnected, but we could still reach every VM remotely (RDP, etc.). We could ping the host but couldn't SSH into the console, and we couldn't log in at the ESX terminal itself: the local terminal displayed a login prompt, but no authentication took place. IIRC, we couldn't even enter a password.

The host had a failed disk. My co-worker had set it up as RAID 5 with a hot spare, but something had badly hosed the configuration: the RAID BIOS showed a RAID 0 array with a failed disk, plus a hot spare that was useless because the adapter treated the array as RAID 0. I'm not sure whether this was a mistake during initial setup or some kind of serious corruption.

Since we could remote into the VMs, we were able to shut them down gracefully. We then rebooted the host, and because its RAID 0 array had a failed disk, there was no ESX left to load. I guess ESX had essentially been running from memory since the disk failed. It's pretty amazing that those VMs kept chugging along.

After the host rebooted, HA detected it as down and failed all of its VMs over to the other online hosts. HA powered them all back on, since its last known state for those VMs was powered on (almost like zombie VMs).

We had one issue, however, which brought me to this thread. I missed shutting down one VM before rebooting the failed host, so VC thought that VM was still running, disconnected, on the failed host, and I couldn't migrate it. In reality, it had hard-crashed when I rebooted the host and was offline.

Removing the host from VC also removed the VM from VC, and I was then able to add it back to the inventory as described in this thread.
