VMware Cloud Community
bhwong7
Contributor

vMotion freezes with all VM options grayed out

vMotion freezes at 67% when migrating a VM from ESXi 4.1u2 to ESXi 5.5u1. This is the fifth VM to be migrated (the previous four VMs migrated smoothly).


Cancelling the vMotion does nothing. All options are grayed out, including Power Off and Remove. Error messages:

  1. vSphere HA cannot reset this virtual machine
  2. Failed to lock the file.
  3. An error occurred while restarting virtual machine after taking a snapshot. The virtual machine will be powered off. (Why does vSphere take a snapshot?)


This is what I have done:

  1. Logged in to the ESXi 4.1u2 host directly and saw that this VM already showed a powered-off status. Removed the VM.
  2. Logged in to the ESXi 5.5u1 host directly and saw that the VM was also there with a powered-off status. All options were grayed out, and I was unable to do anything with it.
  3. Connected to the ESXi 5.5u1 host over SSH and killed the VM process: esxcli vm process kill --type=soft --world-id=260483
  4. Powered on the VM.
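
Step 3 needs the world ID of the stuck VM, which `esxcli vm process list` reports. Below is a minimal sketch of pulling it out of that output, assuming the usual layout of an unindented name line followed by indented `Key: Value` lines. The VM name `app-vm-05`, the cartel ID, the datastore path, and the helper names are made up for illustration; only the world ID 260483 comes from this thread.

```python
# Illustrative sample only: the VM name, cartel ID and path are hypothetical.
SAMPLE = """\
app-vm-05
   World ID: 260483
   Process ID: 0
   VMX Cartel ID: 260482
   Display Name: app-vm-05
   Config File: /vmfs/volumes/datastore1/app-vm-05/app-vm-05.vmx
"""


def parse_vm_process_list(text):
    """Turn `esxcli vm process list`-style text into a list of dicts."""
    vms = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new VM record.
            current = {"Name": line.strip()}
            vms.append(current)
        elif current is not None and ":" in line:
            # Indented "Key: Value" line; split on the first colon only.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return vms


def world_id_for(text, display_name):
    """Return the world ID (int) of the VM with this display name, or None."""
    for vm in parse_vm_process_list(text):
        if vm.get("Display Name") == display_name:
            return int(vm["World ID"])
    return None
```

With the sample above, `world_id_for(SAMPLE, "app-vm-05")` gives the number to pass to `--world-id`.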

Questions:

  1. What could cause this vMotion to fail when all the previous VMs migrated successfully?
  2. Why does vCenter attempt to take a snapshot and restart this VM? Is the system account named user?
  3. Why can't vMotion just abort the migration and let the VM continue running on the existing ESXi 4.1u2 host when it encounters a problem, instead of freezing indefinitely?
  4. Is there a better way to recover the VM without killing its process and powering it back on?
vNEX
Expert

Hi Bhwong,


These errors are always quite complex. Can you gather more detailed information? Take a look at these log files on both the source and destination hosts:

/var/log/hostd.log

/var/log/vmkwarning.log

and the vmware.log of the failed VM
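
These logs can be long, so a quick keyword filter may help narrow them down. This is only a rough sketch: the search terms in `KEYWORDS` are my own guesses at what is relevant here, not an official list, and `matching_lines` is a hypothetical helper name.

```python
# Assumed search terms; adjust them to whatever the errors point at.
KEYWORDS = ("vmotion", "migrat", "lock", "snapshot")


def matching_lines(lines, keywords=KEYWORDS):
    """Yield (line_number, text) for lines containing any keyword.

    Matching is case-insensitive. `lines` can be any iterable of strings,
    including an open log file.
    """
    lowered = [k.lower() for k in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        haystack = text.lower()
        if any(k in haystack for k in lowered):
            yield number, text


# Example use against one of the logs above:
# with open("/var/log/hostd.log", errors="replace") as log:
#     for number, text in matching_lines(log):
#         print(number, text)
```

Running it over the same log from both hosts and comparing the timestamps around the 67% mark is probably the most useful starting point.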

Q: What could cause this vMotion to fail when all the previous VMs migrated successfully?

A: What, in general, is different between the VMs that migrated successfully and the one that failed?

Check the VM hardware configuration: hardware version and CPUID masks. Under the VM Options tab, are you using any specific advanced settings?

Is there any difference in the datastore or disk types (RDM?) used by the failed VM and the successful ones? Have you tried a Storage vMotion of the VM to another datastore and then a vMotion to a different host?

Does the VM have existing snapshots?

What OS is installed in the VM?

Is there any difference in the source and destination hosts/builds compared to the successfully migrated VMs?
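
One way to compare the failed VM against one that migrated cleanly is to diff their .vmx files. Here is a minimal sketch, assuming plain `key = "value"` lines in the .vmx; the sample settings and values below are invented for illustration, and `parse_vmx` / `diff_vmx` are hypothetical helper names.

```python
def parse_vmx(text):
    """Parse .vmx-style text into a dict of setting name -> value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comment lines and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_vmx(good_text, bad_text):
    """Return {setting: (value_in_good, value_in_bad)} for every difference.

    A value of None means the setting is absent from that file.
    """
    good, bad = parse_vmx(good_text), parse_vmx(bad_text)
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(set(good) | set(bad))
        if good.get(key) != bad.get(key)
    }


# Made-up example contents, not taken from the VMs in this thread.
GOOD_VMX = 'virtualHW.version = "7"\nmemsize = "4096"\n'
BAD_VMX = (
    'virtualHW.version = "4"\n'
    'memsize = "4096"\n'
    'cpuid.1.ecx = "----:----:----:----:----:----:--0-:----"\n'
)

differences = diff_vmx(GOOD_VMX, BAD_VMX)
```

Anything that shows up only for the failed VM (a CPUID mask, an older hardware version, an RDM disk entry) is a candidate for the cause.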

Q: Why does vCenter attempt to take a snapshot and restart this VM? Is the system account named user?

A: The vSphere HA reset was triggered because you have the VM Monitoring feature enabled. When a VM stops sending heartbeats via VMware Tools and there is no I/O activity (disk/network) for the VM on the host, vSphere HA will try to reset the VM at defined intervals. This symptom is typical of a frozen VM 😉

"An error occurred while restarting virtual machine after taking a snapshot. The virtual machine will be powered off." This message can be misleading; the logs I mentioned above will have better information.

Q: Why can't vMotion just abort the migration and let the VM continue running on the existing ESXi 4.1u2 host when it encounters a problem, instead of freezing indefinitely?

A: My guess is that something went wrong on the source host side that caused the VM to freeze. If the VM is frozen or non-responsive, there is no recovery action vMotion can take.

If something goes wrong on the destination host, vMotion usually reverts the whole operation successfully and the VM continues running on the source host.

Q: Is there a better way to recover the VM without killing its process and powering it back on?

A: I don't think so...
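
If killing the process is the way out, it is at least worth starting with the gentlest option. `esxcli vm process kill` accepts `soft`, `hard` and `force` as the `--type`, so the attempt can be escalated. A minimal sketch of that order follows; `is_running` is a hypothetical callback you would supply yourself (for example, one that checks whether the world ID still appears in `esxcli vm process list`), and a real script would also wait a little between attempts.

```python
import subprocess

KILL_TYPES = ("soft", "hard", "force")  # gentlest first


def kill_command(world_id, kill_type):
    """Build the esxcli argument list for one kill attempt."""
    if kill_type not in KILL_TYPES:
        raise ValueError("unknown kill type: %s" % kill_type)
    return [
        "esxcli", "vm", "process", "kill",
        "--type=%s" % kill_type,
        "--world-id=%d" % int(world_id),
    ]


def escalate_kill(world_id, is_running, run=subprocess.call):
    """Try soft, then hard, then force, stopping once the VM is gone.

    Returns the kill type that worked, or None if the VM is still
    running after all three attempts.
    """
    for kill_type in KILL_TYPES:
        run(kill_command(world_id, kill_type))
        if not is_running():
            return kill_type
    return None
```

Passing a different `run` function makes it possible to dry-run the sequence and just print the commands before touching the host.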

Regards,

Petr
