VMware Cloud Community
conradsia
Hot Shot

VMotion stuck in progress

Hello,

I recently had a VMotion get stuck in progress for over an hour when high CPU usage on a host caused the ESX host to become unresponsive to VC. I tried restarting the VC services, rebooting VC, and removing and re-connecting the host in VC, all with no luck. After all of that the VMotion still showed as in progress, and the VM itself began to have resource issues.

I had to migrate all of the other VMs off the host, forcibly kill the one running VM that was stuck, and then reboot the server to finally orphan the migration and bring the server back up.

VMware told me that the issue was with resource group reservations, which they said really shouldn't be set; the reservations were preventing CPU resources from being allocated efficiently on the ESX host.

Has anyone else seen a migration get stuck and not been able to kill it? Or, if you did have a stuck migration and were able to kill it, could you let me know what you did to stop it? I restarted the VC service, stopped and started the vmware-vpxa service, and disconnected the host from the VC server, all with no luck. The only way I could get the process to orphan was to reboot.

BTW, while the migration was stuck I was able to power off the VM, but I could no longer power it back on.

Thanks,

3 Replies
Paul_Lalonde
Commander

The inability to power the VM back on is usually due to one of two things:

1) There's an active lock on one of the VM's files, particularly the .vswp swap file.

2) There's an active process on either the origin or destination ESX server (the two servers involved in the VMotion exchange) that's holding the .vmx and/or virtual machine open.

You can usually do a:

vmware-cmd /full/path/to/vmx/file.vmx getstate

vmware-cmd /full/path/to/vmx/file.vmx stop hard

to figure out what's happening.
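
As a rough sketch of how the two commands above might be combined on the service console: the helper below checks the state first and only issues the hard stop if the VM still reports as on. The function name hard_stop_if_on and the VMWARE_CMD variable are made up for illustration, and the match on "= on" assumes getstate prints something like "getstate() = on", so check the output on your own host before relying on it.

```shell
# Hypothetical helper, not part of any VMware tooling.
# VMWARE_CMD lets you point at a different binary; defaults to vmware-cmd.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

hard_stop_if_on() {
    vmx=$1
    # Ask for the current power state and show it to the operator.
    state=$("$VMWARE_CMD" "$vmx" getstate)
    echo "$state"
    case "$state" in
        # Assumption: a running VM reports a line ending in "= on".
        *"= on"*) "$VMWARE_CMD" "$vmx" stop hard ;;
        *) echo "not powered on; nothing to stop" ;;
    esac
}

# Usage, with the same placeholder path as above:
# hard_stop_if_on /full/path/to/vmx/file.vmx
```

If getstate itself hangs or errors out, that is already a hint that something still has the .vmx open.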

Paul

conradsia
Hot Shot

I figured there was a lock on something, which is why I couldn't power it on, and that's fine. But I couldn't kill the migration in progress even after stopping the VM, restarting the VC service and server, restarting the mgmt-vmware service on the ESX servers, restarting the vmware-vpxa service, removing the server from VC ...

I only removed one of the servers; maybe I should have tried removing them both?

Does anyone know what process handles the vmotion in progress?

Texiwill
Leadership

Hello,

A vMotion that gets stuck in the middle could be related to a SCSI Reservation Conflict. You will want to review your /var/log/vmkernel and /var/log/vmkwarning files for these types of issues. These conflicts can leave locks on remote datastores that then need to be cleared up.
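
A quick way to scan those two logs might look like the sketch below. The function name is made up, and the search phrase "reservation conflict" is an assumption about how the messages are worded, so widen or change the pattern if your logs phrase it differently.

```shell
# Hypothetical helper: count lines mentioning a reservation conflict
# in each log file passed as an argument (case-insensitive match).
count_reservation_conflicts() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            # grep -c prints 0 and exits non-zero on no match; ignore that exit code.
            n=$(grep -ci "reservation conflict" "$log" || true)
            echo "$log: $n"
        else
            echo "$log: not readable"
        fi
    done
}

# Usage on the ESX service console:
# count_reservation_conflicts /var/log/vmkernel /var/log/vmkwarning
```

A non-zero count around the time of the stuck migration would be worth a closer look at the surrounding lines.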

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill