We have about 15 ESXi hosts in a cluster running 600+ VDI desktops. The vMotion IPs can ping each other, but vMotion still fails at 21%.
Port 8000 is open and the NIC links are up, so I am not sure why vMotion fails. If we power off the VDI and migrate it, the migration works. When the VDI is powered on, it does not.
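For anyone who wants to repeat the port 8000 check, here is a minimal sketch in Python. It assumes a Python interpreter is available on a machine that can reach the vMotion network. The function name, the optional source_ip argument and the placeholder addresses are illustrative and not from any VMware tool. Binding to a source address only asks the OS to use that address; it does not guarantee which interface the traffic leaves on.

```python
import socket

def tcp_port_open(host, port=8000, source_ip=None, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    source_ip (hypothetical parameter) binds the local end of the
    connection to a given address, e.g. the local vMotion IP.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the bind failed.
        return False
```

Usage would look like tcp_port_open("<destination vMotion ip>", 8000, source_ip="<local vMotion ip>"). A True result only shows that the TCP handshake completes. It says nothing about whether full-size frames get through.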
When the VM is powered off it is not technically a "vMotion"; it is a cold migration.
When you test with ping, are you using vmkping and selecting the correct interface to validate networking between the vMotion interfaces?
I ran into something like that recently. A migration of a powered-off VM went over my management interface, while a migration of a powered-on VM went through the vMotion interfaces. I had an MTU mismatch that prevented the live vMotion from working.
I can ping with vmkping -I vmk1 "ip" and it gets replies, but vMotion still fails at 21%.
I also noticed that the dvSwitch MTU is set to 1500 and the vMotion VMkernel port is at 1500, whereas the physical network switch is at 9000.
Could the mismatch between the dvSwitch and the network switch be the cause?
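A default-size vmkping can succeed even when full-size frames are dropped. One way to test the MTU theory is to ping with the don't-fragment bit set and the largest payload that fits the MTU. Here is a small Python sketch that works out that payload and prints the matching vmkping command lines. It assumes IPv4 with no IP options; vmk1 and "<ip>" are placeholders taken from the post above.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
ICMP_HEADER = 8   # ICMP echo header, in bytes

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER

def vmkping_command(interface, target_ip, mtu):
    """Build a vmkping command line with don't-fragment (-d) and size (-s)."""
    return f"vmkping -I {interface} -d -s {max_ping_payload(mtu)} {target_ip}"

# Print the test command for a standard and a jumbo-frame MTU.
for mtu in (1500, 9000):
    print(vmkping_command("vmk1", "<ip>", mtu))
```

If the 1472-byte ping works between every pair of hosts, a 1500 MTU path is intact end to end. The 8972-byte ping is expected to fail while the VMkernel port is still at 1500.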
If you storage migrate from a 9000 MTU to a 1500 MTU it will fail. As a test, shut a VM down and do the migration, then power the VM on once it has moved. If MTU is indeed what is getting in the way, you should then be able to vMotion from the 1500 MTU side back to the 9000 MTU side.
As @depping already wrote, a powered-off VM doesn’t leverage vMotion.
Do you get any other error details?
I would check all of the vMotion network configuration again. Especially:
Is there anything relevant in the virtual machine log (vmware.log)?
Thanks for the input, everyone. I noticed that if the ESXi management agents are restarted, vMotion works fine. I can also see the following error in vmkernel.log:
"Failed: The ESX hosts failed to connect over the VMotion network
Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error".
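Since the message points to the VMX log for the true error, it can help to pull every migration-related line out of both vmkernel.log and the VM's vmware.log and compare them side by side. Here is a minimal Python sketch. The keyword pattern is my own guess at what is relevant, not an official list, and the function names are illustrative.

```python
import re

# Assumed keywords: matches "vmotion", "VMotion", "Migrate", "Migration", ...
KEYWORDS = re.compile(r"vmotion|migrat", re.IGNORECASE)

def migration_lines(lines):
    """Yield (line_number, text) for lines that mention vMotion or migration."""
    for number, line in enumerate(lines, start=1):
        if KEYWORDS.search(line):
            yield number, line.rstrip("\n")

def scan_log(path):
    """Read a log file and return its migration-related lines."""
    with open(path, errors="replace") as handle:
        return list(migration_lines(handle))
```

For example, scan_log("/var/log/vmkernel.log") on the host, and the same call on the vmware.log in the VM's folder on the datastore. Adjust the paths to wherever your logs live.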
On the ESXi hosts, the migration timeout is set to 20. ESXi hosts in other clusters have the same value, and vMotion works for them.
Somehow, on every host in this problem cluster, the vMotion IPs ping but vMotion fails at 21%.