VMware Cloud Community
Janani0711
Contributor

vMotion IP pings, but vMotion doesn't work

Hi,

We have about 15 ESXi servers in a cluster hosting 600+ VDI desktops. The vMotion IPs can ping each other, but vMotion fails at 21%.

Port 8000 is reachable and the NIC links are up, so I am not sure why vMotion fails. If we power off the VDI and migrate it, the migration works. When the VDI is powered on, it doesn't.

7 Replies
depping
Leadership

When the VM is powered off, it is technically not a "vMotion"; it is a cold migration.

When you test the ping, are you using vmkping and selecting the correct interface to validate networking between the vMotion interfaces?

alantz
Enthusiast

I ran into something like that recently. A migration of a powered-off VM went over my management interface, while a vMotion of a powered-on VM went through the vMotion interfaces. I had an MTU mismatch that prevented the live vMotion from working.

--Alan--
Janani0711
Contributor

I can ping with vmkping -I vmk1 "ip", and it does respond. But somehow vMotion still fails at 21%.

I also noticed that the dvSwitch MTU is set to 1500 and the vMotion vmkernel port is at 1500, whereas the physical network switch is at 9000.

Could the failure be caused by the MTU mismatch between the dvSwitch and the network switch?
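
For anyone wanting to rule MTU in or out: a plain vmkping sends small packets, so it can succeed even when larger frames are dropped. Pinging with the don't-fragment flag and a payload sized to the MTU is a stricter test. The sketch below only builds the command strings. It assumes IPv4 with a 20-byte IP header and an 8-byte ICMP header; the interface name vmk1 and the address 192.0.2.10 are placeholders, not values from this environment.

```python
IP_HEADER = 20    # bytes, IPv4 header without options (assumption: IPv4)
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def vmkping_command(interface: str, target_ip: str, mtu: int) -> str:
    """Build a vmkping call with don't-fragment (-d) and an explicit size (-s)."""
    return f"vmkping -I {interface} -d -s {max_icmp_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # Placeholder interface and address; substitute the real vMotion vmk and peer IP.
    for mtu in (1500, 9000):
        print(vmkping_command("vmk1", "192.0.2.10", mtu))
```

With the vmkernel port at 1500, the 1472-byte test is the relevant one. The 8972-byte test only becomes meaningful if the vmkernel port and dvSwitch are raised to 9000 as well.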

alantz
Enthusiast

If you storage migrate from 9K MTU to 1500 MTU, it will fail. As a test, you can shut a VM down and do the migration. Once it has migrated, power on the VM. Now you should be able to vMotion from 1500 MTU back to 9K MTU, if it is indeed MTU getting in the way.

--Alan--
irigoyen
Enthusiast

As @depping already wrote, a powered-off VM doesn't use vMotion.

Do you get any other error details?

I would check the whole vMotion network configuration again, especially:

  • Are the vMotion interfaces on the same subnet?
  • Are the IP addresses configured correctly?
  • Is the subnet mask correct?
  • Is the VM on shared storage?
  • Is the failover order (active/passive) correct?
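
The subnet and mask checks can be done quickly offline once you have collected each host's vMotion vmkernel address. This is a minimal sketch using Python's standard ipaddress module; the addresses in the example are made up for illustration and the function name is my own.

```python
import ipaddress


def same_subnet(addresses_with_prefix):
    """Return True if every address/prefix pair resolves to the same network."""
    networks = {ipaddress.ip_interface(a).network for a in addresses_with_prefix}
    return len(networks) == 1


if __name__ == "__main__":
    # Hypothetical vMotion vmk addresses; the third one sits in a different /24.
    vmotion_vmks = ["10.10.20.11/24", "10.10.20.12/24", "10.10.21.13/24"]
    print(same_subnet(vmotion_vmks))  # False
```

A wrong subnet mask on one host shows up the same way, because the computed network differs even when the address itself looks right.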

Is there anything in the virtual machine log (vmware.log)?

larstr
Champion

Also check vmkernel.log on the source and destination ESXi hosts while initiating a vMotion.
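
If the log is noisy, a small filter helps to pull out only the migration-related lines from both hosts for comparison. This is a rough sketch: the keyword pattern is my assumption about what is worth matching, not an official list, and the first sample line is invented filler for the demonstration.

```python
import re

# Assumed keywords; adjust to whatever your hosts actually log.
PATTERN = re.compile(r"migrat|vmotion", re.IGNORECASE)


def migration_lines(lines):
    """Yield the lines that mention migration or vMotion, without trailing newlines."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = [
        "unrelated storage heartbeat message\n",  # invented filler line
        "Failed: The ESX hosts failed to connect over the VMotion network\n",
    ]
    for hit in migration_lines(sample):
        print(hit)
```

The same function accepts an open file handle, for example a copy of vmkernel.log pulled from each host, so the source and destination output can be compared side by side.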

 

Lars
Janani0711
Contributor

Thanks for the input, everyone. I noticed that if the ESXi management agents are restarted, vMotion works fine. Also, in vmkernel.log I can see the error below:

"Failed: The ESX hosts failed to connect over the VMotion network

Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error".

On the ESXi server, the migration timeout is set to 20. The ESXi servers in other clusters have the same value, but vMotion works for them.

Somehow, on all the hosts in this problem cluster, the vMotion IPs ping but vMotion fails at 21%.

 
