yeayu
Enthusiast

vMotion fails at 14% between some ESXi hosts part of the same cluster

Hello all,

I have been looking into an issue that affects only a couple of ESXi hosts in a cluster.

Any vMotion migration from other hosts to these ones fails at 14% with the following messages:

WARNING: MigrateNet: 1309: 1458908406025172 S: failed to connect to remote host <x.x.x.x> from host <y.y.y.y>: Timeout

WARNING: Migrate: 269: 1458908406025172 S: Failed: The ESX hosts failed to connect over the VMotion network (0xbad010b) @0x41802ba56f9a

I checked the configuration and compared the values with other working ESXi hosts, as the following KB article describes:

VMware KB: Performing vMotion fails at 14% despite vmkping succeeding from source to target IP addre...

The MTU, VMkernel settings, LAN settings, routing table and so on look identical to those of other working hosts in the same cluster.

I can even ping the hosts successfully over the vMotion network using the configured vmk interface.

I have been comparing the VMkernel logs while performing migrations from different ESXi hosts to identify differences, and I spotted the following:

- Between two ESXi hosts where vMotion works correctly:

Migrate: vm 747618: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, src ip = <x.x.x.x> dest ip = <x.x.x.z> Dest wid = 0 using SHARED swap

The SRC and DST IP addresses belong to the same LAN, which (ironically) is not the vMotion network at all but the management one.

- Between two ESXi hosts where vMotion does not work:

Migrate: vm 727726: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, src ip = <x.x.x.x> dest ip = <y.y.y.y> Dest wid = 0 using SHARED swap

The SRC and DST IP addresses belong to different LANs: SRC is on the management network and DST is on the vMotion one.
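The comparison above can be automated when there are many migrations to review. The following is a minimal sketch, not an official tool: it assumes the "Setting VMOTION info" line format quoted in this post, and the addresses and the vMotion network `192.0.2.0/24` are hypothetical placeholders for the redacted values.

```python
import ipaddress
import re

# Matches the "Setting VMOTION info" VMkernel log line quoted in this post.
# The angle brackets around the addresses are treated as optional.
VMOTION_INFO = re.compile(
    r"Setting VMOTION info:.*?"
    r"src ip = <?(?P<src>[\d.]+)>?\s+dest ip = <?(?P<dst>[\d.]+)>?"
)


def vmotion_endpoints(line):
    """Return (src, dst) addresses from a log line, or None if it does not match."""
    match = VMOTION_INFO.search(line)
    if match is None:
        return None
    return ipaddress.ip_address(match["src"]), ipaddress.ip_address(match["dst"])


def same_network(src, dst, network):
    """Return True when both addresses fall inside the given network (CIDR string)."""
    net = ipaddress.ip_network(network)
    return src in net and dst in net


if __name__ == "__main__":
    # Hypothetical sample line; real addresses are redacted in the post.
    sample = (
        "Migrate: vm 727726: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, "
        "src ip = <198.51.100.10> dest ip = <192.0.2.20> Dest wid = 0 using SHARED swap"
    )
    src, dst = vmotion_endpoints(sample)
    if not same_network(src, dst, "192.0.2.0/24"):
        print(f"endpoints are not both on the vMotion network: {src} -> {dst}")
```

Running this over the VMkernel log of each source host would list the migrations whose endpoints are not both on the expected network, which is the difference described above.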

I am running out of ideas. Does anyone know why I am seeing these differences?

Any help would be much appreciated.

5 Replies
homerzzz
Hot Shot

Have you compared traceroute to the vmotion network from both good hosts and bad hosts to see what vmk the host is using and if it differs from other hosts?

Which vmk on each host is on your vMotion VLAN, and are they the only vmks that have the "Use this adapter for Vmotion" setting checked? Maybe post screenshots of your settings so other eyes can have a look.

You don't by chance have any custom TCP/IP stacks configured on any of the hosts?

If the hosts are connected to different network switches, I would make sure the switch configurations are correct.

rcporto
Leadership

If you have a dedicated VMkernel interface just for vMotion, make sure no other VMkernel interface has vMotion traffic enabled.

---

Richardson Porto
Senior Infrastructure Specialist
LinkedIn: http://linkedin.com/in/richardsonporto
yeayu
Enthusiast

Yes, I tested that. Traceroute over the vMotion network shows the same result across the different hosts (1 hop only).

I have several vmks on each host, and only one of them is selected for vMotion.

It's true, though, that I have some vmk interfaces in the vMotion network that are used for iSCSI rather than vMotion. These do not have the vMotion checkbox selected (I checked several times), only the iSCSI port binding option.

No, I don't use custom TCP/IP stacks.

The hosts are connected to the same dvSwitch and are part of the same port group.
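For comparing hosts side by side, the two checks raised in this thread (which vmks have vMotion enabled, and which vmks share a subnet) can be sketched as below. This is only an illustration under stated assumptions: the inventory format, interface names and addresses are hypothetical and would have to be filled in by hand from each host's settings. Whether several vmks sharing the vMotion subnet is related to the failure is something to test, not a confirmed cause.

```python
import ipaddress
from collections import defaultdict


def audit_vmks(vmks):
    """Summarise a host's vmk layout.

    vmks is a list of dicts with 'name', 'ip', 'netmask' and 'vmotion' keys
    (a hypothetical shape, filled in manually from the host configuration).
    Returns (names of vMotion-enabled vmks, {subnet: [vmk names]} for subnets
    that contain more than one vmk).
    """
    by_subnet = defaultdict(list)
    for vmk in vmks:
        iface = ipaddress.ip_interface(f"{vmk['ip']}/{vmk['netmask']}")
        by_subnet[iface.network].append(vmk["name"])

    vmotion_enabled = [vmk["name"] for vmk in vmks if vmk["vmotion"]]
    shared = {str(net): names for net, names in by_subnet.items() if len(names) > 1}
    return vmotion_enabled, shared


if __name__ == "__main__":
    # Hypothetical host: management vmk, one vMotion vmk, one iSCSI vmk
    # placed in the same network as the vMotion vmk.
    host = [
        {"name": "vmk0", "ip": "198.51.100.10", "netmask": "255.255.255.0", "vmotion": False},
        {"name": "vmk1", "ip": "192.0.2.10", "netmask": "255.255.255.0", "vmotion": True},
        {"name": "vmk2", "ip": "192.0.2.11", "netmask": "255.255.255.0", "vmotion": False},
    ]
    enabled, shared = audit_vmks(host)
    print("vMotion-enabled vmks:", enabled)
    print("subnets with more than one vmk:", shared)
```

Running the same summary for a working host and a failing host makes any difference in vmk layout easy to spot.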

yeayu
Enthusiast

Yes, there is only one VMkernel interface with the vMotion checkbox enabled on each of the hosts.

npadmani
Virtuoso

- Between two ESXi hosts where vMotion does not work:

Migrate: vm 727726: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, src ip = <x.x.x.x> dest ip = <y.y.y.y> Dest wid = 0 using SHARED swap

The SRC and DST IP addresses belong to different LANs: SRC is on the management network and DST is on the vMotion one.

As Richardson pointed out earlier, please check whether the source host has a VMkernel interface marked for both management and vMotion. Otherwise, double-check the vmk IP assignment itself to make sure there is no mistake in those details.

Update:

Please also refer to the following official KB article for more input.

VMware KB: Performing vMotion fails at 14% despite vmkping succeeding from source to target IP addre...

Narendra Padmani VCIX6-DCV | VCIX7-CMA | VCI | TOGAF 9 Certified