ats0401
Enthusiast

Possible bug with multi-NIC vMotion in ESXi 5

My current setup is as follows:

  • Two servers, each with two 10Gb NICs set up for vMotion, running the latest build of ESXi 5.0

Server 1

VMK1 - 172.16.0.100 /16 - vMotion-1 pinned to vmnic1

VMK2 - 172.16.1.100 /16 - vMotion-2 pinned to vmnic2

Server 2

VMK1 - 172.16.0.101 /16 - vMotion-1 pinned to vmnic1

VMK2 - 172.16.1.101 /16 - vMotion-2 pinned to vmnic2

This is on a standard vSwitch. Management vmk0 is on this switch as well, with both NICs set as active and the teaming policy set to Route based on originating virtual port ID.
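For reference, this is roughly how I check the layout from the ESXi shell (interface and portgroup names are the ones above; I'm going from memory on the exact esxcli namespaces, so double-check them on your build):

# list vmkernel interfaces with their IP, netmask and portgroup
esxcfg-vmknic -l

# show which uplinks each vMotion portgroup is actually using
esxcli network vswitch standard portgroup policy failover get -p vMotion-1
esxcli network vswitch standard portgroup policy failover get -p vMotion-2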

I have a similar setup on a server with four 10Gb NICs. It has a vDS with four vmknics, all on a /16 subnet, and multi-NIC vMotion works great there; it sends traffic out all four 10Gb NICs. The addressing scheme for the vmknics is 172.16.1.X, 172.16.2.X, 172.16.3.X, and 172.16.4.X with a /16 subnet.

The setup on the standard vSwitch will NOT work. The vMotion times out at 9% and fails. vmkping only works from vmk1 to another vmk1 on any server; vmk2 will not ping from any server to any server. The ARP table has no entries for vmk2, and it never gets an ARP response. I sniffed the network traffic and the host is not even sending an ARP request to resolve the MAC of the vmk2 IP.
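These are roughly the checks I ran from the ESXi shell (assuming your build's vmkping takes the -I flag to force the outgoing interface; addresses are the ones from my setup above):

# works: vmk1 to vmk1 on the other host
vmkping -I vmk1 172.16.0.101

# fails: vmk2 to vmk2, no reply and no ARP entry ever shows up
vmkping -I vmk2 172.16.1.101

# the host's ARP/neighbor table - nothing for the remote vmk2 address
esxcli network ip neighbor list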

When I look at the ARP table of the host running the vDS, it shows all four ARP entries, one for each vmknic of the host it most recently vMotioned to, and everything works fine.

I was able to get it to work by changing the subnet mask from /16 to /24. This effectively puts two separate subnets on each host, which is the opposite of what the VMware documentation says to do.
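If anyone wants to reproduce the workaround, the change is roughly this (a sketch only; the interface name and address are from my setup, and I'm recalling the esxcli ipv4 set syntax from memory):

# move vmk2 onto its own /24 instead of the shared /16
esxcli network ip interface ipv4 set -i vmk2 -I 172.16.1.100 -N 255.255.255.0 -t static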

This KB article seems to explain what is going on, and it also contradicts what we are told to do for multi-NIC vMotion.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201087...

In particular:

If both of these vmknics are configured to be on the same IP subnet, the vmkernel TCP/IP stack chooses one of the two interfaces for all VMkernel traffic (vMotion and NFS) going out on that subnet. Configurations having more than one vmknic interface on the same IP subnet should be avoided

So it seems to me there is a problem with their recommended implementation on standard switches. Has anyone else come across this issue?

3 Replies
ats0401
Enthusiast

It looks like there is indeed a bug; this post shows similar behavior:

http://vmtoday.com/2012/02/vsphere-5-networking-bug-2-affects-management-network-connectivity/

chriswahl
Virtuoso

Ensure that you are using an active/standby and standby/active configuration for the physical uplinks. Also, I would recommend not using that VLAN/subnet for any other traffic; dedicate a VLAN/subnet strictly to vMotion.
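If it helps, the active/standby split can be set per portgroup from the ESXi shell, roughly like this (portgroup names are the ones from the original post; double-check the esxcli options on your build):

# vMotion-1: vmnic1 active, vmnic2 standby
esxcli network vswitch standard portgroup policy failover set -p vMotion-1 -a vmnic1 -s vmnic2

# vMotion-2: vmnic2 active, vmnic1 standby
esxcli network vswitch standard portgroup policy failover set -p vMotion-2 -a vmnic2 -s vmnic1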

I run the configuration mentioned above in my 1Gb environment without any issue.

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators
HeathReynolds
Enthusiast

Yeah, all of the vMotion adapters need to be on the same subnet.

I've put in a feature request to allow us to group them or put them in separate subnets, but haven't heard anything back.

When you start a vMotion, vCenter walks the list of vMotion-enabled adapters on each host and pairs them off in the order they were presented to vCenter, without any regard for the host's routing table.

If they were presented to vCenter in the right order, it will work with two vMotion subnets for a while, but at some point vCenter can start pairing them up differently and you get the failure at 9%.

You can tail the vmkernel log on the hosts and see which adapters are being paired.
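Something like this on each host while the migration is running (the grep is just to cut the noise; reading the raw log works too):

# watch the vMotion pairing/bind messages as the migration starts
tail -f /var/log/vmkernel.log | grep -i vmotion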

All you can do is put them on the same subnet, and use active/standby or MAC pinning to associate each vmk interface with a physical NIC.

My sometimes relevant blog on data center networking and virtualization : http://www.heathreynolds.com