I noticed an interesting phenomenon in some packet captures that likely explains what's going on.
We have two ports defined on the "target" ESXi 4.1 server (the server where the VM is being vMotioned to):
- vMotion: 10.49.2.49 (00:50:56:74:30:28) [vMotion Enabled]
- Management Network: 10.49.2.33 (00:22:19:94:88:f8) [Management Traffic Enabled]
Both of these "ports" are on the same vmnic on the same subnet.
My packet capture indicates the following vMotion packets from the source:
20:07:33.351805 a4:ba:db:2d:3e:9a > 00:50:56:74:30:28, ethertype IPv4 (0x0800), length 1514: IP 10.49.5.155.59601 > 10.49.2.49.8000: . 45982380:45983828(1448) ack 1 win 4163 <nop,nop,timestamp 416800893 3266226>
20:07:33.351809 a4:ba:db:2d:3e:9a > 00:50:56:74:30:28, ethertype IPv4 (0x0800), length 1514: IP 10.49.5.155.59601 > 10.49.2.49.8000: . 45983828:45985276(1448) ack 1 win 4163 <nop,nop,timestamp 416800893 3266226>
This is the destination MAC I would expect to see (00:50:56:74:30:28). However, look at the TCP ACKs going back to the source:
20:13:33.202985 00:22:19:94:88:f8 > a4:ba:db:2d:3e:9a, ethertype IPv4 (0x0800), length 66: 10.49.2.49.irdmi > 10.49.5.155.62878: . ack 503189697 win 34390 <nop,nop,timestamp 3302215 416836876>
20:13:33.202986 00:22:19:94:88:f8 > a4:ba:db:2d:3e:9a, ethertype IPv4 (0x0800), length 66: 10.49.2.49.irdmi > 10.49.5.155.62878: . ack 503192593 win 34209 <nop,nop,timestamp 3302215 416836876>
You can clearly see that the source MAC is now 00:22:19:94:88:f8, which is the MAC address bound to the Management Network (10.49.2.33), even though the source IP is still the vMotion address (10.49.2.49).
So why in the world don't the ACKs carry the correct MAC address? Since the switch never sees frames sourced from the vMotion MAC, its CAM table entry for that MAC ages out, and this is very likely the true source of our unicast flooding...
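For anyone who wants to check their own captures for the same pattern, here is a rough sketch that reads `tcpdump -e` style lines and reports any IP address that shows up behind more than one MAC. The regex and the function names (`macs_per_ip`, `mismatched_ips`) are my own and only assume the line layout shown in the capture above; other tcpdump versions may format lines differently.

```python
import re
from collections import defaultdict

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
IPV4 = r"\d+\.\d+\.\d+\.\d+"

# Link-level header as printed by `tcpdump -e`, then the IPv4 endpoints.
# The port may be numeric (8000) or a service name (irdmi).
LINE_RE = re.compile(
    rf"(?P<src_mac>{MAC}) > (?P<dst_mac>{MAC}),.*?"
    rf"(?P<src_ip>{IPV4})\.\w+ > (?P<dst_ip>{IPV4})\.\w+:"
)

def macs_per_ip(lines):
    """Map each IP to the set of MACs it was seen with (as sender or receiver)."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a line in the assumed format
        seen[m["src_ip"]].add(m["src_mac"])
        seen[m["dst_ip"]].add(m["dst_mac"])
    return seen

def mismatched_ips(lines):
    """Return only the IPs associated with more than one MAC address."""
    return {ip: sorted(macs) for ip, macs in macs_per_ip(lines).items() if len(macs) > 1}

# One data packet and one ACK taken from the capture in this post.
SAMPLE = [
    "20:07:33.351805 a4:ba:db:2d:3e:9a > 00:50:56:74:30:28, ethertype IPv4 (0x0800), "
    "length 1514: IP 10.49.5.155.59601 > 10.49.2.49.8000: . 45982380:45983828(1448) ack 1 win 4163",
    "20:13:33.202985 00:22:19:94:88:f8 > a4:ba:db:2d:3e:9a, ethertype IPv4 (0x0800), "
    "length 66: 10.49.2.49.irdmi > 10.49.5.155.62878: . ack 503189697 win 34390",
]

if __name__ == "__main__":
    for ip, macs in mismatched_ips(SAMPLE).items():
        print(ip, "seen with", ", ".join(macs))
```

On the two sample lines this flags only 10.49.2.49, which is received on the vMotion MAC but answered from the Management Network MAC.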
We are currently experiencing the same problem with vMotion on ESXi 4.1 U1. This problem did not exist on ESX 4.0 Update 2. (We are currently migrating from ESX to ESXi.)
I was kind of shocked to find that the vmotion port was the cause of the unicast flooding we experienced.
Did you find a "solution" to this, other than separating management and vMotion onto their own dedicated NICs?
Seems like more people are experiencing this with esxi 4.1: http://serverfault.com/questions/197918/clearing-arp-cache-on-esxi-4-1
A workaround that I found is setting "switchport block unicast" on our Cisco switches, but I can't consider this a good solution.
Please let us know the results of the case. I have our migration from ESX 3.5 to ESXi 4.1 u1 coming up soon and did not plan to use a different subnetwork for vmotion.
VMware support helped resolve the issue with zero downtime, and our config was brought in line with best practice recommendations.
What I thought would be a large network topology change involving downtime turned out to be a live virtual networking config change:
I recommended they create a KB for the solution as well.
As someone mentioned before, to solve this problem and avoid similar difficulties in the future you have to redesign your IP space: you have to separate the vMotion VMkernel IP from the Management IP range, so that they are on different subnets.
After that you can check connectivity between the vMotion VMkernel ports with the command
# vmkping x.x.x.x - where x.x.x.x is the VMkernel IP address of the other ESX host
To clarify: this is not a problem with ESX 4.0 or 4.1 as such. The issue can occur after you migrate from ESX Classic to ESXi. In Classic, the management interface is owned by the COS kernel, while the other vmknic ports (for example vMotion, iSCSI) are owned by the VMkernel. In that case the management traffic (vswif on ESX Classic) can be in the same IP subnet as any of the VMkernel NICs, since they belong to different kernels.
Once you have migrated to ESXi, the management network becomes just another vmknic, and it therefore collides with any existing vmknic in the same IP subnet.
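To spot this kind of collision before migrating, something like the following sketch could be run against a list of the planned VMkernel ports. It is only an illustration: the function name `overlapping_vmknics` is made up, and the /24 prefix on the example addresses is an assumption, since the original post only says both ports are on the same subnet.

```python
import ipaddress
from itertools import combinations

def overlapping_vmknics(vmknics):
    """Return pairs of port names whose IP subnets overlap.

    vmknics maps a port name to an "address/prefix" string.
    """
    nets = {
        name: ipaddress.ip_interface(addr).network
        for name, addr in vmknics.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]

if __name__ == "__main__":
    # Addresses from the first post; the /24 prefix length is assumed.
    host = {
        "Management Network": "10.49.2.33/24",
        "vMotion": "10.49.2.49/24",
    }
    for a, b in overlapping_vmknics(host):
        print(f"{a} and {b} share a subnet")
```

An empty result means no two ports on the host share a subnet, which is the layout recommended in this thread.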
Does anyone know if this affects ESXi 4.1 Update 2?
According to this
Right now my Management and vMotion are like this (2 hosts)
Management - 192.168.23.240
vMotion - 192.168.23.241
Management - 192.168.23.242
vMotion - 192.168.23.243
So should I create a vMotion port on 10.10.10.x, for instance?
vMotion host 1 : 10.10.10.5
vMotion host 2 : 10.10.10.6
and the same gateway, 192.168.23.1?
Or might something like this be enough?
vMotion host 1 : 192.168.30.5
vMotion host 2 : 192.168.30.6
Let me know guys
thanks a lot
If you are using a 24-bit subnet mask (255.255.255.0), you do not need to change the whole IP class. Simply use 192.168.22.241, for example.
So 192.168.22.x is enough... I will try that and let you know.
What I see in the blog link is that the guy is using 10.10.10.x, and the initial range was 17.x.x.x.
All that's needed is another subnet; it can be in the same class, whether A, B, or C.
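As a quick sanity check on the addresses discussed above, this small sketch tests which candidate vMotion IPs fall outside the management subnet for a given mask. The helper name `outside_management_subnet` is hypothetical; the addresses and the 255.255.255.0 mask are the ones mentioned in this thread.

```python
import ipaddress

def outside_management_subnet(mgmt_ip, netmask, candidates):
    """Return the candidate IPs that are NOT in the management subnet."""
    # strict=False lets us pass a host address and have it masked down
    # to its network (192.168.23.240 with 255.255.255.0 -> 192.168.23.0/24).
    mgmt_net = ipaddress.ip_network(f"{mgmt_ip}/{netmask}", strict=False)
    return [c for c in candidates if ipaddress.ip_address(c) not in mgmt_net]

if __name__ == "__main__":
    candidates = ["192.168.23.241", "10.10.10.5", "192.168.30.5", "192.168.22.241"]
    ok = outside_management_subnet("192.168.23.240", "255.255.255.0", candidates)
    print("Usable for vMotion:", ", ".join(ok))
```

With a /24 mask, every candidate except the current 192.168.23.241 lands on a different subnet, so 192.168.22.x works just as well as 10.10.10.x.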