At seemingly random times, random pairs of VMs stop being able to communicate with each other.
Example:
VM1 on host 1
VM2 on host 2
VM1 cannot ping VM2
VM1 sees an ARP entry in its ARP cache for VM2
VM2 does NOT see an ARP entry for VM1 in its cache
VM1 is able to ping other VMs on the same network
VM2 is able to ping other VMs on the same network
VM1 is able to ping VM2 if they are on the same host
I cleared the ARP cache on VM1 and the entry was removed successfully. I pinged VM2 again and got no response; however, the ARP entry for VM2 was back in the cache, as expected.
To confirm, it is not just ping: a range of ports that should be working is failing as well.
There are no firewalls in the path either.
It is also worth noting that this happens both when the VMs are on different blades within the same chassis and when they are on different chassis.
Versions
ESXi and vCenter are 6.5 Update 1
Compute hardware is Dell FX2 Chassis with FC430 blades and dual FN2210 IO Modules
This started after migrating the VMs from older ESXi 5.5 Update 3 rack-mount servers.
VM hardware versions and VMware Tools have not yet been upgraded (so that has not been tested).
Any advice is appreciated.
Any IP duplication?
None
I had the same problem on an HP DL360 G9. After I upgraded the firmware and drivers, the problem was resolved. I would recommend updating the firmware and VMware Tools first.
If you still have the same problem, please let me know.
The blades and chassis are running the latest versions. I have not checked the uplink switches, but since this also happens within the same chassis, I would not expect them to be part of the problem?
I've seen similar issues where port security was enabled on the physical switch ports, which limited the number of allowed MAC addresses per port.
André
In that case the VM should not be able to ping any of the VMs, but here the issue is only between two VMs while the other VMs are pingable. Correct me if I am wrong.
The problem is from one VM to another VM.
VM1 and VM2 can ping other VMs on the network
If I vMotion VM2 to the same host that VM1 is on, the problem goes away, but if I vMotion VM2 back to the host it was on before, the problem comes back.
It happens to some VMs at random, even VMs on different networks, so I cannot see anything they have in common.
I'll take a look at that one and get back to you. Cheers
That feature is disabled. Any other ideas?!
Try a packet capture to see exactly where the traffic gets dropped. That may give you a clue.
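Once you have captures from both sides, one quick check is whether VM2 ever sees VM1's ARP requests and whether replies make it back. Below is a minimal, stdlib-only Python sketch that lists the ARP frames in a capture. It assumes a classic microsecond libpcap file with Ethernet framing (pcapng and nanosecond pcap are not handled), and the function and type names are hypothetical, chosen for illustration.

```python
import ipaddress
import struct
from typing import Iterator, NamedTuple

ETHERTYPE_ARP = 0x0806
ETHERTYPE_VLAN = 0x8100
ARP_OPS = {1: "request", 2: "reply"}


class ArpEvent(NamedTuple):
    op: str
    sender_mac: str
    sender_ip: str
    target_ip: str


def iter_arp(data: bytes) -> Iterator[ArpEvent]:
    """Yield ARP frames from the raw bytes of a classic libpcap (Ethernet) file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig = struct.unpack_from(endian + "IIII", data, pos)
        frame = data[pos + 16 : pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < 14:
            continue
        offset = 12  # EtherType follows destination and source MACs
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype == ETHERTYPE_VLAN and len(frame) >= 18:
            offset += 4  # skip a single 802.1Q tag
            (ethertype,) = struct.unpack_from("!H", frame, offset)
        arp = frame[offset + 2 : offset + 30]
        if ethertype != ETHERTYPE_ARP or len(arp) < 28:
            continue
        (op,) = struct.unpack_from("!H", arp, 6)
        yield ArpEvent(
            ARP_OPS.get(op, str(op)),
            arp[8:14].hex(":"),                       # sender hardware address
            str(ipaddress.IPv4Address(arp[14:18])),   # sender protocol address
            str(ipaddress.IPv4Address(arp[24:28])),   # target protocol address
        )
```

For example, `for ev in iter_arp(open("vm2-side.pcap", "rb").read()): print(ev)` (file name hypothetical). If VM1's requests for VM2's IP appear in the capture taken on VM1's side but never in the one taken on VM2's side, the frames are being lost somewhere between the two hosts rather than inside either guest.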
Go into Edit Settings and look at the MAC addresses. My guess is that the MAC address is the same for all the VMs. You can shut down the machines and change the MAC addresses to static MAC addresses. You could also delete the NICs one at a time and then add new NICs. Remember to reconfigure the IP addresses inside the guest and delete the ghost NICs after doing that.
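With more than a handful of VMs, checking MACs by eye in Edit Settings gets tedious. A minimal Python sketch, assuming you have already exported (VM name, MAC) pairs from vCenter by whatever means you prefer; the function name and sample data are hypothetical:

```python
from collections import defaultdict


def find_duplicate_macs(nics: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return {mac: [vm names]} for every MAC address used by more than one NIC."""
    by_mac: dict[str, list[str]] = defaultdict(list)
    for vm, mac in nics:
        # Normalize case and separators so "00-50-56-AA-..." matches "00:50:56:aa:..."
        by_mac[mac.lower().replace("-", ":")].append(vm)
    return {mac: sorted(vms) for mac, vms in by_mac.items() if len(vms) > 1}


inventory = [
    ("VM1", "00:50:56:AA:BB:01"),
    ("VM2", "00-50-56-aa-bb-01"),
    ("VM3", "00:50:56:aa:bb:03"),
]
print(find_duplicate_macs(inventory))  # {'00:50:56:aa:bb:01': ['VM1', 'VM2']}
```

The input is a list of pairs rather than a dict keyed by VM, so a VM with several NICs can appear more than once.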
MACs are all different
Hi,
I've actually had this exact same issue running Dell FX2 chassis/blades. My config was slightly different: I had FC830s and pass-through modules instead of the FN switches. Anyway, the issue was none of that; it was actually the NIC inside the blade. Can you confirm which NICs your blades are using? We were using Intel X710 NICs and went through months' worth of support calls and random issues, one of which happens to be what you're describing here. I spent several weeks troubleshooting this, and it wasn't something I could easily reproduce.
Here's how I understood it. When you migrate VM1 to another host, the host notifies the network that VM1 has moved (via gARP, I believe?). Other servers learn that VM1 has moved, including the VMs on the source host it came from. The issue is that when VM2 on the source host tries to communicate with the recently migrated VM1, the host NICs are doing some magic behind the scenes, and that's where the problem is. VMware support explained that the traffic actually gets black-holed because the source host still thinks VM1 resides there. This can be very confusing and takes a lot of effort to prove with packet captures, etc. Let me know if you have trouble following what I wrote here.
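The explanation above hinges on the frame sent after a migration so that switches relearn which port VM1's MAC lives behind. The post guesses gARP (ESXi's "Notify Switches" behavior is commonly described as sending RARP instead), so treat this as illustrative only: a minimal Python sketch of what a gratuitous ARP announcement for VM1 looks like on the wire, using a hypothetical MAC and IP. It only builds the bytes and does not send anything.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"invalid MAC: {mac}")
    return bytes(int(p, 16) for p in parts)


def build_gratuitous_arp(vm_mac: str, vm_ip: str) -> bytes:
    """Build a gratuitous ARP request frame (Ethernet header + ARP, no FCS).

    Sender and target protocol addresses are the same IP, so no host is
    expected to reply; switches simply learn vm_mac on the ingress port.
    """
    mac = mac_to_bytes(vm_mac)
    ip = ipaddress.IPv4Address(vm_ip).packed
    # Ethernet header: broadcast destination, VM's MAC as source, EtherType ARP
    eth = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP fixed part: HTYPE=1 (Ethernet), PTYPE=0x0800 (IPv4), HLEN=6, PLEN=4, OPER=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip            # sender hardware / protocol address
    arp += b"\x00" * 6 + ip    # target hardware (unknown) / protocol address = own IP
    return eth + arp


frame = build_gratuitous_arp("00:50:56:aa:bb:cc", "192.0.2.10")
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```

Comparing a frame like this with what actually appears in captures on the source and destination hosts is one way to check whether the post-migration announcement reached every switch in the path.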
Unfortunately, the only viable solution for my client was to swap these out for different brand NICs.
Hope this helps.. Private message me if you need more info.
-Brian
I recommend trying the latest i40en 1.5.6 driver and 6.01 firmware.
Here are the links to the downloads.