We have several Windows Server 2008 x64 guest systems that occasionally lose connectivity to specific physical Windows Server 2008 machines on the same IP subnet. When this happens, one particular VM is unable to communicate with one particular physical machine. Both machines can still communicate with other physical and virtual servers on the same subnet and on other subnets, just not with each other. A ping between the two results in "destination host unreachable". The virtual and physical servers involved are not always the same each time it happens.
This behaviour was happening on ESXi 3.5. We were hoping an upgrade to vSphere 4.0 would fix it, but unfortunately it has not. vSphere is at 4.0 U1, and VMware Tools is up to date in the guests. I've checked for duplicate MACs and duplicate IPs, but can't find any.
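A minimal sketch of how a duplicate MAC/IP check like this could be scripted, assuming `arp -a` output has been saved from several Windows machines on the subnet. The names `parse_arp` and `find_conflicts` are hypothetical, and the parsing assumes the usual Windows column layout (IP address, dash-separated MAC, type). A MAC that maps to several IPs is not necessarily a fault (multi-homed hosts and routers do this legitimately), so the output is only a list of candidates to look at.

```python
import re
from collections import defaultdict

# Matches a Windows `arp -a` entry line: IPv4 address, dash-separated MAC, type.
ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+\w+\s*$"
)


def parse_arp(text):
    """Return a set of (ip, mac) pairs found in one `arp -a` dump."""
    pairs = set()
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # header lines and blanks
        ip, mac = m.group(1), m.group(2).lower()
        # Skip broadcast and multicast MACs (low bit of the first octet set).
        if int(mac[:2], 16) & 1:
            continue
        pairs.add((ip, mac))
    return pairs


def find_conflicts(snapshots):
    """Compare several `arp -a` dumps.

    Returns (dup_ips, dup_macs):
      dup_ips  maps an IP seen with more than one MAC to those MACs
      dup_macs maps a MAC seen with more than one IP to those IPs
    """
    ip_to_macs = defaultdict(set)
    mac_to_ips = defaultdict(set)
    for text in snapshots:
        for ip, mac in parse_arp(text):
            ip_to_macs[ip].add(mac)
            mac_to_ips[mac].add(ip)
    dup_ips = {ip: sorted(m) for ip, m in ip_to_macs.items() if len(m) > 1}
    dup_macs = {mac: sorted(i) for mac, i in mac_to_ips.items() if len(i) > 1}
    return dup_ips, dup_macs
```

Feeding it the dumps from the affected VM and the affected physical server, taken while the fault is present, would show whether the two machines disagree about which MAC owns a given IP.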
When the error occurs, unchecking the NIC in VMware Tools or in "Edit Settings", applying the change, waiting for Windows to lose sight of the network, and then turning the NIC back on fixes the issue.
Clearing ARP caches, flushing DNS, and so on does not fix the issue.
This is a 3-node vSphere 4 cluster. The behaviour happens on all physical nodes and virtual guests, roughly once every few days. It seems to happen most often after a vMotion, but can occur without one. The Virtual Machine network is on a NIC team, with no VLAN tagging. There are approximately 10 virtual machines, so nowhere near the 56-port limit on the vSwitch.
We have another 2-node ESX 3.5 cluster with a mix of Windows 2003 and 2008 servers that is not exhibiting this problem.
Has anyone seen this behaviour before, or does anyone have any ideas?