Hi all -
I have a couple of tickets open with VMware and with our SAN vendor, EqualLogic, on this issue. Since configuring our production and DMZ clusters, we have noticed that virtual machines sometimes drop network connectivity after a successful vMotion or Storage vMotion. Occasionally, though far less frequently, virtual machines also spontaneously lose network overnight; this has only happened a few times. The strange thing is that other guests on the same host are fine: they do not lose network at all. For example, I can migrate 3 virtual machines from one host to another, and 2 of the 3 may come across correctly while the third loses network.

The workaround? Simply disconnect the virtual NIC and reconnect it, and the VM starts returning packets again. The VM also regains network if I migrate it back to the previous host, reboot it, or completely reinstall the virtual adapter.
VMware support found a number of SAN errors in our log files, so we updated the SAN firmware to the latest version. That seems to have cleared the SAN errors, but the network issue remains. Here are some of the specs; all environments are virtually identical except for memory:
- Broadcom 5709 NICs
- EqualLogic SAN running 5.0.5 firmware
We are using jumbo frames, and ESXi is fully patched. I have not seen a pattern suggesting that only certain guest operating systems lose network, but we are primarily a Windows environment.
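Since jumbo frames are in use, one quick sanity check is whether a full-size frame actually passes end to end without fragmentation. The largest ICMP echo payload that fits in one unfragmented IPv4 packet is the MTU minus 20 bytes of IPv4 header (assuming no IP options) and 8 bytes of ICMP header. A minimal Python sketch of that arithmetic, assuming a 9000-byte jumbo MTU:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header, assuming no IP options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an IPv4 ICMP echo")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping payload of {max_icmp_payload(mtu)} bytes with DF set")
```

From a Windows guest, `ping -f -l 8972 <host>` sends that payload with the don't-fragment bit set. If it fails while `ping -f -l 1472 <host>` succeeds, something along that path is probably still at a 1500-byte MTU. This only checks MTU consistency; it is not a diagnosis of the vMotion issue itself.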
When a virtual machine loses network, we cannot:
- ping to it
- ping from it
- ping from it to virtual machines on the same host or vSwitch
- ping outside our network
- resolve DNS, etc.
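The checks above can be scripted so they run the same way on every affected VM right after a migration, which makes the failing cases easier to compare. The following is a minimal sketch. It assumes Python is available in the guest and uses the OS's built-in `ping` (Windows: `-n` count, `-w` timeout in milliseconds; Linux: `-c` count, `-W` timeout in seconds). The target labels, host names, and addresses are hypothetical placeholders, not from our environment:

```python
import socket
import subprocess
import sys

# Hypothetical targets: replace with real addresses from your environment.
PING_TARGETS = {
    "self (IP)": "10.0.0.50",
    "VM on same vSwitch": "10.0.0.51",
    "default gateway": "10.0.0.1",
    "outside host": "8.8.8.8",
}
# Hypothetical names to test DNS resolution against.
DNS_NAMES = ["dc01.example.local", "www.example.com"]


def ping_command(target: str, windows: bool) -> list[str]:
    """Build a single-echo ping command for Windows or Linux."""
    if windows:
        return ["ping", "-n", "1", "-w", "1000", target]  # -w is in milliseconds
    return ["ping", "-c", "1", "-W", "1", target]  # Linux -W is in seconds


def ping_ok(target: str) -> bool:
    """Ping once; treat exit status 0 as success."""
    windows = sys.platform.startswith("win")
    proc = subprocess.run(ping_command(target, windows), capture_output=True)
    return proc.returncode == 0


def resolves(name: str) -> bool:
    """Return True if the name resolves to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    for label, target in PING_TARGETS.items():
        print(f"ping {label:20} {'OK' if ping_ok(target) else 'FAIL'}")
    for name in DNS_NAMES:
        print(f"dns  {name:20} {'OK' if resolves(name) else 'FAIL'}")
```

Windows `ping` can exit with status 0 when it receives a "Destination host unreachable" reply, so a FAIL here is meaningful, but an OK is worth confirming against the printed ping output.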
I have followed several VMware KB articles without success, including:
- http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1002811 (Port Security is not enabled)
- VMware Tools has been updated to the latest version on all guests and matches the ESXi host.
- From the ESXi service console, I cannot ping the troubled VM by host name or by IP address, but I can ping other virtual machines that are not experiencing the issue. I can also ping external hosts from the service console.
- From within the troubled VM, I cannot resolve host names, and I cannot ping other VMs by IP address or host name, including VMs on the same virtual switch or network. The VM CAN ping itself by IP address, but not by host name. I also cannot ping the management network vSwitch.
- All vSwitches are configured identically and named the same on every host.
- Notify Switches is set to Yes.
- There are plenty of available virtual ports.
- We have tried both E1000 and VMXNET virtual adapters, with no difference.
- All adapters are set to auto-negotiate, but we have also tried forcing specific speeds, with no difference.
I appreciate any help. I am having trouble getting anywhere with the vendors on this issue.