VMware Cloud Community
ctfb
Enthusiast

Really odd VM network issue where only some destinations are unreachable from the Guest OS

I've been experiencing an issue that hits seemingly at random and provides little diagnostic data to troubleshoot with. The VM is still communicating on the network but can't reach specific destinations, whether by IP or by DNS name.  You may be able to ping every IP in existence except one, and that one is something a production app needs in order to work.  Sometimes more than one destination is unreachable while others still work.

For example, we had a VM running 2012 R2 that is used for up/down monitoring of other systems.  Suddenly it reported that multiple systems were down, and when RDP'd into the VM you couldn't ping the systems reported down, yet those servers were up and running fine.  Networking on the monitoring VM was also working in general, because I could RDP to it and ping other IPs just fine.
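To record exactly which destinations drop out the next time this hits, a small probe script could be left running on the affected VM. This is only a rough sketch under assumptions: the function names (`probe`, `split_by_reachability`) and any target list are made up for illustration, and it uses a TCP connect instead of an ICMP ping so it doesn't need admin rights. A refused connection is counted as reachable, since a reset coming back means packets made the round trip.

```python
import socket


def probe(host, port, timeout=2.0):
    """Return True if anything answers from host:port, even a refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # A reset came back, so traffic reached the host and returned.
        return True
    except OSError:
        # Timeouts, "no route to host", and DNS failures all end up here.
        return False


def split_by_reachability(targets, timeout=2.0, probe_fn=probe):
    """Split (host, port) pairs into reachable and unreachable lists."""
    reachable, unreachable = [], []
    for host, port in targets:
        if probe_fn(host, port, timeout):
            reachable.append((host, port))
        else:
            unreachable.append((host, port))
    return reachable, unreachable
```

Calling `split_by_reachability` on a list of the monitored servers every minute or so, and logging the unreachable list with a timestamp, would at least give a start time and the exact set of affected destinations to line up against host logs.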

The "fix" has been to vMotion the VM to another host in the cluster and suddenly all the unreachable IPs are immediately reachable. If the VM is moved back to the originating host, it continues to work.  It's as if there was some glitch on the port group port and refreshing that fixes it for an undetermined amount of time.

Running Wireshark shows packets leaving the guest OS but no response from the trouble destinations, and when there is a firewall between the source and destination we never see the packets reach it.  The traffic appears never to leave the ESXi host. Other VMs on the host are unaffected, as are other VMs in the same port group.
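For anyone who wants to check a capture the same way, here is a minimal sketch that lists destinations the guest sent to but never heard back from. It assumes the capture was exported from Wireshark as CSV with the default "Source" and "Destination" columns, and that both columns hold IP addresses; the function name `silent_destinations` is hypothetical. It only looks at packet direction, so ARP failures or one-way protocols would need separate handling.

```python
import csv
import io
from collections import Counter


def silent_destinations(csv_text, local_ip):
    """Map each destination that never replied to the number of packets sent to it."""
    sent = Counter()   # packets sent from the guest, per destination
    answered = set()   # hosts that sent at least one packet back to the guest
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dst = row["Source"], row["Destination"]
        if src == local_ip:
            sent[dst] += 1
        elif dst == local_ip:
            answered.add(src)
    return {ip: count for ip, count in sent.items() if ip not in answered}
```

Running the same check on a capture taken at the firewall or on the physical uplink, if one is available, would show whether the same destinations are missing there too.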

The only thing I think narrows this down is that the affected VMs typically run older OSes such as Windows 2012/R2, 2008 R2, or RHEL 6.x.

Environment details
vCenter 7 Update 1 (also happened on vCenter 6.7 Update 3)
ESXi 6.7 Update 2
VMs have current VMware Tools
NICs are E1000 or VMXNET3
dvSwitch at 6.5 level

I had a ticket open with VMware before, but we didn't find a cause or even catch the issue in a log bundle. Looking to see if anyone else has experienced this and found a resolution.

We did get one recent recommendation from support to check the hardware compatibility version, since the VMs are older and are typically at hardware version 10 or 11. I don't think this is a fix, though: I raised the hardware version on one VM and rebooted it, and the issue returned. I then vMotioned it and VM communication went back to normal.
