VMware Communities
felixrr
Contributor

Network stack instability diagnosis

Hi everyone,

Please can someone help me diagnose a network stack problem?  I have tried everything I can think of, but there is still an underlying fault and I don't know where to look next.  This is the story so far:

It's a VMware Workstation install, version 11.1.2 build-2780323

The host is CentOS 7 (with XFCE, rather than Gnome3 for its window manager - though I don't think this will make any odds)

Kernel of the host is 3.10.0-229.4.2.el7.x86_64

Basic networking is up - everything has IPv4 addresses, everything can ping everything else.

Most of the guests are also CentOS7, though I also have a Kali guest and a Win7 guest.  All are affected, though it is fair to say the Win7 guest seems the least affected.

The CentOS guests are all running 3.10.0-229.4.2.el7.x86_64 kernel.

The guests generally have two interfaces, a NAT interface and a Host-only interface.

The vmx files for the guests state that the NIC emulation is "e1000" for all cards.  I have also tried vmxnet3 here and it seemed to make little or no difference.

VMware Tools is installed on all guests, after removing the open-vm-tools package that comes with CentOS

The VMware tools version is 9.9.3.47419 (build-2759765)

The symptoms are very slow and unstable network connectivity.  The guests don't report this as such, i.e. the OS doesn't think the NICs are being disconnected.  The network is instead just very slow and occasionally hangs.

Looking at dmesg on the host I can see lots of lines like the following, which could be pointers:

UDP: bad checksum. From 100.64.1.132:137 to 100.64.1.255:137 ulen 58

Where 100.64.1.x/24 is my NAT network (yes, the subnet is a bit unusual: 100.64.0.0/10 is actually the RFC 6598 shared address space rather than an RFC 1918 range, but like 192.168.x.x it is not routed on the public Internet)
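To see which senders account for these messages, the lines can be tallied by source address.  This is only a sketch: it assumes the log format matches the line quoted above, and the function name bad_checksum_sources is made up for illustration.

```shell
# Tally "UDP: bad checksum" kernel log lines by source IP.
# Reads log text on stdin and prints "<count> <source-ip>", busiest first.
bad_checksum_sources() {
    grep 'UDP: bad checksum' |
        sed -E 's/.*From ([0-9.]+):[0-9]+ to .*/\1/' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Typical use would be "dmesg | bad_checksum_sources" on the host.  If every source turns out to be a guest on the NAT network, checksum offloading in the guests is one possibility worth checking.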

I also have Wireshark running on the host, and it reports lots of TCP retransmissions and duplicate ACKs.
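The retransmissions can be cross-checked against the kernel's own counters, on the host or inside a Linux guest.  A minimal sketch, assuming /proc/net/snmp is present in its usual two-line layout (a line of field names followed by a line of values for each protocol).  The helper name snmp_counter is hypothetical, and the Udp InCsumErrors field may be absent on older kernels.

```shell
# Print one counter from /proc/net/snmp-style text.
# Usage: snmp_counter PROTO FIELD [FILE]
snmp_counter() {
    awk -v proto="$1:" -v field="$2" '
        # First line for the protocol lists field names: remember the column.
        $1 == proto && !seen {
            for (i = 2; i <= NF; i++) if ($i == field) col = i
            seen = 1
            next
        }
        # Second line holds the values: print the remembered column.
        $1 == proto && col { print $col; exit }
    ' "${3:-/proc/net/snmp}"
}
```

For example, "snmp_counter Tcp RetransSegs" and "snmp_counter Tcp OutSegs" sampled before and after a slow transfer give a rough retransmission ratio, and "snmp_counter Udp InCsumErrors" shows whether the checksum failures are still accumulating.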

If someone happens to have seen this before, it would be awesome if you could share the fix, but really I am after any good ideas for further diagnosis, as I have all but exhausted everything I can think of.

Also worth noting - I am currently trialling VMware Workstation, so I have no official support, but if I can get this working it will be a purchase!

Thank you in advance!

5 Replies
felixrr
Contributor

Minor update.  Someone in this thread reports something similar-sounding:

https://communities.vmware.com/thread/514073

Based on that, I decided to downgrade to Workstation 10.  Unfortunately this does not seem to have had an impact.

felixrr
Contributor

Overnight I realised that my test of Workstation 10 had been completed without trying the older version of VMware Tools.  Just a minor update really: I have now done this and it made no difference to networking speed.

If anyone has any good ideas I would be massively appreciative!

felixrr
Contributor

I've also tried disabling LRO both on the guests and the host.  Made little, if any, difference.
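To confirm that a change such as disabling LRO actually took effect, the feature list from "ethtool -k" can be filtered for anything still switched on.  This sketch assumes the usual "feature-name: on|off" output of ethtool -k; offloads_on is a made-up helper name and eth0 is a placeholder interface.

```shell
# List offload features that are still enabled.
# Reads "ethtool -k IFACE" output on stdin, prints one feature name per line.
offloads_on() {
    awk -F': ' 'NR > 1 && $2 ~ /^on/ {
        sub(/^[ \t]+/, "", $1)   # sub-features are indented with a tab
        print $1
    }'
}
```

Usage would be "ethtool -k eth0 | offloads_on", run on both the host and a guest.  If large-receive-offload still appears in the output, the earlier change did not stick.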

I've also tried simplifying the network - now I just have a single NAT adapter and no Host-only adapter.  No difference.

I've also tried e1000 and vmxnet3 network drivers under VMWare Workstation 10 to see if that made a difference, but sadly not...

jmhayes
Enthusiast

I have the same setup and problem.  I moved this VM from a working Workstation 9 setup to a new Workstation 11 install.  I tried changing e1000 to vmxnet3 and saw no difference.  The easiest way to see my symptom is to slogin to the guest and run something that prints a lot of data, like dmesg: sometimes it hangs halfway through, sometimes it doesn't start at all.  Logging in through the VNC console shows the guest to be unremarkable: no errors, no performance problems, etc.  Just the network.

johnk_dev_null
Contributor

Somebody with the same symptoms tried the following on the guests to good effect.  It turns off all 'hardware' checksum offloading, plus generic segmentation offload.

# ethtool --offload eth0 rx off tx off

# ethtool -K eth0 gso off

The person that had this issue had an unsupported NIC type.
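To apply the same settings to several interfaces at once, something like the following could be used.  It is a sketch, not a tested fix: disable_offloads and the DRY_RUN variable are invented here, the interface names are placeholders, and settings made with ethtool -K do not persist across a reboot.

```shell
# Turn off rx/tx checksum offload and GSO on each named interface.
# With DRY_RUN set to a non-empty value, only print the commands.
disable_offloads() {
    for ifc in "$@"; do
        ${DRY_RUN:+echo} ethtool -K "$ifc" rx off tx off gso off
    done
}
```

Running "DRY_RUN=1 disable_offloads eth0 eth1" shows what would be executed; dropping DRY_RUN (and running as root) applies it.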

However, in other search results these symptoms are put down to a DDoS attack.  Apparently in that case you can block it with a firewall -

"blocking from the source port 19 and type UDP to the destination port 1024-65535 and type UDP."
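Expressed as an iptables rule, that advice might look like the sketch below.  It assumes iptables is the firewall in use and that the rule belongs in the INPUT chain; block_chargen_replies and DRY_RUN are hypothetical names.  UDP port 19 is the chargen service.

```shell
# Drop UDP packets arriving from source port 19 (chargen)
# that are aimed at destination ports 1024-65535.
# With DRY_RUN set to a non-empty value, only print the command.
block_chargen_replies() {
    ${DRY_RUN:+echo} iptables -A INPUT -p udp --sport 19 --dport 1024:65535 -j DROP
}
```

"DRY_RUN=1 block_chargen_replies" prints the rule for review before it is applied as root.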

Of course it may be neither.
