VMware Cloud Community
zensan
Contributor

TCP segments truncated by 4 bytes in inter-VM communication on ESXi 5.1

I have 2 Linux virtual machines running on 2 different servers, both running ESXi 5.1 (build-838463).  A client-server application written with ZeroMQ over TCP runs between the two VMs.  From time to time the application hangs, and the TCP send queue (seen with the "ss" tool) on one of the VMs shows packets stuck in retransmit.

The physical NIC on both servers is an Intel 82574L.


Salient points:

1) The problem shows up more quickly on Ubuntu 14.04 (kernel 3.13) but also occurs, less frequently, on Ubuntu 12.04 (kernel 3.11 or 3.8).

2) It occurs irrespective of whether the VM NIC is e1000 or vmxnet3 (with VMware Tools installed).

3) In one instance the two ESXi servers were connected through a Linksys SRW2024 switch; in the other case it was a Netgear N3000.

4) It occurs only when MTU=9000 (it also failed at MTU=8000) but never when MTU=1500.
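(If anyone wants to reproduce this, a quick sanity check that jumbo frames actually pass end-to-end is a don't-fragment ping just under the MTU - the interface name and address below are placeholders for our setup:)

# 8972 = 9000-byte MTU minus 20 bytes of IP header and 8 bytes of ICMP header
ping -M do -s 8972 -c 3 <ip-of-other-VM>

# confirm the MTU configured on the guest interface
ip link show eth0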

The wireshark output clearly shows that the client sent a segment larger than the MTU (say 17071 bytes).  On the server two packets are received (8948 and 8119 bytes), but the 2nd packet arrives 4 bytes short.  The server sends back a SACK, after which the client keeps resending the offending packet, but the server always receives it 4 bytes short.  As a result, the client retransmits forever and never gets the packet through, and the TCP connection hangs.  This failure has happened many times with different TCP segment lengths (it's always exactly 4 bytes missing - no more, no less).  Other large packets do get through, so it doesn't happen every time.

I am attaching the CSV wireshark output for the client and server.  You can see the 8123-byte packet sent by the client was received as an 8119-byte packet by the server.

I suspected it was related to TSO (TCP segmentation offload) and tried to disable it as instructed in KB 2055140, but I can't seem to disable it on the physical NIC (running "ethtool" on the ESXi command line gave the error "function not implemented").  After disabling TSO in the VM and on the ESXi host, I still hit the problem.  Is it necessary to disable TSO on the NIC as well for TSO to be disabled completely?
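For reference, this is roughly what I ran (eth0 is a placeholder; the ESXi advanced option is the one the KB points at, so please double-check it against KB 2055140 for your build):

# inside the Linux guest: check and disable TSO
ethtool -k eth0 | grep tcp-segmentation-offload
ethtool -K eth0 tso off

# on the ESXi host: disable hardware TSO via the advanced settings, then verify
esxcli system settings advanced set -o /Net/UseHwTSO -i 0
esxcli system settings advanced list -o /Net/UseHwTSO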

Does anyone here have an idea how to further isolate this bug?  We are in the process of upgrading to ESXi 5.5, but any other suggestions are welcome.

17 Replies
grace27
Enthusiast

Hi

Welcome to the communities.

What are your switch port speeds, and could you describe your network layout?

zensan
Contributor

Hi Grace,

By default all ports on the switch are set to 1 Gbps.  The network configuration is not complicated: both ESXi hosts are connected directly to adjacent ports on the same switch.

Any idea what could be causing this?

TylerMay
Contributor

Hi zensan,

Wow - we are having the exact same issue. Did you ever find a resolution to your problem?

Regards,

Tyler

zensan
Contributor

No resolution.  The only workaround is to set MTU to 1500.

Could you post your configuration so we can identify the commonalities?  Have you tried changing the NIC?

bbernsee
Contributor

You might want to look at setting a maximum TCP segment size in the guest OS - TCP_MAXSEG - so that each segment fits within the jumbo frames you are able to send.  That way you aren't dealing with TSO splitting the segments into multiple packets, you still get the efficiency of jumbo frames, and you should be able to match up datagram to datagram.
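If you can't change the application itself to call setsockopt() with TCP_MAXSEG, a rough equivalent from the guest shell is to clamp the MSS that gets negotiated, either per route or with netfilter (the subnet, interface and sizes below are only illustrative):

# clamp the MSS advertised for a given route (value is TCP payload bytes)
ip route change <your-subnet>/24 dev eth0 advmss 8000

# or rewrite the MSS on outgoing SYNs
iptables -t mangle -A OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 8000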

The four-byte offset is an irregularity - it seems odd to me that it is the second packet of the segment that is affected, and it just so happens that the FCS is four bytes...

TylerMay
Contributor

Hi Zensan,

I'll try to summarize our configuration.

Machines that experience the 4-byte issue have the following:

ESXi 5.1.0 (build 1065491)

Red Hat Enterprise Linux 5u10 guest VM's - kernel 2.6.18-348.3.1.el5

vmxnet3 drivers (also tried e1000 without success) - version: 1.1.34.0-NAPI

GlassFish is the application server - v2.1.1 Patch17

Jumbo frames enabled on both source & destination - set to 8900

Virtual machine version 8

Our issue only seems to happen when the 2nd packet is originally 8189, 8190, 8191, or 8192 bytes - and of course the destination sees 4 bytes less.

We have another ESX cluster, which does *not* experience the issue - and here are the settings for that:

ESXi 5.1.0 (build 1065491)

Red Hat Enterprise Linux 5u10 guest VM's - kernel 2.6.18-371.6.1.el5

vmxnet3 drivers - version: 1.1.34.0-NAPI

GlassFish application server - v2.1.1 Patch17

Jumbo frames enabled on both source & destination - set to 8900

Virtual machine version 7

Our TSO settings on all the guests are the following:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp segmentation offload: on

udp fragmentation offload: off

generic segmentation offload: off

generic-receive-offload: off

Hopefully we can find something in common.

zensan
Contributor

So the bug doesn't occur with "virtual machine version 7"?

You didn't mention which physical NIC you have on the server.

And you might want to try bbernsee's suggestion.  Unfortunately, I don't have our setup handy to try that idea out.

TylerMay
Contributor

It's hard to say if the issue appeared after upgrading from 7 to 8 - I can't upgrade the guests in question because they are production machines. I may be able to schedule something in the future.

As far as the NIC goes - the guests in question are located on the same ESX server and on the same subnet, so I do not believe the traffic is ever put onto any physical NIC - it should always remain within the vSwitch? Regardless, the NICs are as follows. On the 'broken' cluster:

Intel 82599EB 10GbE

On the working cluster:

Mellanox MT26418 - ConnectX VPI - 10GbE

I'll see if I can try to modify the TCP_MAXSEG settings.

zensan
Contributor

In my case, the guest VMs were running on 2 different ESX servers and hence I presumed that the NIC could be at fault.  If you can reproduce the bug within a single ESX server, then the bug is most likely inside the VMware hypervisor.

Let me know what you see after that TCP_MAXSEG setting.

PaulCB
Contributor

I think I'm seeing something related/similar in 5.5

Two VMs on the same host communicating with each other. The IPs are in the same subnet. One VM hosts SOAP web services and the other is a client of those web services. The SOAP is passed over the wire in Fast Infoset (binary encoding). A few times a day we see failures on the client related to parsing the response from the server - typically EOF exceptions and the like. After taking some wireshark traces on the client, I can see that the issue always occurs when a TCP PDU comes through that is greater than the MSS. The MSS negotiated in the SYN/SYN-ACK is 1460, but I can see that the web service response is chunked, with the first PDU being, say, 2962 bytes. If I export the re-assembled payload from wireshark to a file and then try to decode the Fast Infoset back to XML, it fails due to the data being "corrupt". If I do the same for a response that is chunked in parts smaller than the MSS, the decode works perfectly.

zensan
Contributor

Do you know if you are seeing the 4-byte truncation as well?

Since you have wireshark output, you can select and export the relevant packets as CSV and attach them to your reply here.

That will help verify if it's the same problem.

Also, please post the output of "ss -imote", which usually shows the stuck connection with a non-zero value in the "Send-Q" column.
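For reference, something along these lines on both ends is what I used (the interface, peer address and port are placeholders):

# capture full frames on client and server for later comparison in wireshark
tcpdump -i eth0 -s 0 -w capture.pcap host <peer-ip> and port <app-port>

# per-socket state; the stuck connection shows a non-zero Send-Q
ss -imote state established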

PaulCB
Contributor

Hi,

I cannot see any stuck packets, so it may not be the exact same scenario. I am going to try to get a pcap of both the client and server side when the issue occurs so I can compare. Right now I only have a pcap of the client side.

TylerMay
Contributor

We appear to have resolved this issue by ensuring that all guest VMs were using the vmxnet3 driver. Originally the guest missing the 4 bytes was using an e1000 driver and the other guest was using vmxnet3 - after we changed the e1000 guest to vmxnet3, we are no longer experiencing this issue when using jumbo frames. I have not yet attempted to 'downgrade' the driver back to e1000 to see if the issue re-appears.
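(For anyone checking their own guests, ethtool will tell you which virtual NIC driver an interface is actually bound to - eth0 being a placeholder:)

# the "driver:" line in the output reads e1000 or vmxnet3
ethtool -i eth0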

zensan
Contributor

Magnificent!   I will try this out on our setup when I get time and post an update.

PieterjanMonten
Contributor

I had the same behaviour: Debian hosts using the e1000 driver on an MTU 9000 VLAN, dropping 4 bytes on every TCP packet whose size was in the vicinity of 4025 bytes:

Server side:

14:20:58.899828 IP 10.10.22.1.8000 > 10.10.22.21.34222: Flags [P.], seq 27011:31036, ack 184, win 8009, options [nop,nop,TS val 371053137 ecr 2338244542], length 4025

Client side:

14:20:59.196651 IP truncated-ip - 4 bytes missing! 10.10.22.1.8000 > 10.10.22.21.34222: Flags [P.], seq 27011:31036, ack 184, win 8009, options [nop,nop,TS val 371053243 ecr 2338244542], length 4025

I didn't want to switch to the vmxnet3 driver or disable jumbo frames right away, so I went with setting the MTU to 3000, purely for the sake of keeping jumbo frames. (I'd recommend simply going back to the standard MTU of 1500; there's not that much to gain from it anyway.)

As the fragmented packets now stay below the 4K limit, everything is smooth again - no more truncated packets!
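(For completeness, the MTU change itself is just the usual iproute2 command; on Debian you make it persistent by adding an "mtu 3000" line to the iface stanza in /etc/network/interfaces. eth0 is a placeholder:)

# takes effect immediately, lost on reboot
ip link set dev eth0 mtu 3000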

bspagna89
Hot Shot

Has anyone tried VMXNET2?

New blog - https://virtualizeme.org/
hpi2
Contributor

Similar problem here.

4 bytes are truncated when the VM runs in VMware Player, and the same happens with Workstation. The host machine is Windows. Everything works fine when running on ESXi 5.5. The virtual machine version is 8.

We must use the E1000 NIC as the guest OS does not support anything else. I am really looking for a way to get the VM working on a PC as well.
