VMware Cloud Community
jhanekom
Virtuoso

Slow UDP performance on Windows guest when using vmxnet vs. e1000

This is more a posting of some tests I've done than a question, posted in the hope that it helps someone else. Any comments or explanations would be most welcome.

Please note that the performance tests in this post deal exclusively with the adapters inside the virtual machines. Where tests involved VMs on different ESX servers, they were run over properly-configured dual Gigabit 802.3ad/Etherchannel links, with the vSwitches properly set up for ip-hash load balancing. There were no speed or duplex mismatch issues, so please don't ask me to go check that. The physical hardware is HP BL25p servers with Broadcom NICs using the tg3 driver. The VMware ESX version used was 3.0.1.

A client of mine called me to help troubleshoot performance issues on a Symantec Ghostcast server they had virtualised. The client is using the Unicast option, which uses directed UDP datagrams. I was not in a position to test multicast.

When building PCs from the original physical server (W2K, PIII, 100Mbps), they got 120MB+/min throughput. From the new virtual machine (W2K3 on ESX 3 on dual Opterons with Gigabit NICs), they were getting around 45MB/min. CPU utilisation (this is a single-CPU VM) was low, and there didn't appear to be any contention anywhere. Other file operations were reasonably fast; only Ghost was slow.

From the little information available in the Ghost log files, it was clear that the software was running into what it calls "congestion" on the wire. It was not clear why, but it was quite clear that this problem was going to involve UDP transfers.

I started experimenting, changing the network adapter from vmxnet to e1000 on a hunch by manually editing the VM's .vmx file. Throughput immediately doubled to around 96MB/min.
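For reference, that hunch boils down to a one-line change in the VM's .vmx file ("ethernet0" here assumes the VM's first, or only, virtual NIC):

```
ethernet0.virtualDev = "e1000"
```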

To get some more scientific figures, I ran Netperf (http://www.netperf.org/netperf/NetperfPage.html) on three Windows VMs using the default TCP_STREAM and UDP_STREAM profiles. I ran the tests multiple times and recorded the results (see further down in this post). My findings were as follows:
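For anyone without Netperf to hand, the shape of a UDP_STREAM-style send test can be sketched in a few lines of Python. This is a minimal loopback illustration under stated assumptions (arbitrary packet count, send-side rate only), not a substitute for Netperf:

```python
import socket
import time

def udp_send_throughput(payload_len: int, n_packets: int = 1000) -> float:
    """Blast n_packets UDP datagrams of payload_len bytes over loopback
    and return the send-side rate in Mbit/s."""
    # A bound receiver socket so the datagrams have a destination;
    # unread datagrams are simply dropped, which is fine for a send test.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))          # pick any free port
    dest = rx.getsockname()
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_len
    start = time.perf_counter()
    for _ in range(n_packets):
        tx.sendto(payload, dest)
    elapsed = time.perf_counter() - start
    tx.close()
    rx.close()
    return payload_len * n_packets * 8 / elapsed / 1e6

if __name__ == "__main__":
    print(f"282-byte payloads: {udp_send_throughput(282):.0f} Mbit/s")
    print(f"283-byte payloads: {udp_send_throughput(283):.0f} Mbit/s")
```

Loopback numbers won't show the vSwitch/pNIC path that's at issue here, of course; run the sender and receiver in different VMs to reproduce the real scenario.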

\* TCP performance suffered to some extent if going from one port group to another on the same vSwitch or when traversing a pSwitch. The former wasn't a complete surprise; the latter was expected.

\* The type of network adapter did not affect TCP performance much, but the e1000 was slightly slower than the vmxnet.

\* UDP performance that traversed port groups or pswitches suffered only when sending from a vmxnet adapter, and severely so (performance dropped by 97%)

\* TCP performance between two different servers was in some cases higher than TCP performance to the same ESX server, presumably due to contention somewhere

\* The e1000 driver appears to have slightly higher overheads than the vmxnet driver (it's slightly slower), but the vmxnet driver is crippled when it comes to UDP performance.

Conclusion: There appears to be some severe issue with the interface between the physical adapter and the vSwitch for UDP packets generated by the vmxnet adapter.

I've worked around the problem by using the e1000 adapter, but an explanation of why vmxnet performs so poorly would be welcome.

Detailed performance results:

Same ESX server, same port group:

UDP vmxnet-vmxnet: 600Mbps

UDP vmxnet- e1000: 600Mbps

UDP e1000 -vmxnet: 485Mbps

UDP e1000 - e1000: 475Mbps

TCP vmxnet-vmxnet: 750Mbps

TCP vmxnet- e1000: 570Mbps

TCP e1000 -vmxnet: 645Mbps

TCP e1000 - e1000: 530Mbps

Same ESX server, different port groups/VLANs on same vSwitch:

UDP vmxnet-vmxnet: 20Mbps

UDP vmxnet- e1000: 20Mbps

UDP e1000 -vmxnet: 520Mbps

UDP e1000 - e1000: 515Mbps

TCP vmxnet-vmxnet: 250Mbps

TCP vmxnet- e1000: 300Mbps

TCP e1000 -vmxnet: 300Mbps

TCP e1000 - e1000: 260Mbps

Different ESX servers, same VLAN:

UDP vmxnet-vmxnet: 20Mbps

UDP vmxnet- e1000: 20Mbps

UDP e1000 -vmxnet: 535Mbps

UDP e1000 - e1000: 540Mbps

TCP vmxnet-vmxnet: 425Mbps

TCP vmxnet- e1000: 315Mbps

TCP e1000 -vmxnet: 360Mbps

TCP e1000 - e1000: 260Mbps

28 Replies
jhanekom
Virtuoso

I've done some more tests and have figured out something interesting. The vmxnet driver is only affected for UDP packets above a specific size.

If the data portion of the UDP packet is 282 bytes (total packet size: 324 bytes), performance is great. As soon as the data portion is 283 bytes or larger (total packet size: 325 bytes or larger), performance drops almost tenfold, to around 20Mbps.

Steps to reproduce this:

\* Run "iperf" as a server on any machine (virtual or physical) with the following command line: "iperf -s -u -i 1 -l 282"

\* Run "iperf" as a client on a virtual machine with the vmxnet driver with the following command line: "iperf -c <server-address> -u -b 1g -l 282"

\* Repeat the steps above with a packet length of 283

\* Rinse & repeat after changing the client VM's network adapter type to e1000
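The size arithmetic behind that threshold is worth spelling out. With standard Ethernet, IPv4 and UDP headers (14 + 20 + 8 bytes), a 282-byte payload gives a 324-byte frame, and 283 bytes tips it to 325. A sketch of the arithmetic only; the 14-byte figure assumes an untagged Ethernet II frame:

```python
# Header sizes for an untagged Ethernet II frame carrying IPv4/UDP
ETH_HDR, IP_HDR, UDP_HDR = 14, 20, 8

def frame_size(udp_payload: int) -> int:
    """On-wire frame size (excluding preamble/FCS) for a given UDP payload."""
    return udp_payload + UDP_HDR + IP_HDR + ETH_HDR

print(frame_size(282))  # 324 -- the last "fast" payload size
print(frame_size(283))  # 325 -- the first "slow" payload size
```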

As a follow-on to the original issue of Symantec Ghost being slow, I had an opportunity today to test with a real, physical laptop that previous benchmarks were run on. The results are astounding...

\* Original PIII server with dual 100Mbps NICs: 160MB/min

\* VMware ESX VM with vmxnet driver: 45MB/min

\* VMware ESX VM with e1000 driver: 417MB/min!

I hope to be able to sit down tomorrow and open an SR for this.

jhanekom
Virtuoso

Just received a response back from VMware via HP. Apparently there is a known issue in the vmxnet virtual adapter that causes slow UDP transmit throughput, due to a problem with offloading some operations to hardware when packets reach a certain size.

The workaround at present is to set the following parameter (in the VMX file) for each individual virtual machine that is being affected:

ethernet0.features=0

Expect higher CPU utilisation, but network throughput an order of magnitude faster!

The problem will likely be resolved in a future release.

leonnap
Contributor

This is the solution!!!

great work!

you solution worked also on our side!

regards,

Leon Nap

jhanekom
Virtuoso

Glad it helped :)

Remember to try again without this setting after the next few patches of ESX to see if the problem has been fixed.

I point this out because, at least in theory, changing this parameter will result in higher CPU utilisation.

LasseHT
Enthusiast

I have this problem, but how do I change to the e1000 driver??

wtreutz
Enthusiast

You must manually edit the .vmx config file:

ethernet0.virtualDev = "e1000"

To my knowledge, the e1000 adapter is not officially supported for 32-bit guest OSes. When you create, for example, a Win2003 x64 VM, you will see the VM is built with the e1000 LAN card.

jhanekom
Virtuoso

With the workaround supplied by VMware, you don't need to. Simply add the following to the virtual machine's VMX file while it is still configured with the vmxnet driver (be sure to shut the virtual machine down before editing the file):

ethernet0.features=0

LasseHT
Enthusiast

many thanks :)

After the edit and a reboot, the W2K3 VM found new hardware, and after configuring the IP etc., it has now completed a full Ghost where before it would stop with a silly error message about the image being corrupted.

I put both lines in the vmx file. Now it works! So if it ain't broke, don't fix it...

jftwp
Enthusiast

Oh oh -

After a couple of weeks of happy Ghosting with the ethernet0.features=0 addition that had fixed the slow Ghostcast UDP performance, the issue is back.

So, in classic troubleshooting stance, we ask the question: WHAT CHANGED?

Answer: I applied the latest critical updates for ESX 3.0.1 hosts, namely 2066306, 1006511, 1410076, & 2158032.

I suspect 1006511 might be to blame here, since it updates the e1000 driver specifically and nothing else has changed. Literally.

The host has dual Broadcom ports (which don't use the e1000 driver), so, before re-reading this thread, I went ahead and created a vSwitch/port group based on one of the two Broadcoms. This NIC is now dedicated to the VM running Ghost. Waiting to hear back from our desktop folks since the reconfig/restart... will advise.

If THAT works, I will change back to the vSwitch it WAS using fine before I applied the critical patches last night. I will then REMOVE the ethernet0.features=0 line from the vmx file, reboot, and have them try again.

Hopefully the new driver resolves the e1000/vmxnet UDP issue but REQUIRES that that line no longer be in the vmx file... Will advise.

epping
Expert

please keep us updated, interesting stuff

adorsett
Enthusiast

I just want to respond and say that I'm having the exact same issue with ESX 3.0.1 and my Ghost server. I haven't patched my machine, so ethernet0.features=0 worked on my VM.

Now since so many of us are having the problem, will they patch this now!?

jhanekom
Virtuoso

I think this is a pretty minor issue, but would have appreciated a KB article acknowledging the bug at the time I did my initial troubleshooting.

I don't know whether problems reported in this forum are included in the in-house statistics that inform decisions on which problems to address first.

In other words: if you want to help get VMware to address this, open a service request. Posting here may help that cause, but possibly won't. They're not called "community" forums for nothing!

angsana
Enthusiast

James,

We don't expect any of the Dec 2006 patches to change virtual NIC (either vmxnet or e1000) behavior. Can you please let us know whether you are still having the slow UDP problem?

Thanks.

\- Boon

Rumple
Virtuoso

I've been trying to use virtualDev = "e1000" to test an Altiris speed issue we are having, but every time I power on the XP VM it switches back to vlance:

ethernet0.virtualDev = "vlance"

jhanekom
Virtuoso

I found it had a tendency to do that too, but the problem sorted itself out after several tries. Maybe try de-registering the VM, editing the file and then re-registering it?

Note also that the "ethernet0.features=0" option is a much better way of addressing the problem.

I have also been told by my HP rep last week that the UDP performance problem will be resolved in ESX 3.0.2, but have not been given a timeline for its release.

grasshopper
Virtuoso

excellent post.

LasseHT
Enthusiast

I now have a second server (a copy of the first, on a DR SAN) that I'm trying to get to work, but it's not happy about the e1000 entry and keeps defaulting back (I'm on 3+ retries). Maybe it's something about creating a new server pointing to existing disks rather than amending a vmx file in the same folder, but I'm clutching at straws here.

oreeh
Immortal

When modifying a VMX file manually, unregister the VM first, then re-register it.

LasseHT
Enthusiast

As in remove from inventory?

How would I add it back in using VC2? It seems to only want to create a new VM and let me point to the old disk files?
