oreeh
Immortal

Severe network performance issues

This thread is a follow-up to the following threads since these seem to be related:

http://www.vmware.com/community/thread.jspa?threadID=74329

http://www.vmware.com/community/thread.jspa?threadID=75807

http://www.vmware.com/community/thread.jspa?threadID=77075

A description of the issues and the "results" we have so far:

juchestyle and sbeaver saw a significant degradation of network throughput on virtual switches running at 100 full (100 Mb, full duplex).

The transfer rate never stabilizes, and there are significant peaks and valleys when a 650 MB ISO file is transferred from a physical server to a VM.

Inspired by this, I did some quick testing and got some strange results:

The transfer direction had a significant impact on the transfer speed: pushing files from VMs to physical servers was always around 30% faster than pulling files from the servers.

The assumption that this is related to the behaviour of Windows servers turned out to be wrong, since it happened regardless of the OS and protocol used.
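
For anyone who wants to reproduce the direction test, here is a minimal Python sketch that times an FTP push (STOR) against a pull (RETR) of the same data. The host, credentials and file name are placeholders for illustration, not values from the tests above.

    # ftp_direction_test.py - time an FTP push vs. pull of the same payload.
    # HOST, USER, PASS and the remote file name are placeholders - adjust them.
    import io
    import time
    from ftplib import FTP

    HOST = "192.168.0.10"          # FTP server (placeholder)
    USER, PASS = "test", "test"    # credentials (placeholder)
    SIZE = 64 * 1024 * 1024        # 64 MB test payload

    payload = io.BytesIO(b"\0" * SIZE)

    ftp = FTP(HOST)
    ftp.login(USER, PASS)

    # Push: this machine -> server
    payload.seek(0)
    t0 = time.time()
    ftp.storbinary("STOR direction_test.bin", payload)
    push = SIZE / (time.time() - t0) / 1024.0 / 1024.0

    # Pull: server -> this machine (discard the data, we only want the rate)
    t0 = time.time()
    ftp.retrbinary("RETR direction_test.bin", lambda chunk: None)
    pull = SIZE / (time.time() - t0) / 1024.0 / 1024.0

    ftp.quit()
    print("push: %.2f MB/s  pull: %.2f MB/s" % (push, pull))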

Another interesting result from these tests: e1000 NICs always seem to be 10-20% faster than vmxnet, and there is a big difference in PKTTX/s between vmxnet and e1000.

After that, acr discovered really bad transfer speeds in a Gigabit VM environment.

The maximum speed was 7-9 MB/s, even when using ESX-internal vSwitches.

A copy from ESX to ESX also only reached 7-9 MB/s.

The weird discovery in this scenario: when the CD-ROM drives in the VMs are disabled, the transfer speed goes up to 20 MB/s.
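
For anyone who wants to try this: disconnecting the drive at power-on is a one-line change in the VM's .vmx file. A sketch, assuming the CD-ROM sits at the usual ide1:0 slot (check your own .vmx for the actual device name):

    # .vmx snippet - ide1:0 is an assumption, your CD-ROM may sit elsewhere
    ide1:0.present = "TRUE"
    ide1:0.startConnected = "FALSE"   # don't connect the drive at power-on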

Any ideas regarding this?

I'll mark my question as answered and ask Daryll to lock the other threads so we have everything in one place.

juchestyle
Commander

Just a quick response to get on this thread and update:

It seems that the physical switch is probably the culprit here. We opened an SR on this and were told that ESX does not do a good job of fluctuating network transfer speeds the way physical machines do. If you have a physical switch in the middle (and who doesn't?), you want to set it to 1000 full, not auto.

Apparently, if you have a physical switch that is set to auto, ESX will talk to it and decide to transfer files at the lowest common denominator to avoid having to scale the transfer rate up and down.
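
On the ESX side, the counterpart is to pin the physical uplink instead of leaving it on auto-negotiate. A sketch from the ESX 3 service console, assuming vmnic0 is the uplink in question:

    # pin the uplink to 1000/full instead of auto-negotiation
    esxcfg-nics -s 1000 -d full vmnic0
    # list the NICs to verify the new setting
    esxcfg-nics -l

Both ends of the link (switch port and NIC) should be set the same way, otherwise a duplex mismatch makes things worse.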

I am not particularly happy with this answer, but it seems to be the best one we have gotten so far.

Respectfully,

Kaizen!
acr
Champion

OK, so I've tested with e1000. Unfortunately, no difference.

acr
Champion

Does fixing your speeds actually fix your problem?

I get horrendous speeds with two VMs connected to an internal-only vSwitch: 7 MB/s to 9 MB/s!

oreeh
Immortal

Some numbers regarding this.

Setup:

ESX: HP DL380 G3, 8 GB RAM, dual Xeon 3.06 GHz, local VMFS on RAID5, Intel dual-port (PILA 8472) adapter (only 1 port used)

File/NFS/FTP server: P4D 3 GHz, 1 GB RAM, SATA RAID

Switch: HP 2524

All network cards and switch ports set to 100 full.

1. Copy (SMB) from server to virtual XP (vmxnet, tools installed):
3020 PKTTX/s, 1.58 MbTX/s

2. FTP from server to virtual BSD VM (vmxnet, tools installed):
2312.66 PKTTX/s, 1.17 MbTX/s, at 5.46 MB/s

3. FTP from server to virtual BSD VM (e1000, no tools installed):
1549.45 PKTTX/s, 0.78 MbTX/s, at 7.16 MB/s

4. FTP from server to virtual BSD VM (e1000, tools installed):
1549.45 PKTTX/s, 0.78 MbTX/s, at 7.16 MB/s

5. FTP from BSD VM to BSD VM (internal vSwitch, both e1000, tools installed):
6027.22 PKTTX/s, 68.02 MbTX/s, at 10.66 MB/s

6. FTP from BSD VM to BSD VM (internal vSwitch, both vmxnet, tools installed):
6334.75 PKTTX/s, 71.51 MbTX/s, at 7.96 MB/s

Only the VMs under test were running on the ESX host.

The same tests over Gigabit (HP 5308XL switch, onboard Broadcom NIC in the ESX host) gave the same results.
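
The PKTTX/s and MbTX/s figures above are what esxtop reports. For longer runs it is easier to capture esxtop in batch mode (esxtop -b) and average afterwards; below is a minimal Python sketch for that. Note that the column matching is an assumption on my part: interactively the counter shows up as PKTTX/s, but the batch CSV header may spell it out differently (e.g. a "Packets Transmitted/sec" column), so check the header row of your capture and adjust the match string.

    # avg_net_tx.py - average a transmit counter from an esxtop batch capture,
    # created e.g. with: esxtop -b -d 2 -n 30 > capture.csv
    # MATCH is an assumption - check your capture's header row for the real name.
    import csv
    import sys

    MATCH = "PKTTX/s"

    with open(sys.argv[1]) as f:
        rows = list(csv.reader(f))

    header, samples = rows[0], rows[1:]
    cols = [i for i, name in enumerate(header) if MATCH in name]

    for i in cols:
        values = [float(r[i]) for r in samples if len(r) > i and r[i]]
        if values:
            print("%s: %.2f avg" % (header[i], sum(values) / len(values)))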

acr
Champion

OK, my tests are very similar.

I too have an HP environment. I also have IBM blades, so I'll run some tests on those, just to eliminate or compare.

oreeh
Immortal

Next I'll set up (I'm nearly done) two physical BSD systems configured exactly like the VMs (even with e1000 cards) and post the results.

My assumption: far better network throughput and no difference depending on transfer direction.

acr
Champion

I did this using the same switches as the blades, i.e. plugged the physical boxes directly into the blade switch, then retested with the physical boxes on the same LAN.

We got 60-70 MB/s each time, whether FTP or file copy.

From within the VMs we always see very high kernel activity in Task Manager. This may or may not be relevant, but at the end of the day the copy takes far too long.

JonT
Enthusiast

This thread also seems similar, but they are exploring it from a different angle, I think.

http://www.vmware.com/community/thread.jspa?messageID=540298&#540298

acr
Champion

Excellent link JonT, I had discovered it during my intensive testing of the issues I'm having.

Lots of food for thought.

It would be nice for VMware to comment on this, and perhaps link that post to this one?

oreeh
Immortal

Very interesting, thanks JonT.

This explains why NFS (which uses UDP by default) is awfully slow inside VMs.
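
If the UDP default really is the culprit, forcing TCP is a one-line change on the mount. Hedged examples (server name and paths are placeholders):

    # FreeBSD: -T forces TCP transport
    mount_nfs -T server:/export /mnt

    # Linux: proto=tcp does the same
    mount -o proto=tcp server:/export /mnt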

oreeh
Immortal

Just did the same with the physical BSD systems.

100 Mb network:

10.80 MB/s throughput with FTP pushing files

10.77 MB/s throughput with FTP pulling files

1000 Mb network:

79.78 MB/s throughput with FTP pushing files

80.12 MB/s throughput with FTP pulling files

juchestyle
Commander

Does fixing your speeds actually fix your problem?

I get horrendous speeds with two VMs connected to an internal-only vSwitch: 7 MB/s to 9 MB/s!

It helped a little, but not to the point it should have!

Respectfully,

Kaizen!
acr
Champion

So between your physical boxes, your figures are very similar to mine.

Why are we not getting anything close to that from the virtuals?

Is your ESX patched?

I have an older, unpatched ESX 3 which I'll try.

oreeh
Immortal

Why are we not getting anything close to that from the virtuals?

grasshopper made an interesting comment in one of the other threads:

Based on my first statement above, I'll interpret this to mean 2 VMs on the same vSwitch.

I once assumed that performance would always be better in such a scenario as well.

However, I was shocked to find that performance could actually be worse on the same vSwitch.

It has been documented somewhere, but that was a lot of beers ago.

I'll try setting up two VMs on different vSwitches and see what happens.

The ESX is fully patched (apart from the patches that don't fit this architecture).

juchestyle
Commander

In case anyone wants to reference our SR, here it is: 374322

Respectfully,

Kaizen!
oreeh
Immortal

Tried it with VMs on different vSwitches - same results. :(

juchestyle
Commander

I just had the opportunity to do some tests using the e1000 as defined in the .vmx file. The transfer rates were much improved.

Transferring from my laptop, which is configured at 100 full, I was able to reach transfer rates as high as 76 Mb/s through a network of three hops. I think that is pretty good, about 16 Mb/s faster than previously with vmxnet.
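
For reference, switching a VM to the e1000 adapter is a single line in the .vmx file (with the VM powered off). A sketch, assuming the NIC is the first one, ethernet0; the device number may differ in your VM:

    # .vmx snippet - ethernet0 is an assumption, your VM may use another number
    ethernet0.virtualDev = "e1000"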

Respectfully,

Kaizen!
oreeh
Immortal

With e1000 I'm able to reach 108 Mbit/s in a 100-full environment.

BUT there is no improvement when staying on the same vSwitch or going between different vSwitches inside the ESX box, and that bothers me.

It gets mentioned everywhere that transfers inside ESX run at nearly bus speed (which to me means I should get at least 1000 Mbit/s).

oreeh
Immortal

Just ran another test.

I ran iperf inside a VM connecting to itself (via 127.0.0.1): 215 Mbits/sec.

The same test using the real IP: 58.9 Mbits/sec.

Both are far too slow, since the packets never leave the VM!

The same test on a physical box gives 1.39 Gbits/sec, regardless of whether 127.0.0.1 or the real IP is used.
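
For anyone repeating this, the runs above map to something like the following (classic iperf; <vm-ip> is a placeholder for the VM's real address):

    # inside the VM: start the server, then test loopback and the real IP
    iperf -s &
    iperf -c 127.0.0.1 -t 30
    iperf -c <vm-ip> -t 30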

Which leads to the question: do we really have a problem that is not network-related?

