This thread is a follow-up to the following threads since these seem to be related:
http://www.vmware.com/community/thread.jspa?threadID=74329
http://www.vmware.com/community/thread.jspa?threadID=75807
http://www.vmware.com/community/thread.jspa?threadID=77075
Description of the issues and "results" we have so far.
juchestyle and sbeaver saw a significant degradation of network throughput on virtual switches set to 100 Full.
The transfer rate never stabilizes, and there are significant peaks and valleys when a 650 MB ISO file
is transferred from a physical server to a VM.
Inspired by this, I did some quick testing with some strange results:
The transfer direction had a significant impact on the transfer speed.
Pushing files from VMs to physical servers was consistently around 30% faster than pulling files from servers.
The assumption that this is related to the behaviour of Windows servers was wrong, since this happened
regardless of the OS and protocol used.
Another interesting result from these tests: e1000 NICs always seem to be 10-20% faster than vmxnet,
and there is a big difference in PKTTX/s between vmxnet and e1000.
After that, acr discovered really bad transfer speeds in a Gigabit VM environment.
The maximum was 7-9 MB/s, even when using ESX-internal vSwitches.
A copy from ESX to ESX reached only 7-9 MB/s as well.
The weird discovery in this scenario: when the CD-ROMs in the VMs are disabled, the transfer speed goes up to 20 MB/s.
Any ideas regarding this?
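For anyone who wants to try the CD-ROM workaround, disconnecting the drive can be done in the VM's .vmx file. A sketch, assuming the CD-ROM sits at ide1:0 (check your own .vmx; the device name may differ):

```
# Hypothetical .vmx fragment: keep the CD-ROM device but don't connect it at power-on
ide1:0.present = "TRUE"
ide1:0.startConnected = "FALSE"
```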
I'll mark my question as answered and ask Daryll to lock the thread so we have everything in one thread.
Just a quick response to get on this thread and update:
It seems that the physical switch is probably the culprit here. We opened an SR on this and were told that ESX does not do a good job of fluctuating network transfer speeds the way physical machines do. If you have a physical switch in the middle (who doesn't?), you want to set it to 1000 Full and not Auto.
Apparently, if you have a physical switch that is set to Auto, ESX will negotiate with it and decide to transfer files at the lowest common denominator to avoid having to scale the transfer rate up and down.
I am not particularly happy with this answer, but it seems to be the best answer we have gotten so far.
Respectfully,
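To take auto-negotiation out of the picture, the speed/duplex can be forced on the ESX uplinks as well as on the physical switch ports. A sketch using the ESX service console tool esxcfg-nics (the NIC name vmnic0 is an assumption; check your own host):

```
# List the physical NICs with their current speed/duplex settings
esxcfg-nics -l
# Force vmnic0 to 1000 Full to match a hard-set switch port
esxcfg-nics -s 1000 -d full vmnic0
```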
Ok, so I've tested with e1000; unfortunately, no difference.
Does fixing your speeds fix your problem?
I get horrendous speeds with two VMs connected with an internal switch only: 7 MB/s to 9 MB/s.
Some numbers regarding this:
Setup:
ESX: HP DL380G3, 8GB, Dual Xeon 3.06, Local VMFS RAID5,
Intel Dual Port (PILA 8472) Adapter (only 1 Port used)
File/NFS/FTP server: P4D 3GHz, 1GB RAM, SATA RAID
Switch: HP2524.
All network cards and switch ports set to 100Full.
1. copy (SMB) from server to virtual XP (vmxnet, tools installed)
3020 PKTTX/s 1.58 MbTX/s
2. FTP from server to virtual BSD VM (vmxnet, tools installed):
2312.66 PKTTX/s 1.17 MbTX/s with 5.46 MB/s
3. FTP from server to virtual BSD VM (e1000, no tools installed)
1549.45 PKTTX/s 0.78 MbTX/s with 7.16 MB/s
4. FTP from server to virtual BSD VM (e1000, tools installed)
1549.45 PKTTX/s 0.78 MbTX/s with 7.16 MB/s
5. FTP from BSD VM to BSD VM (internal vSwitch) (both e1000, tools installed)
6027.22 PKTTX/s 68.02 MbTX/s with 10.66 MB/s
6. FTP from BSD VM to BSD VM (internal vSwitch) (both vmxnet, tools installed)
6334.75 PKTTX/s 71.51 MbTX/s with 7.96 MB/s
Only the VMs under test were running on the ESX host.
The same tests with Gigabit (HP5308XL switch, onboard Broadcom NIC in the ESX host) gave the same results.
Ok, my tests are very similar.
I too have an HP environment. I also have IBM Blades, so I'll run some tests on those, just to eliminate or compare.
Next I'll set up (I'm nearly done) two physical BSD systems configured exactly like the VMs (even with e1000 cards) and post the results.
My assumption: far better network throughput and no difference regarding transfer direction.
I did this using the same switches as the Blade, i.e. plugged the physicals directly into the Blade switch, then retested with the physicals on the same LAN.
We got 60-70 MB/s each time, FTP or file copy.
From within the VMs we always see very high kernel activity in Task Manager. This may or may not be relevant, but at the end of the day the copy takes far too long...
This thread also seems similar, but they are exploring it from a different angle I think.
http://www.vmware.com/community/thread.jspa?messageID=540298
Excellent link, JonT. I had discovered it during my intensive testing on the issues I have.
But lots of food for thought.
It would be nice for VMware to comment on this, and perhaps link that post to this one?
Very interesting, thanks JonT
this explains why NFS (which uses UDP by default) is awfully slow inside VMs
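If UDP is the issue, forcing the NFS mounts onto TCP is an easy counter-test. A sketch for the BSD VMs used in this thread (server and export names are placeholders):

```
# FreeBSD: -T forces the NFS mount to use TCP instead of the default UDP
mount_nfs -T server:/export /mnt
# Linux equivalent (placeholder server/export):
mount -o proto=tcp server:/export /mnt
```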
Just did the same with the physical BSD systems
100M network:
10.80 MB/s throughput with FTP pushing files
10.77 MB/s throughput with FTP pulling files
1000M network:
79.78 MB/s throughput with FTP pushing files
80.12 MB/s throughput with FTP pulling files
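To rule out the FTP client/server implementations when comparing push and pull, a raw TCP transfer gives a protocol-neutral baseline. A minimal sketch (not from the thread; it runs over loopback, so it measures the OS network stack rather than the wire):

```python
import socket
import threading
import time

CHUNK = 65536

def measure_push(nbytes=16 * 1024 * 1024):
    """Push nbytes over a loopback TCP connection and return MB/s."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))   # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        got = 0
        while got < nbytes:      # drain everything the client sends
            data = conn.recv(CHUNK)
            if not data:
                break
            got += len(data)
        conn.close()

    t = threading.Thread(target=server)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))
    buf = b"\0" * CHUNK
    sent = 0
    start = time.time()
    while sent < nbytes:
        cli.sendall(buf)
        sent += len(buf)
    cli.close()
    t.join()                     # wait until the server has received it all
    srv.close()
    return sent / (time.time() - start) / 1e6

if __name__ == "__main__":
    print("push: %.1f MB/s" % measure_push())
```

The same skeleton with the roles reversed (server sends, client reads) gives the pull number; a large gap between the two on otherwise symmetric endpoints points at the stack rather than the protocol.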
Does fixing your speeds fix your problem?
I get horrendous speeds with two VMs connected with
an internal switch only: 7 MB/s to 9 MB/s.
It helped a little, but not to the point it should have!
Respectfully,
So between your physical boxes, your figures are very similar to mine.
Why are we not getting anything close from the virtuals?
Is your ESX patched?
I have an older unpatched ESX 3 which I'll try.
Why are we not getting anything close from the virtuals?
grasshopper made an interesting comment on one of the other threads
Based on my first statement above, I'll interpret this to mean 2 VMs on the same vSwitch.
I once assumed that performance would always be better in such a scenario as well.
However I was shocked to find that performance could actually be worse on the same vSwitch.
It has been documented somewhere, but that was a lot of beers ago.
I'll try to set up two VMs on different vSwitches and see what happens.
The ESX is fully patched (apart from the architecture patches that don't apply).
In Case anyone wants to reference our SR, here it is: 374322
Respectfully,
Tried it with VMs on different vSwitches - same results
I just had the opportunity to do some tests using the e1000 as defined in the vmx file. The transfer rates were much improved.
Transferring from my laptop, which is configured at 100 Full, I was able to reach transfer rates as high as 76 Mb through a network of three hops. I think that is pretty good, about 16 Mb faster than before with vmxnet.
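For anyone wanting to repeat this, selecting the e1000 emulation is a one-line entry in the .vmx file (ethernet0 is assumed to be the adapter in question):

```
# Hypothetical .vmx fragment: use e1000 emulation for the first virtual NIC
ethernet0.virtualDev = "e1000"
```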
Respectfully,
With e1000 I'm able to reach 108 Mbit/s in a 100 Full environment.
BUT there is no improvement when staying on the same vSwitch or using different vSwitches inside the ESX box, and that bothers me.
Everywhere it is mentioned that transfers inside ESX run at nearly bus speed (which to me means I should at least get 1000 Mbit/s).
Just ran another test.
I ran iperf inside a VM connecting to itself (via 127.0.0.1): 215 Mbits/sec.
The same test using the real IP: 58.9 Mbits/sec.
Both are far too slow, since the packets never leave the VM!
The same on a physical box: 1.39 Gbits/sec, regardless of using 127.0.0.1 or the real IP.
Which leads to the question: do we really have a non-network-related problem?
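For reference, the classic iperf invocations for this kind of self-test look like this (the real-IP address below is a placeholder):

```
# Inside the VM: start the iperf server
iperf -s
# Second shell in the same VM: loopback run
iperf -c 127.0.0.1
# Same VM again, this time via its own "real" IP (placeholder address)
iperf -c 192.168.1.10
```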