I have an ESXi server with an Intel X520-DA2 10 gig adapter in it. It has an iSCSI datastore connected over one port and VM traffic over the other port. The iSCSI speed couldn't be better, but the problem is that none of my VMs will do over 300 megabits/sec. They're all using VMXNET3 adapters. I have gone so far as to hook the second 10 gig port directly up to another standalone Windows server via an SFP+ cable to eliminate the network/switch as a bottleneck, but I am still limited to the ~300 megabit ceiling. Any clues to what could be causing this? Thanks in advance!
So, nobody else has had this issue?
I do have the exact same issue under RHEL6. I don't have that problem with RHEL5. No clue so far.
Did you engage support on the issue?
I'm assuming that since I am using the free version I cannot open up a support case.
Did you use NetPerf to analyse? Did you try a crossover connection to another server to test the speed? How about QoS?
QoS is uninstalled. Directly connected via SFP+ to another server, still hitting a 300 megabit ceiling. I haven't used netperf, just Resource Monitor to look at the speed.
Hi,
In your post there's no mention of what type of test you're running. Since you mentioned Resource Monitor, I suspect you're simply copying large files to measure network throughput... if that's the case, then your disk subsystem could be the culprit...
Iperf is a good tool to measure networking throughput between two servers without doing any i/o.
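For example (a sketch assuming the classic iperf 2.x syntax; substitute your own server address):

```shell
# On the receiving server:
iperf -s

# On the sending server: a 30-second TCP test, reporting every 5 seconds
iperf -c <server-ip> -t 30 -i 5

# If a single TCP stream turns out to be the limit, try parallel streams:
iperf -c <server-ip> -t 30 -i 5 -P 4
```

A TCP run like this takes the disks entirely out of the picture.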
Peter D
I am copying files. All of the copying is between RAID arrays or SSDs. The VMXNET3 adapter is creating the bottleneck.
Please use iperf as mentioned. Single SSDs are not fast enough; RAID10 arrays of 7k4 disks tend to be around 300 MB/s as well. Another possible bottleneck: having the RAID controller and NIC on the same PCI channel.
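If you want to measure the disk on its own, something like this works in a Linux guest (a sketch using GNU dd; the test file path is arbitrary):

```shell
# Write 256 MiB sequentially and flush it before dd reports a rate,
# so the page cache doesn't inflate the number
dd if=/dev/zero of=/tmp/ddtest bs=1M count=256 conv=fdatasync
size=$(stat -c %s /tmp/ddtest)   # confirm the full file actually landed
rm -f /tmp/ddtest
```

If dd reports well above your network numbers, the disk is not the bottleneck.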
Ingo
Here is a nifty Intel case study paper that explains similar results from tests they did in conjunction with FedEx.
ftp://download.intel.com/support/network/sb/fedexcasestudyfinal.pdf
The gist of it is this:
You may find that you get better performance with single stream file copies in VMs with 1 vCPU rather than a VM with more than 1 vCPU.
However, YMMV, as it depends on how you've got things set up. They also list some other best practices to keep in mind (see pg. 10). They specifically refer to vSphere 4.0, but I'm sure the majority of them (if not all) still apply with vSphere 5.
"Please use iperf as mentioned. Single SSDs are not fast enough; RAID10 arrays of 7k4 disks tend to be around 300 MB/s as well. Another possible bottleneck: having the RAID controller and NIC on the same PCI channel.
Ingo"
Ingo, an Intel 510 series SSD can certainly swallow more than 2 gigabits per second. The VM is only putting out 300 megabits. I'll run Iperf and let you know what I see. Thanks!
You found out yourself why your tests fail: a single SSD cannot sustain enough output to fill a 10G link. It's all about the writes. 300 MB/s is the best you can get even when the SSD is empty, so you are just measuring the write speed of your SSD. I used SSDs and EFDs for testing, but the bottleneck was always the RAID controller or the SAN connection.
Ingo
Here are the iperf results between two VMs.
------------------------------------------------------------
Client connecting to 192.168.1.12, UDP port 5123
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.13 port 62071 connected with 192.168.1.12 port 5123
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 305 MBytes 512 Mbits/sec
[ 3] 5.0-10.0 sec 313 MBytes 525 Mbits/sec
[ 3] 10.0-15.0 sec 311 MBytes 523 Mbits/sec
[ 3] 15.0-20.0 sec 312 MBytes 524 Mbits/sec
[ 3] 20.0-25.0 sec 310 MBytes 521 Mbits/sec
[ 3] 25.0-30.0 sec 312 MBytes 524 Mbits/sec
[ 3] 0.0-30.0 sec 1.82 GBytes 521 Mbits/sec
[ 3] Sent 1329616 datagrams
[ 3] Server Report:
[ 3] 0.0-30.0 sec 1.79 GBytes 512 Mbits/sec 0.000 ms 22806/1329615 (1.7%)
[ 3] 0.0-30.0 sec 1 datagrams received out-of-order
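As a sanity check on the report itself (figures copied from the client output above): 1,329,616 datagrams of 1,470 bytes in 30 seconds works out to exactly the bandwidth iperf printed, so the ceiling is a real measurement, not a reporting artifact:

```shell
# 1,329,616 datagrams x 1,470 payload bytes x 8 bits, over 30 seconds
sent_bits=$(( 1329616 * 1470 * 8 ))
rate_mbps=$(( sent_bits / 30 / 1000000 ))
echo "$rate_mbps Mbit/s"   # -> 521 Mbit/s, matching the client report
```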
Why would the bandwidth between two VMs on the same host be limited to 500 megabits?
And here it is connected to a physical box, still limited to 500 megabits:
------------------------------------------------------------
Client connecting to 192.168.1.121, UDP port 5123
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.13 port 62072 connected with 192.168.1.121 port 5123
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 302 MBytes 507 Mbits/sec
[ 3] 5.0-10.0 sec 313 MBytes 525 Mbits/sec
[ 3] 10.0-15.0 sec 311 MBytes 522 Mbits/sec
[ 3] 15.0-20.0 sec 313 MBytes 525 Mbits/sec
[ 3] 20.0-25.0 sec 311 MBytes 521 Mbits/sec
[ 3] 25.0-30.0 sec 313 MBytes 524 Mbits/sec
[ 3] 0.0-30.0 sec 1.82 GBytes 521 Mbits/sec
[ 3] Sent 1328524 datagrams
[ 3] Server Report:
[ 3] 0.0-30.0 sec 1.81 GBytes 518 Mbits/sec 0.751 ms 5971/1328523 (0.45%)
[ 3] 0.0-30.0 sec 1 datagrams received out-of-order
Any ideas?
Guys,
Let's standardize on megabits (Mb) or megabytes (MB) in this posting 😉
Single SSD performance ranges from 100 MB/s to 500 MB/s; quickly googling the Intel 510 series SSD tells me read speeds up to 500 megabytes per second (MB/s) and sequential writes up to 315 MB/s. If your initial post talks about megabytes, then you were effectively maxing out what the SSD can provide.
Now, getting ~500 Mb/s with iperf is somewhat disappointing, and you should determine whether the problem lies within the OS or further down the networking chain. With that being said, can you test with two VMs on the same host? In that case the iperf traffic should not traverse the pNIC but rather stay within the vSwitch the two test VMs are connected to, so you should be getting much better results. Can you confirm that?
Also, have you ever tested what you'd get running iperf between two crossover-connected physical boxes with the same pNICs? My suspicion is that maybe iperf does not work well with 10 Gb/s Ethernet, and running such a test could verify that.
Also, although I'm sure you have already verified this, can you confirm that there are no ingress/egress traffic-shaping policies or Network I/O Control configured?
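If you want to double-check from the ESXi shell, I believe the 5.x esxcli can show the shaping policy per standard vSwitch (a sketch; command name from memory, and vSwitch0 is just the default name):

```shell
# Show the traffic-shaping policy on the standard vSwitch (ESXi 5.x)
esxcli network vswitch standard policy shaping get -v vSwitch0
```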
Peter D.
Thanks Peter, I have been talking only megabits and gigabits since the first post. Ingo has them mixed up.
The first set of results above is VM-to-VM traffic on the same host, on the same vSwitch. I am wondering if maybe there is an artificial bottleneck in ESXi 5?
I have no traffic-shaping policies set up, nor Network I/O Control (I have no vNetwork Distributed Switch).
This is really frustrating me....
What OS are you testing under?
If Linux, please review this:
Although the document states it "does not affect ESXi5" it certainly does in our environment. Try this on both servers as per the guide, and rerun iperf:
ethtool -K eth0 lro off
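You can verify the change took effect with the lowercase -k query:

```shell
# List the NIC's offload settings and check that LRO is now reported as off
ethtool -k eth0 | grep -i large
```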
I feel your pain.
Again, can you test iperf between two physical servers that have the same network card as the ESX host, connected with a crossover cable? That way you'll be able to tell whether iperf can produce adequate results on a 10 Gb/s network and whether the issue only appears when ESXi is in the equation. If you can get throughput significantly higher than 500 Mb/s, then compare NIC properties between the VM's OS and the physical server. For instance, compare the Receive Side Scaling setting; I believe VMXNET3 does not enable it on the vNIC by default, so compare it with the physical. Other properties to compare: TCP offloading, and receive/transmit buffers.
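On the Windows side these are quick to compare from an elevated prompt (standard netsh; shown as a sketch):

```shell
rem Show global TCP settings, including the Receive Side Scaling state
netsh int tcp show global

rem To enable RSS for a test run:
netsh int tcp set global rss=enabled
```

Run the same query on the VM and on the physical box and diff the output.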
Also: 1) Do you have jumbo frames enabled in your testing? 2) Does the VMware Tools version on the test VM match the ESXi 5 host (I'm not sure if the vmxnet driver version in 4.x Tools is the same as for 5.0...)?
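A quick way to confirm jumbo frames end to end is a don't-fragment ping with a jumbo-sized payload (Windows ping sketch; 8972 bytes = a 9000-byte MTU minus the 28 bytes of IP/ICMP headers, and the address is the test VM from your results above):

```shell
rem Replies here mean the whole path (vNIC, vSwitch, pNIC, peer)
rem is passing 9000-byte frames; "Packet needs to be fragmented" means not
ping -f -l 8972 192.168.1.12
```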
Peter D.
The operating systems are Windows 7 and Server 2008 R2.
I loaded Win7 directly on the ESXi hardware and tested against a 2008 R2 server and got spectacular speeds, so I am absolutely sure the hardware is not the problem. Since I am seeing a traffic bottleneck between VMs on the same host, on the same vSwitch, I'm thinking it's an ESXi problem. The ESXi host has stock network settings, as do the VMs. I will do some experimentation with frame sizes and TCP offloading. I have VMware Tools 8.65 on both machines.