Hi,
I'm unhappy with the throughput of my iSCSI datastore in my vSphere environment.
The setup (in short):
- Dell N4032 10 Gbit switches (jumbo frames enabled, etc.)
- New ESXi 6.0 servers + vCenter 6
- MegaRAID 9266-8i RAID controller with 8 x 4 TB SAS disks in RAID 10
- Ubuntu 14.04 on the storage server (a physical machine) as the iSCSI target
I've done some testing to establish a non-VMware baseline:
A: writing to local storage (the RAID array): 830 MB/s
B: mounted the iSCSI target on another physical Linux machine: 825 MB/s
C: mounted the iSCSI target in a VM with a vmxnet3 NIC: 500 MB/s
D: set up the iSCSI target as a VMFS5 datastore, added a virtual disk to my VM, and tested the speed: 260 MB/s
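For anyone who wants to reproduce tests A and B, a sequential-write baseline of this kind can be sketched with dd (this is an illustration, not the exact commands used here; the target directory is a placeholder, and conv=fdatasync forces the data to disk so the page cache doesn't inflate the number):

```shell
# Hypothetical sequential-write test; TEST_DIR is a placeholder for
# the mount point of the array or iSCSI device (defaults to /tmp here).
# conv=fdatasync makes dd flush to the device before reporting MB/s.
TEST_FILE="${TEST_DIR:-/tmp}/ddtest.bin"
dd if=/dev/zero of="$TEST_FILE" bs=1M count=64 conv=fdatasync
rm -f "$TEST_FILE"
```

For a read test, drop the caches first (echo 3 > /proc/sys/vm/drop_caches) and reverse if/of, or results will come from RAM.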
For C I expected roughly the same performance as B, but I already lose 325 MB/s there, which seems too much to be just virtualization overhead. The same default iSCSI settings were used.
For D this is just too slow: only about a third of the speed I'm getting in test B.
Where do I start debugging? Did I forget any settings?
The ESXi hosts and the storage server were only hosting the test machine, so no other workload was active.
Tested the network throughput in the VM with iperf and got close to line rate: 9.90 Gbit/s.
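For reference, a raw-network check like this can be run between the VM and the target (assuming iperf2, which matches the output format shown later in the thread; the IP is a placeholder for the storage server's iSCSI address):

```shell
# On the storage server, start the listener:
iperf -s

# In the VM, run a 10-second TCP test toward the target,
# reporting per-second intervals (10.0.0.10 is a placeholder IP):
iperf -c 10.0.0.10 -t 10 -i 1
```

This isolates the network path from the storage stack: if iperf reaches line rate but iSCSI does not, the bottleneck is above the wire.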
Any ideas how to work this out and get decent performance?
Best regards,
Nick
Hi Nick,
what kind of vmdk did you create?
Unless you created an eagerzeroedthick VMDK, your tests inside a VM might not show the correct values (depending on the tool used to test).
This is caused by the fact that when your test tries to write to a block which wasn't initialized before, ESXi will inject a zero-out I/O in front of your test I/O.
This happens under the covers, so the test tool running inside the VM isn't aware of that extra I/O.
Since your test I/O takes longer, as it has to wait for the zero-out I/O, your in-VM tests will report misleadingly low throughput and inflated I/O response times.
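To rule out this zero-out penalty, the test disk can be created as (or inflated to) eagerzeroedthick with vmkfstools on the ESXi host; a sketch, with the datastore and VM paths as placeholders:

```shell
# Create a new 40 GB eagerzeroedthick disk (paths are placeholders):
vmkfstools -c 40G -d eagerzeroedthick /vmfs/volumes/datastore1/testvm/testdisk.vmdk

# Or inflate an existing thin/lazy-zeroed disk in place
# (the VM must be powered off):
vmkfstools --inflatedisk /vmfs/volumes/datastore1/testvm/testvm.vmdk
```

Either way, every block is pre-zeroed, so the benchmark measures only its own I/O.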
Regards,
Ralf
From my point of view your performance does not look that bad, so I don't think you have a real problem there (you might check your switch diagnostics for things like jabber, delayed ACK, and DCB/flow control). You might also want to disable jumbo frames on one ESXi machine, just as a test, and run your benchmarks again.
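The jumbo-frame comparison can be done from the ESXi command line; a sketch, where vmk1 and vSwitch1 are placeholders for the interfaces actually carrying iSCSI:

```shell
# Check the current MTU on vmkernel interfaces and standard vSwitches:
esxcli network ip interface list
esxcli network vswitch standard list

# Temporarily drop one host back to 1500 for the comparison run
# (vmk1 / vSwitch1 are placeholder names):
esxcli network ip interface set -i vmk1 -m 1500
esxcli network vswitch standard set -v vSwitch1 -m 1500
```

Remember to set both back to 9000 afterwards, and that the vSwitch MTU must be at least as large as the vmkernel interface MTU.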
Best,
Joerg
Hi,
This article may give you some pointers.
nickdd -- Out of curiosity, did you ever get this figured out? We are seeing the same issue in our environment, specifically with iSCSI. If we use FC we do not see the same discrepancy from bare metal to ESXi VMs as we do with iSCSI.
It sounds to me like something within the ESXi host networking stack is significantly limiting I/O bandwidth. Seeing such a disparity does not make me believe the performance is okay.
Thanks!
hi PureJhop,
I didn't look into it any more. Performance was adequate, and it's hard to retest because I only have one environment and it's used for production.
Can't compare to FC either.
I'll try to rerun a benchmark some night when the servers are quiet.
If you find anything, let me know!
Hello nickdd -- I believe we are making some progress here.
The answer isn't quite what I expected: MTU. If we look below, here are the results:
1500 MTU:
File copy within Windows VM: 50 MB/s - 120 MB/s
IOmeter test (75% read, 25% write, 32 KB I/O, single thread): 250 MB/s - 350 MB/s
iperf Results:
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 242 MBytes 2.03 Gbits/sec
[ 3] 1.0- 2.0 sec 223 MBytes 1.87 Gbits/sec
[ 3] 2.0- 3.0 sec 238 MBytes 1.99 Gbits/sec
[ 3] 3.0- 4.0 sec 240 MBytes 2.01 Gbits/sec
[ 3] 4.0- 5.0 sec 254 MBytes 2.13 Gbits/sec
[ 3] 5.0- 6.0 sec 259 MBytes 2.18 Gbits/sec
[ 3] 6.0- 7.0 sec 275 MBytes 2.31 Gbits/sec
[ 3] 7.0- 8.0 sec 256 MBytes 2.15 Gbits/sec
[ 3] 8.0- 9.0 sec 277 MBytes 2.32 Gbits/sec
[ 3] 9.0-10.0 sec 267 MBytes 2.24 Gbits/sec
[ 3] 0.0-10.0 sec 2.47 GBytes 2.12 Gbits/sec
9000 MTU:
File copy within Windows VM: 300 MB/s - 500 MB/s
IOmeter test (75% read, 25% write, 32 KB I/O, single thread): 800 MB/s - 1.2 GB/s
iperf Results:
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.10 GBytes 9.48 Gbits/sec
[ 3] 1.0- 2.0 sec 1.12 GBytes 9.61 Gbits/sec
[ 3] 2.0- 3.0 sec 1.08 GBytes 9.29 Gbits/sec
[ 3] 3.0- 4.0 sec 1.12 GBytes 9.62 Gbits/sec
[ 3] 4.0- 5.0 sec 1.13 GBytes 9.72 Gbits/sec
[ 3] 5.0- 6.0 sec 1.12 GBytes 9.60 Gbits/sec
[ 3] 6.0- 7.0 sec 1.12 GBytes 9.64 Gbits/sec
[ 3] 7.0- 8.0 sec 1.12 GBytes 9.64 Gbits/sec
[ 3] 8.0- 9.0 sec 1.12 GBytes 9.63 Gbits/sec
[ 3] 0.0-10.0 sec 11.2 GBytes 9.58 Gbits/sec
I am not sure I have ever witnessed MTU making such a drastic difference in performance; in fact, depending on load, I have seen increasing the MTU decrease performance.
I am guessing it has something to do with how the hypervisor assembles the packets and sends them to our FlashArray. I am continuing to investigate and will let you know once I have more information.
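One sanity check worth running on each host: verify that 9000-byte frames actually pass end-to-end without fragmentation, since a single 1500-MTU hop in the path silently caps effective throughput. A sketch (the IP is a placeholder for the array's iSCSI portal, vmk1 for the iSCSI vmkernel interface):

```shell
# -d sets the don't-fragment bit; -s 8972 = 9000 minus 28 bytes of
# IP + ICMP headers. If this fails while a plain vmkping succeeds,
# some hop in the path is not passing jumbo frames.
vmkping -d -s 8972 -I vmk1 10.0.0.10
```

Running this from every host, on every iSCSI vmkernel interface, catches the case where one switch port or NIC was missed during the MTU change.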
-jhop