VMware Cloud Community
nickdd
Contributor

Slow iSCSI performance: where do I start debugging?

Hi,

I'm unhappy with the throughput of my iSCSI datastore in my vSphere environment.

The setup (in short):

- Dell N4032 10 Gbit switches (jumbo frames enabled everywhere, etc.)

- New ESXi 6.0 servers + vCenter 6

- MegaRAID 9266-8i RAID controller with 8 x 4 TB SAS disks in RAID 10

- Ubuntu 14.04 on the storage server (a physical machine) as the iSCSI target

I've done some testing to establish a non-vmware baseline:

A: writing to local storage (the RAID array): 830 MB/s

B: mounting the iSCSI target on another physical Linux machine: 825 MB/s

C: mounting the iSCSI target in a VM with a vmxnet3 NIC: 500 MB/s

D: setting up the iSCSI target as a VMFS5 datastore, adding a virtual disk to my VM, and testing the speed: 260 MB/s

For C I expected roughly the same performance as B, but I already lose 325 MB/s there, which seems too much to be virtualization overhead alone. The same default iSCSI settings were used.

For D this is just too slow: only about a third of the speed I'm getting in test B.

Where do I start debugging? Did I miss any settings?

The ESXi hosts and the storage server were only hosting the test machine, so no other workload was active.

Tested the network throughput in the VM with iperf and got the maximum network speed: 9.90 Gbit/s.

Any ideas on how to work this out and get decent performance?

Best regards,

Nick

Reply
0 Kudos
7 Replies
kastlr
Expert

Hi Nick,

What kind of VMDK did you create?

As long as you didn't create an eagerzeroedthick VMDK, your tests inside a VM might not show the correct values (depending on the tool used for testing).

This is because when your test tries to write to a block that wasn't initialized before, ESXi injects a zero-out IO in front of your test IO.

This happens under the covers, so a test tool running inside the VM isn't aware of that extra IO.

Since your test IO takes longer, as it has to wait for the zero-out IO, your in-VM tests will report lower throughput and longer IO response times.
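Ralf's point can be sketched with a rough back-of-the-envelope model. This is purely illustrative: the assumption that each first write to an uninitialized block costs one extra zero-fill IO of equal size is a simplification, not a documented ESXi constant.

```python
# Rough model of the first-write penalty on a lazy-zeroed thick VMDK.
# Illustrative assumption: every first write to an uninitialized block
# triggers a zero-fill IO of the same size, so backend work doubles
# for that portion of the workload.

def effective_write_throughput(raw_mb_s: float, zeroed_fraction: float) -> float:
    """raw_mb_s: throughput the backend sustains for plain writes.
    zeroed_fraction: fraction of written blocks already initialized (0..1).
    First writes to uninitialized blocks cost two IOs (zero-fill + data)."""
    uninitialized = 1.0 - zeroed_fraction
    # Each MB written to an uninitialized block costs ~2 MB of backend work.
    backend_work_per_mb = zeroed_fraction * 1.0 + uninitialized * 2.0
    return raw_mb_s / backend_work_per_mb

# Fresh lazy-zeroed disk: nothing initialized yet -> roughly halved.
print(effective_write_throughput(500.0, 0.0))   # 250.0
# Eager-zeroed (or fully pre-written) disk: no penalty.
print(effective_write_throughput(500.0, 1.0))   # 500.0
```

Under this toy model, a first-pass benchmark on a fresh lazy-zeroed disk would land near half of the raw rate, which is in the neighborhood of the 260 MB/s Nick measured in test D; rerunning the benchmark over already-written blocks (or using an eagerzeroedthick disk) would remove the penalty.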

Regards,

Ralf


Hope this helps a bit.
Greetings from Germany. (CEST)
joergriether
Hot Shot

From my point of view your performance does not look that bad, so I don't think you have a real problem there. You might check your switch diagnostics for issues like jabber, delayed ACK, and DCB/flow control. You could also, just for testing, disable jumbo frames on one ESXi host and run your benchmarks again.

Best,

Joerg

Jhop1
Contributor

nickdd -- Out of curiosity, did you ever get this figured out? We are seeing the same issue in our environment, specifically with iSCSI. With FC we do not see the same bare-metal-to-ESXi-VM discrepancy that we do with iSCSI.

It sounds to me like something in the ESXi host networking stack is significantly limiting I/O bandwidth. Seeing such a disparity does not make me believe the performance is okay.

Thanks!

nickdd
Contributor

Hi PureJhop,

I didn't look into it any further. Performance was adequate, and it's hard to retest because I only have one environment and it's used for production.

Can't compare to FC either.

I'll try to rerun a benchmark at night, when the servers are quiet.

If you find anything, let me know! :)

Jhop1
Contributor

Hello nickdd -- I believe we are making some progress here.

The answer isn't quite what I expected: MTU. The results are below:

1500 MTU:

File copy within Windows VM: 50 MB/s - 120 MB/s

IOmeter test (75% read / 25% write, 32 KB IO, single thread): 250 MB/s - 350 MB/s

iperf Results:

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0- 1.0 sec   242 MBytes  2.03 Gbits/sec

[  3]  1.0- 2.0 sec   223 MBytes  1.87 Gbits/sec

[  3]  2.0- 3.0 sec   238 MBytes  1.99 Gbits/sec

[  3]  3.0- 4.0 sec   240 MBytes  2.01 Gbits/sec

[  3]  4.0- 5.0 sec   254 MBytes  2.13 Gbits/sec

[  3]  5.0- 6.0 sec   259 MBytes  2.18 Gbits/sec

[  3]  6.0- 7.0 sec   275 MBytes  2.31 Gbits/sec

[  3]  7.0- 8.0 sec   256 MBytes  2.15 Gbits/sec

[  3]  8.0- 9.0 sec   277 MBytes  2.32 Gbits/sec

[  3]  9.0-10.0 sec   267 MBytes  2.24 Gbits/sec

[  3]  0.0-10.0 sec  2.47 GBytes  2.12 Gbits/sec

9000 MTU:

File copy within Windows VM: 300 MB/s - 500 MB/s

IOmeter test (75% read / 25% write, 32 KB IO, single thread): 800 MB/s - 1.2 GB/s

iperf Results:

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0- 1.0 sec  1.10 GBytes  9.48 Gbits/sec

[  3]  1.0- 2.0 sec  1.12 GBytes  9.61 Gbits/sec

[  3]  2.0- 3.0 sec  1.08 GBytes  9.29 Gbits/sec

[  3]  3.0- 4.0 sec  1.12 GBytes  9.62 Gbits/sec

[  3]  4.0- 5.0 sec  1.13 GBytes  9.72 Gbits/sec

[  3]  5.0- 6.0 sec  1.12 GBytes  9.60 Gbits/sec

[  3]  6.0- 7.0 sec  1.12 GBytes  9.64 Gbits/sec

[  3]  7.0- 8.0 sec  1.12 GBytes  9.64 Gbits/sec

[  3]  8.0- 9.0 sec  1.12 GBytes  9.63 Gbits/sec

[  3]  0.0-10.0 sec  11.2 GBytes  9.58 Gbits/sec

I am not sure I have ever seen MTU make such a drastic difference in performance; in fact, depending on the load, I have seen an MTU increase decrease performance.
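One plausible explanation for a gap this large, sketched as a quick calculation: at 10 Gbit the path can become packet-rate bound rather than byte bound, so any fixed per-frame cost in the hypervisor's networking stack is multiplied by the frame rate. The figures below use standard Ethernet framing overhead (preamble/SFD, header, FCS, inter-frame gap) and IPv4+TCP headers; the conclusion that per-frame cost dominates is my assumption, not something measured in either environment.

```python
# Frame-rate and payload-efficiency arithmetic for 1500 vs 9000 MTU at 10 Gbit.
# Per-frame wire overhead outside the MTU: 8 (preamble+SFD) + 14 (Ethernet
# header) + 4 (FCS) + 12 (inter-frame gap) = 38 bytes.
# Per-frame overhead inside the MTU: 20 (IPv4) + 20 (TCP) = 40 bytes.

LINE_RATE_BPS = 10e9   # 10 Gbit/s link
WIRE_OVERHEAD = 38     # bytes per frame, outside the MTU
IP_TCP_HEADERS = 40    # bytes per frame, inside the MTU

def frames_per_second(mtu: int) -> float:
    """Frames per second needed to saturate the line at a given MTU."""
    return LINE_RATE_BPS / 8 / (mtu + WIRE_OVERHEAD)

def payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)

for mtu in (1500, 9000):
    print(mtu, round(frames_per_second(mtu)), round(payload_efficiency(mtu), 3))
# The raw payload efficiency only improves ~4% (0.949 -> 0.991), but the
# 1500 MTU case requires ~5.9x more frames per second, so a fixed per-frame
# CPU cost in the hypervisor is multiplied by roughly 6.
```

If per-frame processing in the ESXi iSCSI/network path is the bottleneck, this would explain why iperf and the datastore throughput both jump by far more than the ~4% that header overhead alone predicts.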

I am guessing it has something to do with how the hypervisor assembles the packets and sends them to our FlashArray. I am continuing to investigate and will let you know once I have more information.

-jhop
