VMware Cloud Community
ACDCDC
Contributor

iSCSI storage throughput (benchmark)

Hi All,

I am trying to measure and sanity-check the throughput of a freshly built iSCSI storage setup (no VMs on it yet) on vSphere 6.5 (Enterprise Plus license).

Here is a brief description of the hardware:

  1. HP ProLiant DL385p Gen8 server with 8 NICs (4 built-in and 4 external)
  2. QNAP TS-EC879U-RP NAS with 6 NICs (2 built-in, 4 external)
  3. Cisco 2960S switch

NAS configuration:

  1. There are four 1 TB LUNs on a RAID 10 array. They are mapped to 2 iSCSI targets with 2 LUNs per target. Let me call them T0L0, T0L1, T1L0, and T1L1
  2. There is one more LUN mapped to a dedicated target
  3. There are 5 NICs dedicated to iSCSI traffic, all on the same subnet.

Server configuration:

  1. There are two 960 GB SSD datastores
  2. There are two iSCSI datastores; each maps LUNs from different targets. For instance, DS_1 is made up of T0L0 and T1L0, and DS_2 of T0L1 and T1L1
  3. There are 4 NICs dedicated to iSCSI traffic, all on the same vDS but each pinned to a different uplink by means of the "Teaming and failover" policy
  4. Overall there are 100 paths on the iSCSI software adapter: (5 LUNs on the NAS) x (5 NICs on the NAS) x (4 VMkernel adapters on the server). All are configured with the VMware Round Robin policy (see the check right after this list)
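
For reference, the policy and per-LUN path count can be double-checked from the ESXi shell; a quick sketch, where the naa identifier is just a placeholder for one of the QNAP LUNs:

    # NMP details for one QNAP LUN: Path Selection Policy should read
    # VMW_PSP_RR and Working Paths should list all 20 paths for that LUN
    # (4 VMkernel adapters x 5 NAS portals)
    esxcli storage nmp device list -d naa.xxxxxxxxxxxxxxxx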

Switch configuration:

  1. A dedicated VLAN for iSCSI traffic that connects all corresponding ports of NAS and ESXi server

Here is the experiment:

  1. I created a 10 GB file by SSHing to the ESXi host and running dd on one of the SSD datastores (dd if=/dev/urandom of=10GB.bin bs=64M count=160); see the shell sketch right after this list.
  2. I first used the datastore browser in the Web Client (the vCenter that manages this particular ESXi host) to copy the 10 GB file from one SSD datastore to the other SSD datastore. It took 35 seconds, so I calculated a throughput of 10*1024*1024*1000 / 35 = 299,593,143 B/s
  3. I then copied the same file from the first SSD datastore to the first iSCSI datastore. It took 18 minutes 29 seconds, so the throughput was 9,455,148 B/s, which is way below my expectations
  4. Then I copied that same file from the first iSCSI datastore to the second iSCSI datastore; it took 4 minutes 42 seconds, for a throughput of 37,183,546 B/s (almost 4 times faster than SSD -> iSCSI)
  5. Finally, I copied it from an iSCSI datastore back to an SSD datastore; it took 2 minutes 42 seconds, for a throughput of 64,726,913 B/s.
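
For reproducibility, the same copies can also be run and timed from the ESXi shell instead of through the datastore browser. A rough sketch, with placeholder datastore names standing in for my actual ones:

    # Create the 10 GB test file on the first SSD datastore
    dd if=/dev/urandom of=/vmfs/volumes/SSD_DS1/10GB.bin bs=64M count=160

    # Time the copy to the first iSCSI datastore; throughput (B/s) is then
    # the file size in bytes divided by the elapsed seconds
    time cp /vmfs/volumes/SSD_DS1/10GB.bin /vmfs/volumes/DS_1/10GB.bin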

Observations:

  1. Experiment 1 (item 2 from the above list) looks fair
  2. Experiment 2 (item 3) looks strange. I observed the network traffic on the QNAP NAS while copying the file, and it appears that only one NIC (out of the 5 available on the NAS) was used to transfer the data. I would expect something around 70 MB/s instead of the 9.4 MB/s that I got.
  3. Experiment 3 (item 4) contradicts my expectation of how iSCSI "Hardware Acceleration" is supposed to work. I could be mistaken, but I thought hardware acceleration allows iSCSI-to-iSCSI copies without sending the data through the ESXi host. Instead, I observed all 5 NICs on the NAS reading and writing data at the same time at approximately 10 MB/s (illustrative screenshot attached). See the VAAI check right after this list.
  4. Experiment 4 (item 5) looks fair, as it shows an iSCSI-to-SSD transfer over a single NIC at approximately 64 MB/s (reasonable for a single 1 Gb NIC).
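
Regarding observation 3: as far as I understand, the per-device and per-host VAAI status can be checked like this (the naa identifier is again a placeholder):

    # Per-device VAAI primitive support (ATS / Clone / Zero / Delete)
    esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx

    # Host-level data mover settings (1 = hardware offload enabled)
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit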

My questions:

  1. Does my storage topology (number of NICs, number of paths, path selection algorithm, etc.) look reasonable? Can it be improved given the described resources?
  2. Does my experiment methodology appear reasonable?
  3. Do my metrics and the way I measured them make sense?
  4. Would a VM migration of equivalent size (instead of 10GB file transfer) show different numbers?
  5. Would a directory transfer containing 100 files of 100MB each make any difference?
  6. Why does Experiment 2 (SSD to iSCSI) show such a low throughput?
  7. Why does "Hardware Acceleration" work in a way where all NAS NICs are being used instead of network-less file copying internally on the NAS?
  8. Any recommendations of a benchmarking methodology for vSphere iSCSI traffic?

Thanks in advance for all your comments.

--

Simon

2 Replies
parmarr
VMware Employee

Hello, I see your request has been pending for quite some time. You can reach out to support by filing an SR with them; here's how to do it:

How to file a Support Request in My VMware (2006985) | VMware KB

Sincerely, Rahul Parmar, VMware Support Moderator
Nick_Andreev
Expert

What I find strange in your observations is that you saw one NIC being used in test 2 and five NICs in test 3. Could you check whether you have 20 paths on each of the LUNs from test 2?

Also, in the datastore list, confirm what you have in the Hardware Acceleration column. Is it Supported, Unsupported, or Unknown?
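
Both can be pulled from the ESXi shell if that's easier; something along these lines, with the naa ID as a placeholder for one of the LUNs backing the datastore from test 2:

    # Count the paths to the device (you'd expect 20: 4 vmk ports x 5 NAS portals)
    esxcli storage core path list -d naa.xxxxxxxxxxxxxxxx | grep -c "Runtime Name"

    # Device summary, including the VAAI Status field behind the UI column
    esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx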

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au