VMware Cloud Community
1ppInc
Contributor

Slow iSCSI Performance - Nimble Storage CS220

Hi!

I have a new environment setup.

  • 2x HP DL560 G8 servers, each with two 8-core Xeons and 128GB of RAM. Each host has 4 NIC ports dedicated to iSCSI traffic, connected to enterprise-class Brocade 1GbE switches.

  • 1x Nimble CS220 12TB iSCSI array with 4 active NICs and 4 failover NICs

I have configured ESXi 5.1 with MPIO and Round Robin enabled. I have tried setting the IOPS value to 0, 1, 4 and the default of 1000; none of these makes any difference to the throughput. As far as I can tell, the traffic is being distributed evenly across the NICs at about 22.8 MB/s each, or roughly 90 MB/s combined.
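
For reference, this is roughly how I have been checking and changing the Round Robin IOPS value per device from the ESXi shell (the eui.xxxx ID below is just a placeholder for one of the Nimble volumes):

# list all devices and note the eui.* IDs of the "Nimble iSCSI Disk" entries
esxcli storage nmp device list
# show the current Round Robin settings for one device (placeholder device ID)
esxcli storage nmp psp roundrobin deviceconfig get -d eui.xxxxxxxxxxxxxxxx
# switch that device to IOPS-based Round Robin, rotating paths every 1 I/O
esxcli storage nmp psp roundrobin deviceconfig set -d eui.xxxxxxxxxxxxxxxx --type=iops --iops=1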

I am told by the vendor that we have configured the unit to best-practice standards and that nothing is wrong, but the throughput numbers we are seeing are WAY off from what they would normally expect.

During a speed test using CrystalDiskMark, we are seeing the following:

           Sequential Read :   108.458 MB/s
          Sequential Write :   100.017 MB/s
         Random Read 512KB :    59.763 MB/s
        Random Write 512KB :    88.862 MB/s
    Random Read 4KB (QD=1) :     8.711 MB/s [  2126.6 IOPS]
   Random Write 4KB (QD=1) :     6.916 MB/s [  1688.4 IOPS]
   Random Read 4KB (QD=32) :   164.283 MB/s [ 40108.0 IOPS]
  Random Write 4KB (QD=32) :    72.232 MB/s [ 17634.8 IOPS]

  Test : 1000 MB [C: 58.5% (23.2/39.7 GB)] (x5)
  Date : 2012/12/05 14:53:10
    OS : Windows Server 2012 Server Standard Edition (full installation) [6.2 Build 9200] (x64)

The sequential read and write numbers are, by their standards, about a quarter of what they would expect to see.

Here is the kicker! When I use the Windows iSCSI Initiator with MPIO enabled on my Windows box, on the SAME network, I am seeing higher sequential throughput with even fewer paths connected, though with slightly slower random reads/writes.

           Sequential Read :   188.052 MB/s
          Sequential Write :   128.094 MB/s
         Random Read 512KB :    64.522 MB/s
        Random Write 512KB :    91.783 MB/s
    Random Read 4KB (QD=1) :     9.009 MB/s [  2199.4 IOPS]
   Random Write 4KB (QD=1) :     5.352 MB/s [  1306.7 IOPS]
   Random Read 4KB (QD=32) :   110.950 MB/s [ 27087.4 IOPS]
  Random Write 4KB (QD=32) :    65.868 MB/s [ 16081.0 IOPS]

  Test : 1000 MB [E: 0.3% (0.1/30.0 GB)] (x5)
  Date : 2012/12/05 14:38:20
    OS : Windows Server 2012 Server Standard Edition (full installation) [6.2 Build 9200] (x64)

I am at a loss. Nimble Storage has spoken with me and troubleshot this thing, and they can't figure out what could be wrong.

Any input or suggestions would be awesome!

Thanks!

11 Replies
Josh26
Virtuoso

Hi,

Standard questions:

Software initiator, or hardware HBAs? Numerous people have found the software initiator to be superior.

Flow Control?

Jumbo Frames?

Maybe try it with just two connected NICs. The Nimble itself has four. It's not common for each server to be able to top out the bandwidth all the way to the SAN. Two is a more common config and probably one that is better tested.

Does Nimble have a published SATP and PSP preference?

I'd hate to say it, but how about trying Windows 2008? 2012 is still pretty fresh and it would surprise me if all the bugs were out yet - remember, "Service Pack 1" for the last three Windows releases has effectively been a "major performance fix".

1ppInc
Contributor

Sorry! I thought I gave a lot of info but I definitely missed some key items.

VMware software iSCSI initiator

No Flow Control enabled

No jumbo frames configured on either the Nimble or the ESXi hosts (but jumbo frames are enabled on the switches)
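
If we end up enabling jumbo frames end to end, my understanding is the ESXi side would look roughly like this (vSwitch1 and vmk1 are placeholders for our iSCSI vSwitch and vmkernel port), plus the matching MTU on the Nimble data interfaces and switch ports:

# raise the MTU on the iSCSI vSwitch (vSwitch1 is a placeholder name)
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
# raise the MTU on each iSCSI vmkernel port (vmk1 is a placeholder)
esxcli network ip interface set --interface-name=vmk1 --mtu=9000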

I can try with 2 NICs tomorrow.

Not sure if Nimble has any preference, but I had one of their techs in my office to do the initial setup, so I have to imagine they followed their own best practices.

I will be deploying a Windows 2008 server tomorrow and will report back if there is any difference in disk performance. If that turns out to be the issue I'll be pretty bummed, but I'll anxiously await updates!

Thanks for the input.

blit_tech
Contributor

I'm getting ready to set up a CS210 in our environment, so I don't know if this will help, but Nimble has a best-practices setup guide called "VMware vSphere 5 on Nimble Storage". In that document there is a section that states:

The physical switch ports in which the ESX and Nimble array interfaces are connected must have flow control enabled. Failure to do so can cause TCP level packet retransmits or iSCSI-level abort tasks.

You mentioned that you don't have flow control enabled on your switches. Hopefully this helps you out.
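
On the ESXi side, I believe you can check and set NIC flow control from the shell with ethtool (vmnic2 below is just a placeholder for one of the iSCSI uplinks, and whether the setting takes depends on the NIC driver):

# show the current pause (flow control) parameters for a placeholder uplink
ethtool -a vmnic2
# enable receive and transmit flow control on that uplink
ethtool -A vmnic2 rx on tx on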

I'm hoping to get my units set up in the next week or so. Hopefully ours will go smoothly.

vPatrickS
Enthusiast

never mind, brain lag. :smileyblush:

MKguy
Virtuoso

He already mentioned that he used different IOPS settings in his first post, so this shouldn't be the issue.

Please use the well-known IOmeter benchmarking tool and compare your numbers with what others in a similar configuration have posted here:

http://communities.vmware.com/thread/197844?start=0&tstart=0

http://vmktree.org/iometer/

An important metric missing in the numbers you posted is latency.
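
You can watch per-device latency live with esxtop on the host, roughly like this:

esxtop
# press 'u' for the disk device view, then watch DAVG/cmd (device/array latency)
# and KAVG/cmd (time spent in the VMkernel) for the Nimble devices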

Anyway, you also shouldn't pay too much attention to maximum sequential throughput numbers; those are generally not realistic workloads (and neither are 512KB IOs). In my opinion it's actually quite impressive that you have noticeably better numbers on your random access patterns.

It could also be that all the iSCSI initiator NICs connect to the same target IP on your Nimble array, limiting the effective throughput to that one path on the storage side. I don't know how your storage or network is set up, much less how Nimble iSCSI storage works in detail, but this could explain why your maximum throughput never exceeds what a single 1GbE link can offer. You could also try disabling RR and switching to MRU, keeping a single path on your ESXi host, and see whether that causes any significant decrease in performance.
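
Switching a device to MRU for a quick test would look roughly like this (eui.xxxx is a placeholder for one of the Nimble device IDs):

# placeholder device ID; use Most Recently Used (single active path) for the test
esxcli storage nmp device set -d eui.xxxxxxxxxxxxxxxx --psp=VMW_PSP_MRU
# and back to Round Robin afterwards
esxcli storage nmp device set -d eui.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR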

The Windows MPIO may be able to connect to different targets/IPs at once, making full use of all available paths.

-- http://alpacapowered.wordpress.com
1ppInc
Contributor

Hey guys,

Wanted to provide an update on this issue. After spending a few hours with Nimble support, we finally got to the bottom of it.

We had to ensure we were applying the Round Robin properties to ALL presented iSCSI devices, rather than just the one volume we were targeting for speed testing purposes.

We had also tested with SQLio and found nearly identical results to CrystalDiskMark. After applying the updated policy to all paths, we were seeing 400+ MB/s for sequential reads/writes using SQLio, but still ~100 MB/s using CrystalDiskMark. We attribute this to the fact that CDM uses a queue depth of 1 for its sequential tests, which produces deflated, non-real-world numbers.

The following was executed on each of the ESXi hosts to apply appropriate policies:

# collect the device IDs of all Nimble iSCSI volumes presented to this host
i=`esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}' | sed -e 's/(//' -e 's/)//'`
# apply IOPS-based Round Robin (switch paths every 5 I/Os) to every Nimble device
for p in $i; do esxcli storage nmp psp roundrobin deviceconfig set -d $p --iops 5 --type=iops; done
# verify the new settings
for p in $i; do esxcli storage nmp psp roundrobin deviceconfig get -d $p; done

Hope this helps! If anyone has any additional questions, feel free to PM me!

Drew

vPatrickS
Enthusiast

Hi

Can you provide us with some details, like which block size was being used in SQLio?

Regards

Patrick

1ppInc
Contributor

Formatted the disk with an 8K allocation unit size and tested with 64K blocks using SQLio.
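
I don't have the exact command line handy, but a 64K sequential SQLio run generally looks something like this (threads, duration, queue depth and the test file path are just example values, and the test file needs to be created at a realistic size beforehand):

rem example values only; E:\testfile.dat is a placeholder test file
sqlio -kR -t4 -s60 -o8 -fsequential -b64 -BN -LS E:\testfile.dat
sqlio -kW -t4 -s60 -o8 -fsequential -b64 -BN -LS E:\testfile.dat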

Timwhatmough
Contributor

We have an almost identical issue with ESXi 5.1 & Nimble CS220s - slow iSCSI performance and high latency, 400ms+.

We spent the past 48 hours going in a loop between Nimble, Cisco and finally VMware, each saying it was not their issue.

Right now we are looking at a physical Windows 2008 server with a Nimble volume mounted via iSCSI, as we see blazing performance there.

If you have any ideas, we would love to hear them.

Thx

daveywave
Contributor

Agreed, we are going through the exact same problem, but we already have Round Robin on every VM host. We are also using 10Gb links, and with a Windows box attached directly via the iSCSI Initiator we see upwards of 400-600 MB/sec (through the virtual machine, no less), but we are struggling to see anything over 100 MB/sec on the actual VMDK disk.

We tried tweaking the IOPS values, traffic shaping, etc., but could not achieve any more bandwidth.

We are also using 2 switches (Force10 S4810) with 4 active paths on each host.
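
In case it helps anyone checking the same thing, this is roughly how we confirm the policy actually landed on every Nimble device on a host (the -A6 window is just enough in our output to reach the Path Selection Policy lines):

esxcli storage nmp device list | grep -A6 "Nimble iSCSI Disk"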

Josh26
Virtuoso

As I often say here, the problem with hijacking an old thread is that you end up polluting your own issue with information that doesn't apply.

In the case of the original poster, the issue was identified as not applying RR to every presented device. You say that's not your problem, and you describe a fairly different hardware setup.
