VMware Cloud Community
sbarnhart
Enthusiast

iSCSI round-robin multipath policy -- any sane values besides 1 or the default of 1000?

Has anyone experimented with iSCSI round-robin multipath policy changes besides switching the value to 1 IOP instead of the default of 1000?

Pretty much all of what I've read suggests just using 1 IOP.  While this results in a 1:1 balance across two NIC paths, is there any reason to consider a different value?

While 1000 seems too high for some reason (which is why people change it), is 1 too low? Is there perhaps some performance advantage to a value of 100 or 200? I can't specifically say why, but it seems reasonable to believe there may be some reason not to switch paths until a higher threshold is reached -- just one lower than the default of 1000.

Has anyone experimented and seen any difference? I did some basic benchmarks with an EqualLogic PS4000 and frankly didn't see any difference between 1000 and 1, but I do see the NIC workloads even out at 1.
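
For reference, what I've been running looks roughly like this (ESXi 5.x syntax -- 4.x uses the older "esxcli nmp roundrobin setconfig" form -- and the naa ID is just a placeholder for one of the PS4000 volumes):

    # list devices along with their current PSP and round-robin settings
    esxcli storage nmp device list

    # put the device on round robin, then switch paths after every single IO
    esxcli storage nmp device set --device=naa.xxxxxxxx --psp=VMW_PSP_RR
    esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1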

And is there some rationale for 1000 being the default?  I can see some basic math of 1k average payloads * 1000 IOPS equalling roughly 1 Gbit/sec and thus an "overflow" threshold for a single NIC.

It'd sure be easier if the RR policy were settable in the GUI, too, but maybe, like jumbo frames and RR itself, it will make it into a future version.

snowmizer
Enthusiast

I can't speak to iSCSI, but I just had an issue with the round-robin policy on my FC SAN. We had a job that was running 5+ hours on our new array (it was 4 hours on our old array). I changed this setting to 100 on the affected LUNs and rebooted the VM where the job was running. Even after making the change but before the reboot, running the job showed a dramatic increase in IOPS. Once I rebooted the VM, the job came down to about 3 hours 27 minutes.

Bottom line: this does make a difference. 100 may be a good starting point, but your mileage may vary.
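
For reference, the per-LUN change was along these lines (ESXi 5.x syntax, with the naa ID a placeholder for the affected LUN):

    esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=100
    # confirm the new value took effect
    esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxx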

Hope this helps a bit.

sbarnhart
Enthusiast

I guess without a better understanding of the ESXi software iSCSI architecture, it's kind of hard to know. There are plenty of blogs from people claiming big increases in throughput with --iops 1, but thus far I haven't been able to see it at the two sites where I've changed the RR policy to 1 against EqualLogic PS4000s.

I can see where "too small" may result in inefficiency or latency from the overhead of more frequent path switches or from splitting data flows across paths, especially since AFAIK you really can't manually tune iSCSI paths to reflect the actual data paths. For example, if you have an ESXi host with two NICs plugged into two different switches, and a SAN like a PS4000 with two iSCSI NICs also split between those two switches, you can't make ESX NIC 1 on switch 1 talk only to SAN NIC 1 on the same switch.

If you could force 1:1 RR AND the specific hardware pathing, I can see where you'd get some small advantage from not passing data between switches (i.e., where ESX NIC 1 and SAN NIC 2 have a transaction).

With Fibre Channel RR you'd probably be splitting your paths across separate FC fabrics for redundancy anyway, so you'd eliminate the cross-switch traffic you can see with iSCSI path splitting.
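
About the closest I've found to seeing what a device's paths actually run over is dumping the path list per device (again ESXi 5.x syntax, placeholder naa ID) -- it at least shows which adapter and target each path uses, even if you can't pin traffic to a particular pairing:

    # list every path for the device with its adapter, target and state
    esxcli storage core path list --device=naa.xxxxxxxx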

MKguy
Virtuoso

In this situation you can only test, test, and test again with various values. Unless you have detailed recommendations from your vendor (and even if you do, don't trust them blindly -- confirm them in your environment), that's pretty much the only reliable way to find out.

Also, keep in mind there might be some other constraining factor that keeps you from noticing the benefit of one setting or the other (as you described, it doesn't seem to make a difference for you whether you set it to 1 or 1000).

And is there some rationale for 1000 being the default?  I can see some basic math of 1k average payloads * 1000 IOPS equalling roughly 1 Gbit/sec and thus an "overflow" threshold for a single NIC.

The default of 1000 seems more or less arbitrary, but it's at least a small safeguard against frequent path thrashing if your array can't properly handle a LUN being accessed through multiple controllers simultaneously. A single IO can be as small as 512 bytes or as large as 32 MB: http://kb.vmware.com/kb/1003469 (and that's not even the SCSI maximum).

So in theory, anything from 4 large IOs per second to tens of thousands of small ones could saturate a Gbit link. For the record, in my experience the average IO size in a mainly Windows environment is roughly somewhere between 12-32 KiB.

In your iSCSI case, however, there is another approach you can take for round-robin multipathing instead of an IOPS limit:

Switch paths not by IO count but by bytes transferred, sized to match your frames if you use jumbo frames. I've implemented this myself and saw respectable, if not huge, improvements with this approach. It's well described here:

http://blog.dave.vc/2011/07/esx-iscsi-round-robin-mpio-multipath-io.html
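
If you want to try it, the per-device change is something like this (ESXi 5.x syntax; the naa ID is a placeholder, and 8800 is just a commonly used value meant to roughly line up with a 9000-byte jumbo MTU -- treat the exact number as something to test rather than gospel):

    esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=bytes --bytes=8800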

-- http://alpacapowered.wordpress.com
sbarnhart
Enthusiast

Thanks, that's fascinating. I didn't know PSPs could be byte-based vs. IOP-based.

For what it's worth, --iops 1 does result in fairly even balancing when measured in bytes. When I look at the vmnic performance stats over a 24-hour period for an --iops 1 PSP host, throughput is balanced to within 1/2 KByte between the two vmnics for both read and write.

I'll have to try the PSP using bytes and see if I notice any specific difference.
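
If it's useful to anyone, my plan is to apply and then verify it across all the volumes in one pass with something like this (the naa.6090a0 prefix is just my guess at how the EqualLogic volumes show up in "device list" -- check yours first):

    # substitute whatever prefix your volumes actually use
    for dev in $(esxcli storage nmp device list | grep '^naa.6090a0'); do
        esxcli storage nmp psp roundrobin deviceconfig set --device=$dev --type=bytes --bytes=8800
        esxcli storage nmp psp roundrobin deviceconfig get --device=$dev
    done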
