Having what appears to be an odd issue. The most easily observable symptom is copying approximately 12GB of data from one virtual disk to a second virtual disk, which takes roughly 10-12 minutes to complete. The exact same copy on an older physical server with a direct-attached array of (9) 10k disks takes about 2 minutes. The file copy consists mostly of smaller files (1 MB or less).
The VM disk files live on a Compellent array with (12) 15k disks attached to the ESX 4.1 host via 1Gb iSCSI. The ESX host connects to the Compellent SAN with a dual-port QLogic iSCSI HBA. Each port of the server HBA is connected through its own Cisco switch (i.e. two fault domains), and the volume has a total of 4 paths to the datastore where the disk files live (configured with a Round Robin multipath policy).
I've done a little background work and have monitored the copy process using a combination of the Compellent GUI, the vCenter GUI and ESXTOP. According to the Compellent system, I/O maxes out at around 400 IOPS during the process, reading/writing about 100,000 kbps. The vCenter GUI shows almost the same performance stats. ESXTOP stats look okay too, with QUED remaining consistently at 0 and DAVG consistently between 10 and 15 ms.
It's odd to me that, though there appears to be no bottleneck, I'm only getting about 400-500 IOPS out of a SAN that should produce something more in the 1800-2000 range. Could something within ESX be throttling I/O? A setting in the switches I should look at? Two things I have not done are enabling Jumbo Frames and adjusting the Queue Depth setting on the ESX server's HBA. I wanted a little more information before going down that road.
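To sanity-check those numbers, here's a rough back-of-the-envelope calculation. This is only a sketch: the per-spindle IOPS figure is a common rule of thumb for 15k disks, not a measured value from this array.

```python
# Back-of-the-envelope check of the observed counters.
# Assumed figure: ~170 small-block IOPS per 15k spindle (rule of thumb).

observed_iops = 400
observed_kbyte_s = 100_000           # as reported by the Compellent GUI

# Average I/O size implied by the observed counters.
avg_io_kb = observed_kbyte_s / observed_iops
print(f"implied average I/O size: {avg_io_kb:.0f} KB")   # 250 KB

# Rough expected small-block IOPS from the array.
spindles = 12
expected_iops = spindles * 170
print(f"rough expected array IOPS: ~{expected_iops}")    # ~2040
```

The implied ~250 KB average I/O size hints that the copy is issuing large, mostly sequential transfers, so the limit may be throughput rather than IOPS, which would explain why the IOPS counter sits so far below the array's small-block capability.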
Any input would be appreciated.
Thanks,
Kenny
Since queue length (QUED) is consistently zero, there is no need to adjust queue depth.
Please be aware that in order to enable Jumbo Frames you will need to recreate the iSCSI vmknic. The frame size is set when the vmknic is created; you can't change it afterward.
I would also try changing the PSP to MRU and comparing the results.
It took me a couple of days to try your suggestion... changing the multipath policy had no effect. Whether it was Round Robin, MRU, or Fixed, the copy speeds were identical. And just to clarify: since I'm using an iSCSI HBA, there is no vmknic to recreate. I should be able to enable jumbo frames on the HBA, on the switch, and at the SAN.
Sorry, missed the part about iSCSI HBA in your first message.
Unfortunately, jumbo frames are not supported on independent iSCSI adapters.
http://www.vmware.com/pdf/vsphere4/r41/vsp_41_iscsi_san_cfg.pdf - check page 42.
btw, did you mean 100,000 kbyte/s or kbit/s?
I would also try to monitor counters in guest OS, this link is quite handy - http://blogs.technet.com/b/cotw/archive/2009/03/18/analyzing-storage-performance.aspx
Can you connect physical server to iSCSI storage and compare performance?
Sure, all my suggestions take time, but I'm just suggesting the things I would try myself to narrow down the root cause.
100,000 kbyte/s....
I can try connecting up a physical server though it may take a few days to do so.
Then I guess you're hitting the 1 Gbit iSCSI HBA bandwidth limit, because 100,000 kbyte/s is about 0.8 Gbit/s, which is close to 1 Gbit/s line rate once protocol overhead is counted.
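For reference, the unit conversion behind that estimate (a quick sketch, using decimal kilobytes as storage GUIs typically do):

```python
# Convert the observed 100,000 kbyte/s into Gbit/s to compare
# against the 1 Gbit/s iSCSI link rate.

kbyte_per_s = 100_000
bits_per_s = kbyte_per_s * 1000 * 8      # 1 kbyte = 1000 bytes, 8 bits/byte
gbit_per_s = bits_per_s / 1e9
print(f"{gbit_per_s:.1f} Gbit/s")        # 0.8 Gbit/s
```

0.8 Gbit/s of payload is roughly what a 1 Gbit/s link delivers after iSCSI/TCP/IP framing overhead, which is why this looks like link saturation rather than a disk limit.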
But now we have another question - with 2 iSCSI HBAs and proper load balancing you should be close to 200,000 kbyte/s.
can you check if your HBAs are equally loaded during the copy?
Good point... I'll check that. To be clear, it is actually a single HBA with dual ports, each port going through a separate switch. That raises a question for me, however. Though I have each HBA port configured with a path to the storage (i.e. multipathing), I don't see a way to control load balancing. What am I missing?
Unfortunately, I have never had experience with independent iSCSI HBAs, so I might be wrong in my ideas and assumptions.
Do you see your LUNs on both ports? How many paths do you see in ESXi per LUN?
So, your HBA is the bottleneck. Now you need to check in esxtop whether the traffic is balanced evenly between the two ports.
btw, did you check your HBA specs? What is the maximum throughput for the entire card per the documentation?
Hmmm, you could be right... if I'm reading this correctly, it looks like a max of 1 Gbps, though it's not clear whether that's a per-port figure or a total for the card.
http://www.qlogic.com/Resources/Documents/DataSheets/Adapters/QLE406xC_datasheet.pdf
Will see if there is any load balancing going on at all or if all traffic is going down one port.
I think they would mention if it could run 2 Gbps on the two-port card, but only 1 Gbps is mentioned for both types of cards.
Time to move to 10Gbps
Indeed, the I/O is going down a single path.
Well to clarify... I do see some minimal amount of I/O activity on two other paths... but the bulk of the I/O is going down one path.
Did you find the cause of the problem? Was the HBA the issue?
We've got 10Gb iSCSI on Compellent Series 40 [with jumbo frames enabled at every point in the chain] and are experiencing poor disk performance as well.
This wasn't actually an issue of "disk" performance per se... it was more a matter of the throughput the 1Gb iSCSI HBAs were capable of. The HBAs were the bottleneck rather than the disks.
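For anyone hitting the same wall, the arithmetic of the conclusion can be sketched like this. The halving is an assumption based on the setup described above: since both virtual disks sit behind the same HBA port, the read and the write traffic share that one 1 Gbit link.

```python
# Why a single 1 Gbit path caps a disk-to-disk copy on the same SAN:
# the read and the write both traverse the same HBA port, so the
# effective copy rate is at most about half the usable link throughput.

usable_mbyte_s = 100                        # ~0.8 Gbit/s observed on the link
effective_copy = usable_mbyte_s / 2         # read + write share the port
copy_gb = 12
seconds = copy_gb * 1000 / effective_copy
print(f"best case: ~{seconds / 60:.0f} min")  # ~4 min
```

That gives a best case of roughly 4 minutes for the 12GB copy; the observed 10-12 minutes would be that same ceiling plus per-file overhead from the many sub-1 MB files, though that last part is an assumption rather than something measured in this thread.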
Kenny Franklin