miztadux
Contributor

Troubleshooting Software iSCSI performance

Hello,


We are experiencing difficulties with our View infrastructure: users suffer from slow response times.
The central issue seems to be disk performance: basically, file copies/transfers can't exceed 20-30 MB/s.

This is a four-server (HP DL385) ESX 4.1 cluster, using a NetApp FAS2050 SAS SAN connected via the VMware software iSCSI initiator over a dedicated 1 Gbps network.
I can't make sense of the results of basic IOmeter tests (based on the well-known OpenPerformance.icf); perhaps someone can point me in the right direction...

I used a sequential read test with a 32 KB block size; what puzzled me are the per-thread results:
- 1 thread  => 25MB/s
- 4 threads => 95MB/s
- 8 threads => 105MB/s
So basically the storage setup can saturate the gigabit link, but not on a single thread... this results in most real-life file operations being capped at around 25 MB/s...
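
A quick back-of-the-envelope check (just a Python sketch, assuming the single-thread run is purely latency-bound, i.e. one 32 KB read in flight at a time) seems to fit these numbers:

block = 32 * 1024                  # bytes per IO
single_thread = 25 * 1024 ** 2     # observed ~25 MB/s with 1 worker

iops = single_thread / float(block)   # ~800 IO/s
latency_ms = 1000.0 / iops            # ~1.25 ms per round trip
print("implied per-IO latency: %.2f ms" % latency_ms)

# With N IOs in flight, the same latency allows roughly N times the throughput,
# until the ~110-120 MB/s gigabit ceiling is reached.
for outstanding in (1, 4, 8):
    mbps = outstanding * block * iops / 1024.0 ** 2
    print("%d outstanding -> ~%.0f MB/s ceiling" % (outstanding, mbps))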

Has anybody ever experienced something like this?
What tests can I run to pinpoint the problem?

Thanks.
4 Replies
chriswahl
Virtuoso

I'd check the queue sizes on your vSphere HBA and the NetApp LUN. They may be set to a size that only allows that much IO to be active in the queue at once. I believe the recommended LUN queue depth for NetApp is 64.

Also check out NetApp TR-3749: http://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3749.pdf&cc=us

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators
miztadux
Contributor

Thank you for your answer; this is not my field of expertise and I sure need a hand.
The disk queue length seems to be a good indicator of the problem; unfortunately I'm not familiar with the related settings...

Anyway, I tried monitoring the various counters available through esxtop, and from what I gather the queue lengths seemed OK.
(The software HBA's queue is not configurable and is reported as AQLEN=1024; the VMkernel's was set up as dynamically throttled and reported as DQLEN=32 or 128 depending on load. I tried setting it to a fixed 64, but it didn't change the results.)
But what seemed wrong is that the number of active commands in the queue was always ACTV=0, 1 or 2.
In the VM (Windows XP), perfmon also reported a queue depth that was always <= 1 when copying files.

Here's an example, esxtop in "u" mode, with a Windows VM copying a file (worst-case scenario; on average ACTV is 0 or 1):

DEVICE             PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD  LOAD   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
naa.60a98000503xx           -             128     -    2    0    1  0.02   638.96   630.38     8.58    19.21     0.39     1.51     0.01     1.51     0.00

The same thing, but during an IOmeter run with "# of Outstanding IO = 8":
DEVICE             PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD  LOAD   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
naa.60a98000503xx           -             128     -    8    0    6  0.06  3750.80  3705.50    45.30   112.86     0.45     2.03     0.00     2.04     0.00
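
As a sanity check (a small Python sketch; values copied from the two esxtop samples above), the counters are at least self-consistent, and they confirm the device queue really is nearly empty during the file copy:

samples = [
    # (CMDS/s, MBREAD/s, GAVG/cmd in ms, label)
    (638.96,   19.21, 1.51, "file copy in the Windows VM"),
    (3750.80, 112.86, 2.04, "IOmeter, 8 outstanding IOs"),
]
for cmds, mbread, gavg_ms, label in samples:
    in_flight = cmds * gavg_ms / 1000.0   # Little's law: IO rate * per-IO latency
    io_kb = mbread * 1024.0 / cmds        # average IO size
    print("%s: ~%.1f IOs in flight, ~%.0f KB per IO" % (label, in_flight, io_kb))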


So now I'm wondering why the queue stays empty and the transfer rate stays so low... testing on a physical Windows box showed the queue length going up to 32 as soon as a file copy started...
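
Something along these lines (a rough Python sketch of what I think is happening in the guest; TEST_FILE is just a placeholder, and the OS read cache will skew the result unless the file is much larger than RAM) should reproduce the pattern: one worker issues one synchronous read at a time, while several workers keep several reads in flight:

import threading, time

TEST_FILE = r"D:\testfile.bin"    # hypothetical path, adjust to your setup
BLOCK = 32 * 1024                 # 32 KB reads, same as the IOmeter test

def read_stripe(worker, workers, counters):
    # Each worker reads every Nth 32 KB block, so N workers cover the whole
    # file between them and keep N reads outstanding at the same time.
    with open(TEST_FILE, "rb") as f:
        offset = worker * BLOCK
        while True:
            f.seek(offset)
            data = f.read(BLOCK)
            if not data:
                break
            counters[worker] += len(data)
            offset += workers * BLOCK

def run(workers):
    counters = [0] * workers
    threads = [threading.Thread(target=read_stripe, args=(i, workers, counters))
               for i in range(workers)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    mb = sum(counters) / 1024.0 / 1024.0
    print("%d worker(s): %.1f MB/s" % (workers, mb / (time.time() - start)))

for n in (1, 4, 8):
    run(n)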

PS: also thanks for the NetApp link; it's so obvious I never thought about it...

vangoose
Contributor

You can install NetApp VSC (Virtual Storage Console) on vCenter and use that to set all the options.

How is your View pool provisioned? Linked clones?

miztadux
Contributor

Hi,

I installed the NetApp plugin some days ago (after reading about it in the document mentioned in the previous reply) and used it to apply the recommended settings.

But it didn"t change anything, as I had already configured the adapters accordingly...

As for the provisioning, we use basic "full clones" (full copies of the images).

As a side note, I think I found the answer to the "queue depth is never more than 1" problem:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1614

It's a problem with Windows and the BusLogic controller: the queue depth is limited to 1.
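
That would also fit the numbers: with the queue depth pinned at 1, throughput can't exceed one 32 KB IO per round trip (a quick sketch using the ~1.5 ms GAVG/cmd from the esxtop output earlier):

block_mb, latency_s = 32 / 1024.0, 1.5 / 1000.0
print("ceiling at queue depth 1: ~%.0f MB/s" % (block_mb / latency_s))
# -> roughly 21 MB/s, in the same range as the 20-30 MB/s file copies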

But anyway, the performance problem persists with the LSI Logic controller or other OSes: that setup performs marginally better (and the queue depth is sometimes > 1), but it is still very far from full gigabit...
