Hello,
I have multiple Dell PowerEdge R610/R620 servers connected through a stack of 4 PowerConnect 6224/6248 switches to 4 EqualLogic arrays (1x PS5000E, 2x PS4000E, 1x PS6100XV).
The PS5000E is in pool 1.
The two PS4000Es are in pool 2 (holds 2 datastores, presented to all 7 ESX hosts).
The PS6100XV is in pool 3 (holds 7 datastores, presented to all 7 ESX hosts).
I have tested this against both pool 2 and pool 3. Most of the problems are on pool 3, but it also happens on pool 2.
Each ESX host is configured the way EqualLogic recommends: 2 NICs dedicated purely to iSCSI traffic for VMFS. The vmnics are bound correctly and are not used for anything else.
Windows guest VMs also access the SAN through the Microsoft iSCSI initiator, over 2 separate vmnics on each ESX host.
That Windows guest iSCSI traffic does NOT experience the same problem.
The problem is easily reproduced: when I run random or sequential read IOs from IOmeter with a 64 KB IO size, I get 140+ ms latency and about 28 MB/s throughput.
I have been working on this case for more than a week now, and as far as I know I have tried everything. I have put all my ideas and results together in a document:
Problems:

1. Both paths slow
Throughput per path is 14 MB/s, latency is 141 ms per path (230 IOPS).
If I disable one path, throughput stays the same but latency goes up to 300 ms.
Question: why does the latency double when I disable one path?

2. One path slow, one path fast
First path: 1600 IOPS / 100 MB/s / 19 ms latency
Second path: 30 IOPS / 14 MB/s / 142 ms latency
Question: why is the latency on the bad path only 142 ms and not 300 ms?

3. Both paths fast
Both paths: 1600+ IOPS / 100 MB/s / 17 ms per path
IO size comparison: (see pictures and xlsx sheet.)
I have tested this both on the virtual switch (port group in promiscuous mode) and by mirroring both the uplink of the ESX host and the uplink of the SAN.
Test:
I reset all counters on the PowerConnect switch (9:35 CET).
On ESX09 the time is 07:39 UTC, 2012.
Then I started a test run on datastore “ESX-SAS-03”.
One hour later I don't see any dropped frames on the interfaces in use, nor any pause frames being sent or received. There are also no warnings or errors on the EQL or VMware side.
There were no other VMs running on the host at that time.
Please help :( Dell/EqualLogic support isn't taking me seriously. They blame my switches because I route iSCSI traffic between VLANs and have ACLs (but that iSCSI traffic is not affected), and they tell me to break down the stack. They didn't even talk to us; everything happened through email. There was no WebEx whatsoever...
Regards
Hans
This is what Wireshark sees when I look at scsi.request_frame:
Something to try is to make sure that DelayedAck and Large Receive Offload (LRO) are disabled.
Here's a VMware KB on how to disable Delayed ACK.
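On ESXi 5.x you can also check and flip DelayedAck from the command line with esxcli — a sketch below, where vmhba37 is just an example adapter name (use whatever your software iSCSI initiator shows up as; on ESX 4 this is done through the vSphere Client advanced settings instead):

```shell
# List iSCSI adapters to find the software initiator's vmhba name
esxcli iscsi adapter list

# Show current adapter parameters (DelayedAck is one of them);
# vmhba37 is an example name only
esxcli iscsi adapter param get --adapter=vmhba37

# Disable DelayedAck on the adapter; existing sessions need to be
# re-established (rescan/reboot) before the change takes effect
esxcli iscsi adapter param set --adapter=vmhba37 --key=DelayedAck --value=false
```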
Disabling LRO:

Solution Title: HOWTO: Disable Large Receive Offload (LRO) in ESX v4/v5

Within VMware, the following command will query the current LRO value:
# esxcfg-advcfg -g /Net/TcpipDefLROEnabled
To set the LRO value to zero (disabled):
# esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled
NOTE: a server reboot is required.
Info on changing LRO in the guest network: http://docwiki.cisco.com/wiki/Disable_LRO
What version of ESX? What's the build #?
What MPIO pathing policy are you using? I.e. Fixed, Round Robin, Dell MEM? (If MEM, what version?)
If you are using Round Robin then you are probably on the default of 1000 IOs per path. That should be changed to 3. Depending on which version of ESX you're running, I have a script that will change that for you.
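As a sketch of what such a script does on ESXi 5.x (on ESX 4 the equivalent lives under `esxcli nmp roundrobin setconfig`) — the naa.6090a0 prefix is what EqualLogic volumes usually show up as, so verify it matches your devices before running:

```shell
# Set the Round Robin IO operation limit to 3 on every EqualLogic volume.
# EQL devices normally appear with the naa.6090a0 prefix -- check that this
# matches your environment first.
for dev in $(esxcli storage nmp device list | grep '^naa.6090a0'); do
    esxcli storage nmp psp roundrobin deviceconfig set \
        --device="$dev" --iops=3 --type=iops
done
```

Note the limit is per device and is not persistent across all ESX versions' upgrades, so it's worth re-checking after patching.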
This link has some agreed-upon suggestions about iSCSI settings.
Hans,
Have you found the answer to this issue? I have pretty much the same problem: very low read IOPS and extremely high latency. Very frustrating! Have you talked with Dell to get any answers? What do you see when you look at esxtop? Do you see any issues there?
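For what it's worth, a batch capture of esxtop makes it easy to compare device latencies (DAVG/KAVG/GAVG) across a whole test run instead of watching the screen — the sample interval and count below are just an example:

```shell
# Capture esxtop in batch mode on the ESX host: all counters (-a),
# 5-second samples (-d 5), 120 iterations (~10 minutes, -n 120),
# written to a CSV you can open in perfmon or a spreadsheet
esxtop -b -a -d 5 -n 120 > /tmp/esxtop-capture.csv
```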