VMware Cloud Community
_VR_
Contributor

Equallogic Performance Issues

A CALL FOR HELP

I've spent a week troubleshooting an issue with a new EqualLogic PS4100X. A case was opened with Dell a week ago, and after multiple escalations it has gone absolutely nowhere. I wanted to see if anyone could add some insight.

IOMeter test results:

SERVER TYPE: Windows 2008 R2
HOST TYPE: DL380 G7, 72GB RAM; 2x Xeon E5649 2.53 GHz 6-Core
SAN TYPE: EqualLogic PS4100X / Disks: 600GB 10k SAS / RAID LEVEL: RAID 50 / 22 Disks / iSCSI
##################################################################################
TEST NAME                       Av. Resp. Time (ms)   Av. IOPS   Av. MB/s
Max Throughput-100%Read                 18              3217        101
RealLife-60%Rand-65%Read                13              3438         27
Max Throughput-50%Read                  19              3199        100
Random-8k-70%Read                       13              3463         27
##################################################################################
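A quick consistency check on the numbers above (a minimal Python sketch; the block sizes are an assumption based on the OpenPerformanceTest.icf profile, which uses 32 KB for the Max Throughput tests and 8 KB for RealLife/Random):

```python
# Consistency check: Av. MB/s should equal Av. IOPS x block size.
# Block sizes assumed from the OpenPerformanceTest.icf profile:
# 32 KB for the Max Throughput tests, 8 KB for RealLife/Random.
tests = [
    ("Max Throughput-100%Read", 3217, 32),
    ("RealLife-60%Rand-65%Read", 3438, 8),
    ("Max Throughput-50%Read", 3199, 32),
    ("Random-8k-70%Read", 3463, 8),
]
for name, iops, block_kb in tests:
    mb_s = iops * block_kb / 1024
    print(f"{name}: {mb_s:.0f} MB/s")
```

All four rows line up with the reported MB/s, and both 32 KB sequential tests sit pinned at ~100 MB/s, which is what the problem description below is about.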

DESCRIPTION OF PROBLEM:

The PS4100X has a system bottleneck that limits throughput to 100MB/s. When a single host is connected with a single path, eth0 or eth1 on the PS4100X can max out at 1Gbit/s. When multiple hosts or multiple paths are connected (tested 2-8 concurrent paths, 2-6 host NICs), the throughput of eth0 and eth1 drops to half speed (500Mbit/s) each. The combined throughput of both ethernet adapters never exceeds 1Gbit/s. The unit has been upgraded to v5.2.1 (latest) firmware.
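For context, a back-of-the-envelope sketch of what one saturated GbE link can carry (the ~94% payload efficiency is an assumption for standard 1500-byte frames with Ethernet/IP/TCP/iSCSI header overhead):

```python
# Rough payload ceiling of a single 1 Gbit/s link carrying iSCSI.
line_rate_mbit = 1000        # 1 GbE
payload_efficiency = 0.94    # assumed: headers eat ~6% with 1500-byte frames
ceiling_mb_s = line_rate_mbit / 8 * payload_efficiency
print(f"~{ceiling_mb_s:.0f} MB/s max payload per GbE link")
# Two healthy ports should deliver roughly double that combined;
# the observed ~100 MB/s is what a single saturated link would deliver.
```

In other words, the array is behaving as if only one GbE port's worth of bandwidth exists no matter how many paths are active.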

SEE TEST RESULTS HERE:

1. Shows eth1 being maxed out in single path, then the connection switches to multipath
2. Shows eth0 being maxed out in single path, then the connection switches to multipath
3. Shows two concurrent tests from two separate test hosts

RULING OUT NETWORK ISSUES:

I'm able to replicate the above problem in the following configurations:
- Test host connected to the PS4100X via a Cisco 6509
- Test host connected to the PS4100X directly via a crossover cable (two active iSCSI paths set up manually)
- Test host connected to the PS4100X via a dedicated unmanaged Netgear switch
I can further prove that the Cisco 6509 is functioning properly because I can show speeds of 180MB/s+ to the production PS6000XV and the production PS4000E.

RULING OUT HOST ISSUES:

Tested from a host running Windows 2008 R2 and another host running Windows 2003. Both test hosts encounter the issue described above, and both show speeds of 180MB/s+ when running the same tests against the two EqualLogics in production.

DEALING WITH DELL-EQUALLOGIC SUPPORT HELL:

The analyst I'm currently dealing with says the PS4100X is working as expected. He refuses to do any further troubleshooting because some of the blades in the Cisco 6509 carry QoS and VoIP. The blade the SAN and test hosts are connected to has no QoS or VoIP configured.

56 Replies
_VR_
Contributor

Jonathan,

What type of switches are you using with your EQL setup?

In production I have Cisco 6509s. For the purpose of troubleshooting I moved the EQL and test hosts to a single blade. Also for troubleshooting, I ran tests on an unmanaged Netgear switch and over a direct crossover connection (no switch).

What type of NICs on the ESX hosts? Broadcom or Intel?

The onboard NICs are Intel-based (4-port) and the additional NIC is Broadcom-based (4-port).

davidbewernick
Contributor

Hmm... no idea at the moment. Sounds like an issue on the EqualLogic side to me, but I don't know them very well. Sorry.

AndreTheGiant
Immortal

I see that you are using 5.1 firmware.

How much free space do you have on the member? On this firmware (see the firmware release notes) you need at least 100 GB of free space!

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
_VR_
Contributor

This issue has everyone at EQL stumped. Replacing the controller did not resolve the issue. Engineering is looking into it.

_VR_
Contributor

It's a brand new device with about 11TB free. The firmware was upgraded to v5.2.1.

AndreTheGiant
Immortal

I mean, how much free space?

With the new firmware you must have at least 100 GB of free space.

_VR_
Contributor

11 TB free space. They're shipping me a replacement unit.

AndreTheGiant
Immortal

It really is strange. What kind of switches are you using?

_VR_
Contributor

I have Cisco 6509s in production. I also tested with an unmanaged Netgear and a direct connection (using crossover cables).

Equallogic also sent me a new Dell 6224 for testing

AndreTheGiant
Immortal

OK, so the issue definitely isn't in the switches :)

jsabbott25
Contributor

Did you get better results with the replacement unit? I just went through pre-production testing and am currently migrating our infrastructure over to a PS4100X and a PS4100XV. I didn't see any issues like this during any of my testing.

Jake

w_estermann
Contributor

Have you managed to get any further with your diagnostics? Any chance you could look at my config below and comment on your setup / results?

Current Setup:

- 5 * Dell R710 servers

- 1 * dual-port 10Gb Broadcom 57711 NIC in each server

- 2 * EqualLogic PS6510E (firmware 5.2.2) configured in a single group with all disks running RAID 10.

- ESXi 5.0 Update 1 - 623860

- each NIC is bound to an iSCSI port group as per forum and Dell suggestions

- Am using Dell MEM driver for advanced multipathing and load balancing

- 45TB total, 1 x 2TB LUN created and hosting the 'IO ANALYZER' machine only.

- no thin disks currently being used.

I've been using VMware Labs' IO Analyzer to compare results, the test being 'Max_Throughput.icf'.

Based on everyone's experience, what are the preferred options currently in use for best performance:

- hardware or software ISCSI?

- Jumbo frames off or on?

- Delayed Ack enabled or disabled?

- storage IO control enabled or disabled?

- what range of IOPS and MBps would you expect to see?

My test results are very inconsistent and I'm not 100% sure how to interpret them. Currently I am seeing:

- software iSCSI, jumbo frames on, delayed ACK on = 952 IOPS / 472 MB/s reads

- software iSCSI, jumbo frames off, delayed ACK on = 1169 IOPS / 579 MB/s reads

- software iSCSI, jumbo frames on, delayed ACK off = 1505 IOPS / 747 MB/s reads

- software iSCSI, jumbo frames off, delayed ACK off = 181 IOPS / 90 MB/s reads

Is my third test the best result I could expect?

How does that compare with the rest of the community?

Are there any other settings I should be using / changing to improve my results?

The worry is that these results are based on only one test machine running. When I run all 5 test machines (one on each host), the overall performance drops VERY low.

Thanks in advance for any help / replies

Warren Estermann

Ramzy201110141
Enthusiast

Having the same issue :(

I have two groups installed in two different locations; each group has 3 * PS6000 array members with firmware 5.2.1.

I'd appreciate it if you could update me with any suggested solution.

thanks,

Ramzy

_VR_
Contributor

Update:

The L3 tech I was working with ordered a brand new PS4100X and had it shipped to his lab. In his lab environment he saw full 200MB/s throughput.

I saw the same poor performance when I configured the replacement unit in my environment.

Long story short, I got a replacement unit but the issue was not resolved.

_VR_
Contributor

w_estermann,

Have you tried running all four tests?

http://www.mez.co.uk/OpenPerformanceTest.icf

RealLife and Random will give you a better idea of how your system will perform in "Real Life"

What block size are you using for your tests? It doesn't look like the 32 KB block used in OpenPerformanceTest.

1505 IOPS / 747 MBReads/S

If you are in fact getting 747 MB/s, that's not bad at all.
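To make the block-size question concrete, dividing the posted MB/s by the posted IOPS gives the implied average I/O size (a small sketch; the run labels are paraphrased from the results above):

```python
# Implied average I/O size: MB/s divided by IOPS gives MB per operation.
results = [
    ("jumbo on,  delayed ACK on",  952,  472),
    ("jumbo off, delayed ACK on",  1169, 579),
    ("jumbo on,  delayed ACK off", 1505, 747),
    ("jumbo off, delayed ACK off", 181,  90),
]
for label, iops, mb_s in results:
    kb_per_io = mb_s / iops * 1024
    print(f"{label}: ~{kb_per_io:.0f} KB per I/O")
```

All four runs work out to roughly 508 KB per I/O, i.e. a large-block sequential profile (512 KB or so) rather than the 32 KB OpenPerformanceTest block, which is why the MB/s figures look so high relative to the IOPS.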

Try searching for similar results in the Open unofficial storage performance thread

Ramzy201110141
Enthusiast

That's too bad!

I will open a case next week and keep you posted if I get anywhere.

Many thanks for the update

_VR_
Contributor

Ramzy201110141,

What issue are you seeing? Can you post your OpenPerformanceTest results? Did these issues develop after a firmware upgrade?

alex555550
Enthusiast

Hi,

Does anybody have a solution or an update? Same problem with ESXi 5 U1 and Cisco.

_VR_
Contributor

I encountered a different problem when upgrading from ESX 4.1 to ESXi 5 U1.

The RealLife, Max Throughput-50% and Random tests all showed expected results. Max Throughput-100%, on the other hand, had horrible results. After a lot of troubleshooting I identified the Intel quad-port NIC as the culprit.

The following fix resolved the issue:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201889...

esxcfg-module -s "InterruptThrottleRate=0,0,0,0" igb