vSphere + PowerConnect 62xx + EQL

HansdeJongh · ‎02-20-2011

Hello,

We have a cluster of EQLs which all report errors on eth1.

The EQLs are connected to a series of Dell servers with Intel NICs through a stack of Dell PC62xx switches.

The vSphere hosts are configured with the EQL setup script in combination with their MPIO module.

The switches are on firmware version 3.2.1.3.

The EQLs are on firmware version 5.0.2.

vSphere is version 4.1 Update 1.

There are some VMs which also have a connection to the SAN through the MS iSCSI initiator.

On the Dell switches flow control is enabled, as well as jumbo frames.

Dell already exchanged some of the switches, but that didn't help.

All cables have recently been changed.

I have no idea where to go with this.

Somebody gave me the tip to look at flow control on the ESX hosts and within the VMs.

Now I know where to find flow control within the VM, but I have no idea whether it can be configured on virtual switches.

regards

Hans

depping · ‎02-22-2011

What kind of error? Did you call VMware Support yet?

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

AndreTheGiant · ‎02-22-2011

As Duncan asked: which error?

Do you mean eth1 on the EqualLogic?

Do you see the error in the EqualLogic management interface or in the vSphere Client?

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro

HansdeJongh · ‎02-22-2011

In the EQL group array I'm getting errors (only on eth1), and in SANHQ (the EQL software) I can see that I have 0.1 retransmit errors...

I guess I can't see the errors in VMware.

At the moment Dell is handling the case, but they seem to have hit a wall as well.

AndreTheGiant · ‎02-22-2011

Retransmission or packet drop?

I suggest downloading the documentation on how to configure vSphere from the EqualLogic site.

But to find where the problem is, start with a single switch, with all iSCSI on it.

Andre

HansdeJongh · ‎02-22-2011

I have both...

I followed the documentation and used the scripts from EQL to auto-configure it. I'm certain it's not in my configuration.

At least that's what Dell told me as well.

I think it is an EQL software issue...

AndreTheGiant · ‎02-22-2011

In any case Dell must solve the issue, because it seems related to the storage or the switches.

Are the errors in the interface view of the member? Do you have a red number next to the interface? Note that this counter is never reset, so maybe you are seeing old errors.
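Since the counter is cumulative, one way to tell old errors from new ones is to note the values twice and compare them. A minimal Python sketch, assuming you copy the numbers by hand from the interface view; the function name, interface names and counts below are made up for illustration:

```python
from typing import Dict

def new_errors(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the interfaces whose error counter grew between two readings."""
    deltas = {}
    for iface, count in after.items():
        # An interface missing from the first reading is treated as starting at 0.
        delta = count - before.get(iface, 0)
        if delta > 0:
            deltas[iface] = delta
    return deltas

# Hypothetical readings taken some time apart.
first = {"eth0": 0, "eth1": 120}
second = {"eth0": 0, "eth1": 134}
print(new_errors(first, second))  # {'eth1': 14}
```

If the result is empty, the numbers you see are probably left over from an earlier problem; if it keeps growing, the errors are current.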

Andre

HansdeJongh · ‎02-22-2011

A completely new box started to have errors right away, and the old boxes were factory reset, so all counters were reset as well. (But yes, it is in the group administration page next to the interfaces, in orange.)

jose_maria_gonz · ‎02-22-2011

Hi Hansde,

We had a similar error with Dell servers, EqualLogic and Dell PowerConnect 3xxxx series at a customer site.

I'm not sure now what went wrong, since it was a long time ago (2 months), but we opened a ticket with Dell and VMware and we solved the problem.

It appears that there was an issue with the EqualLogic flare we were using at that time. We upgraded the flare and that sorted the problem.

I am sorry for not being of much help. I just thought you would like to know.

Jose Maria Gonzalez

Web: http://www.jmgvirtualconsulting.com

Blog: http://www.josemariagonzalez.es

Twitter: http://twitter.com/jose_m_gonzalez

Author of 101 Secretos de VMware vSphere

VCI, VCP4, VCP3, VCP2, RHCE, MCSE, vExpert 2009, 2010

HansdeJongh · ‎02-22-2011

Sorry for my stupidity, but what do you mean by flare? Firmware?

AndreTheGiant · ‎02-22-2011

You are right... previous firmware versions had some issues with vSphere (but they were more related to path loss).

But in this case the firmware version is 5.0.2, so it was correctly updated.

Andre

HansdeJongh · ‎02-22-2011

At the moment I'm doing a Storage vMotion from one LUN to another. As I'm using hardware acceleration, everything should only go between the boxes.

I can see the errors going up in the EQL group administration.

jose_maria_gonz · ‎02-22-2011

Not at all HansdeJongh,

The flare is the OS that runs within the EqualLogic and makes the whole thing work. And as with Windows, Linux or other OSs, sometimes we must update or patch it.

Jose Maria Gonzalez

AndreTheGiant · ‎02-22-2011

On Dell EqualLogic there is generic firmware running on an embedded FreeBSD OS.

FLARE is the firmware specific to EMC CLARiiON arrays (Fibre Logic Array Runtime Environment), like the EMC CX or AX series.

Andre

HansdeJongh · ‎02-22-2011

Yeah, we had the same issues with lost paths; only a server reboot could solve that.

As you said, we are running the latest firmware...

bigfoot19421 · ‎02-23-2011

Can you disable the interface to check whether or not the interface is broken?

(And with the interface disabled, please check whether the interface IP still answers. I have seen IP conflicts cause similar problems.)
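A rough way to script that check from any machine on the iSCSI network, as a Python sketch rather than anything EQL-specific: try a TCP connection to the interface IP while the interface is disabled. The function name is made up, and port 3260 (the standard iSCSI port) is only an assumed default. A connection that succeeds or is actively refused both mean that some device still owns that IP; a timeout suggests nothing is there, although a device that silently drops packets would look the same, so a ping or ARP lookup is still worth doing.

```python
import socket

def something_answers(ip: str, port: int = 3260, timeout: float = 2.0) -> bool:
    """Return True if any device responds at ip:port (connect succeeds or is refused)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True          # something accepted the connection
    except ConnectionRefusedError:
        return True              # something replied, just not listening on this port
    except OSError:
        return False             # timeout, no route, host unreachable, ...
```

With the array interface disabled, `something_answers("<interface ip>")` returning True would point towards a second device using the same address.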

HansdeJongh · ‎02-23-2011

Thanks, but

it's on 3 arrays, which means 6 controllers that all have the problem. So I don't see how that will help...

admin · ‎03-09-2011

Hi,

I have gone through your comments and this looks more like a storage array (Dell EqualLogic) issue and not related to VMware ESX. What kind of error message are you seeing? Does it affect any functions on ESX against the datastores from this array?

You also mentioned that you have enabled hardware acceleration (VAAI) on the storage array. Please confirm that you are running a VAAI supported firmware version on this storage array.

~Sudhish

IRIX201110141 · ‎03-21-2011

Some EQLs report packet errors when they are connected to (Broadcom-based) PowerConnect switches and the ESX VMkernel ports (VMKs) are configured for jumbo frames (MTU=9000).

We have a mixed environment with PS6010XV (10GbE) and PS4000X (1GbE). If the PS4000 is connected to a PC8024F using one of the copper-based 10GBase-T ports, the ESX hosts go totally mad, the Ethernet ports report packet errors and the yellow warning pops up. The problem went away when the unit was connected to a PC6248 or when using the standard MTU of 1500 bytes per frame.
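For anyone weighing up whether falling back to MTU 1500 is an acceptable workaround, here is a back-of-the-envelope Python sketch of what jumbo frames buy in per-frame overhead. It assumes plain IPv4 + TCP without options, no VLAN tag, and ignores the iSCSI headers; the function names are only for illustration.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header (20) + TCP header (20), both without options; these count towards the MTU.
IP_TCP_HEADERS = 20 + 20

def payload_efficiency(mtu: int) -> float:
    """Fraction of the bytes on the wire that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)

def frames_needed(nbytes: int, mtu: int) -> int:
    """Number of full-size frames needed to carry nbytes of TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-nbytes // payload)  # ceiling division

for mtu in (1500, 9000):
    print(mtu, round(payload_efficiency(mtu), 4), frames_needed(1024 * 1024, mtu))
```

Under these assumptions the wire efficiency goes from roughly 95% to roughly 99%, and 1 MiB of payload needs 719 frames at MTU 1500 versus 118 at MTU 9000. The bandwidth gain is small; the bigger difference is the number of frames the hosts and switches have to handle.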

Dell has confirmed that there is a problem, but they haven't decided yet on which side the root of the evil lies (Broadcom or EQL). So stay tuned...

Regards

Joerg

HansdeJongh · ‎03-21-2011

Hi,

Thanks, do you have errors on both arrays?

Which version of the firmware are you running?

I started to get the errors after I upgraded to 5.