VMware Cloud Community
mcwill
Expert

ESX4 swiscsi MPIO to Equallogic dropping

We've updated to ESX4 and implemented round-robin MPIO to our EQL boxes (we didn't use round robin under 3.5). However, I'm seeing 3-4 entries per day in the EQL log that indicate a dropped connection. See the logs below for the EQL and vCenter views of the event.

EQL Log Entry

INFO 10/06/09 23:50:32 EQL-Array-1

iSCSI session to target '192.168.2.240:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.111:58281, iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.

iSCSI initiator connection failure.

Connection was closed by peer.

vCenter Event

Lost path redundancy to storage device naa.6090a018005964bc9384643a2a0060cf.

Path vmhba34:C1:T3:L0 is down. Affected datastores: "VM_Exchange".

warning

6/10/2009 11:54:47 PM

I'm aware that the EQL box will shuffle connections from time to time, but those appear in the logs as follows (although vCenter will still display a "Lost path redundancy" event):

INFO 10/06/09 23:54:47 EQL-Array-1

iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.126:59880, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed.

Load balancing request was received on the array.

Should we be concerned, or is it now normal operation for the ESX iSCSI initiator to drop and re-establish connections?
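For anyone who wants to separate the two kinds of close events in bulk rather than by eye, here is a minimal sketch in Python. It assumes each log entry is available as one block of text and that the reason strings match the two excerpts above exactly; the function names and the "other" bucket are made up for illustration and are not part of any EqualLogic tool.

```python
import re
from collections import Counter

# Pattern built from the two log excerpts quoted above (an assumption:
# other firmware versions may word the message differently).
SESSION_RE = re.compile(
    r"iSCSI session to target '(?P<target_ip>[\d.]+):\d+, (?P<target_iqn>[^']+)' "
    r"from initiator '(?P<init_ip>[\d.]+):\d+, (?P<init_iqn>[^']+)' was closed\."
)

def classify(entry: str) -> str:
    """Label one log entry by the reason line that follows the 'was closed' line."""
    if "Load balancing request was received on the array" in entry:
        return "load_balance"
    if "iSCSI initiator connection failure" in entry:
        return "initiator_failure"
    return "other"

def summarize(entries):
    """Count closed sessions per (initiator IQN, reason); skip non-matching entries."""
    counts = Counter()
    for entry in entries:
        match = SESSION_RE.search(entry)
        if match is None:
            continue
        counts[(match.group("init_iqn"), classify(entry))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "iSCSI session to target '192.168.2.240:3260, "
        "iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' "
        "from initiator '192.168.2.111:58281, "
        "iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.\n"
        "iSCSI initiator connection failure.\n"
        "Connection was closed by peer.",
    ]
    for (iqn, reason), n in sorted(summarize(sample).items()):
        print(iqn, reason, n)
```

Counting per initiator makes it easier to see whether the unexpected drops cluster on one host or are spread evenly across all of them.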

179 Replies
johnz333
Contributor

This server is a new build with no VMs, so we have had no data loss, but during our tests I have had many failures just trying to create a datastore on the SAN. If I switch to one NIC/path, everything is lightning fast; with two NICs and MPIO round robin, the whole ESX server is dog slow and SAN connections drop in and out. I notice FCS errors on our switch for the ports the SAN uses during this time, but only when the ESX server is thrashing them. I have two other servers using this SAN for another (non-VM) volume, and it's been rock solid with no FCS errors. Not sure why this is happening; I am going to open a call with either VMware or Dell next week to get to the bottom of it. Many users online have this setup with no problems, but I can't even get a test server up before we go to production. All settings on the switch have been verified: jumbo frames, flow control, no spanning tree. All cables go into one switch (Nortel 5698).

John

johnz333
Contributor

For anyone who finds this thread: we determined that the issue is with our network switch. When we use a 1500 MTU on the initiator, everything works fine; a 9000 MTU causes connectivity issues. Jumbo frames are enabled on our switch. If our SAN and clients are placed on the production VLAN, 9000 MTU works fine; when we move everything back to the SAN VLAN, problems arise. We are still troubleshooting this with EqualLogic support, but it looks like it will point to the Nortel 5698 switch settings. Jumbo frames are enabled for the whole switch, not per VLAN, so it's confusing.
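Since the difference here is 1500 vs. 9000 MTU, one common way to check whether jumbo frames really pass end to end is a don't-fragment ping whose payload exactly fills the MTU. A rough sketch of the arithmetic, assuming a plain IPv4 header with no options (20 bytes) and an ICMP echo header (8 bytes); the function name is just for illustration:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header

def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: max ping payload {max_icmp_payload(mtu)} bytes")
```

If a ping with that payload size and the don't-fragment option set fails on the SAN VLAN but a smaller one succeeds, something in the path is not passing the full frame size. The exact ping options depend on the tool and build you use, so check its help output first.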

tWiZzLeR
Enthusiast

I have a ticket open with Dell/EqualLogic support for the exact same issue. I was told that this is a known issue and that VMware is supposed to be working on a fix for it. However, they would not give me a timetable.

Also, why is this thread marked as answered??? It is most definitely not resolved and should be kept open!

johnz333
Contributor

My issue is more with the switch and jumbo frames. I was able to connect a workstation using an iSCSI initiator and had the same issue with dropped connections, with VMware not even in the picture. My failure is consistent, though, not intermittent.

I do have the occasional drop with VMware, so I am guessing this is the issue they are working on for you. Please post if they resolve it.

Thanks.

s1xth
VMware Employee

I am just curious: I just purchased a PS4000 and I have been doing tons of research on how to correctly configure the vSwitches for use with it. Could someone post a screenshot of their vSwitch configuration for their iSCSI network?

Thanks!!

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi

Riku100
Contributor

Hi,

I'm on vacation 30 Oct - 16 Nov.

- Riku

s_buerger
Contributor

There is a guide and a video from Dell/EqualLogic on how to configure iSCSI with vSphere and EqualLogic:

http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

http://www.delltechcenter.com/page/A%E2%80%9CMultivendorPost%E2%80%9DonusingiSCSIwithVMwarevSphere should also be interesting.

s1xth
VMware Employee

I don't want to hijack this thread with my different questions; I have a separate post for my iSCSI setup questions. Thanks for the videos, though. I ran through them, and they don't completely answer my iSCSI setup questions for the EQL.

jgeiser
Contributor

Well...

Update 1 didn't fix the problem.

Riku100
Contributor

Hi,

I'm in training on 25 and 26 Nov, so I'll be reading my mail irregularly.

- Riku

s1xth
VMware Employee

I am going to hopefully be firing up my EQL array over the holiday weekend, so I will let you guys know if I see the same problems. (PS4000X / 2x 5424 switches / separate network / ESXi 4.0 U1 / R710 / 2x Intel Pro 1000 PT NICs)

Jonathan

tawatson
Contributor

Hello!

I was wondering if anyone has had any success figuring out this issue? I am experiencing the same thing. I am using a pair of VMkernel ports shared between 2 physical NICs (Dell R710 server), connecting to a PS4000 storage array. It is interesting, because I am also connecting the server to a LeftHand SAN on the same vmhba, and I have not seen any issues with connectivity to the LH. But each of my 4 servers seems to randomly drop paths to the PS4000.

I also will probably be opening a ticket in the next day or so, and will update if I get any info.

Thanks!

Andrew Watson

Sr. Systems Administrator

The Colorado College

johnz333
Contributor

Just an update for anyone's benefit regarding my jumbo frames issue: we have an original revision of Nortel's 5698 switch, and it turns out this early revision of the hardware (not the code, the hardware) didn't play nicely with our PS100E SAN. When we used Nortel's 5520 or a newer-revision 5698, the jumbo frames issue was fixed. I still have intermittent dropping of paths even after all this, and I am very interested in a resolution. Thanks in advance to whoever posts a fix.

John Z.

jgeiser
Contributor

When you contact VMware support, reference PS484220.

EqualLogic seems to think it's an issue with multiple NICs on the same subnet.

All I can say, from my testing, is that the Microsoft iSCSI initiator doesn't have an issue on the same hardware.

R710 -> Broadcom 5709 -> Cisco 3750G -> PS6000

It's inexcusable that this wasn't fixed months ago.

s1xth
VMware Employee

Edit-

iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' from initiator '10.10.5.18:55563, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' was closed. iSCSI initiator connection failure. Connection was closed by peer.

Happening to me too...FRESH setup...brand new everything. Using Intel nics PT1000's, PS4000x, 2x5424 switches, jumbo frames, 4GB LAG between switches.

Guess I will open a ticket with EQL.

tawatson
Contributor

Yes, I am using 2 of the 4 on-board Broadcom NICs. That would be very weird if that was the issue, but I have seen stranger things :) I'll see if I can dig up an extra Intel NIC to test with, and let you know.

Thanks!

Andrew

johnz333
Contributor

I only use the internal Broadcoms for console management. I have two dual-port Intel Pro NICs, with one port on each card for production VMs and one on each for iSCSI. I did set up a single Intel NIC with two vmknics and I still experienced drops. I currently have two physical ports with 3 virtual NICs bound, for 6 paths. It always drops one path, usually the one with the highest IP but not always. No effect on performance that I can see yet, but I get 2 drops per day consistently. Jumbo frames or not, I get drops: I have two identical servers, one set up with jumbo frames and one without, and both drop equally from the SAN.

John Z

s1xth
VMware Employee

John/Andrew...

Have either of you opened a ticket yet with EQL??

johnz333
Contributor

I opened a call to troubleshoot the jumbo frames issue, which an open call with Nortel has since fixed. As soon as they replace my switches and I am back up, I will open another call with Dell EQL. I was hoping an update from VMware was going to turn up. I was also hoping Dell would release its beta MPIO plug-in, as I have an Enterprise Plus license for VMware. As soon as I have my new 5698s and am satisfied that the jumbo frames issue is fixed, I will start a call with EQL, and maybe by then the MPIO plug-in will be ready.

John Z.
