VMware Cloud Community
mcwill
Expert

ESX4 swiscsi MPIO to Equallogic dropping

We've upgraded to ESX4 and implemented Round Robin MPIO to our EQL boxes (we didn't use Round Robin under 3.5). However, I'm seeing three to four entries per day in the EQL log that indicate a dropped connection. See the logs below for the EQL and vCenter views of the event.

EQL Log Entry

INFO 10/06/09 23:50:32 EQL-Array-1

iSCSI session to target '192.168.2.240:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.111:58281, iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.

iSCSI initiator connection failure.

Connection was closed by peer.

vCenter Event

Lost path redundancy to storage device naa.6090a018005964bc9384643a2a0060cf.

Path vmhba34:C1:T3:L0 is down. Affected datastores: "VM_Exchange".

warning

6/10/2009 11:54:47 PM
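The vCenter event names the failed path by its ESX runtime name, in the form adapter:channel:target:LUN. When lining up vCenter warnings against EQL log entries, it can help to split that name into its fields. A minimal Python sketch; the class and function names are hypothetical and not part of any VMware tool:

```python
import re
from typing import NamedTuple


class RuntimePath(NamedTuple):
    adapter: str  # e.g. "vmhba34" from the event above
    channel: int
    target: int
    lun: int


# Runtime names look like vmhbaN:C<channel>:T<target>:L<lun>.
_RUNTIME_NAME = re.compile(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$")


def parse_runtime_name(name: str) -> RuntimePath:
    """Split a runtime path name such as 'vmhba34:C1:T3:L0' into its fields."""
    match = _RUNTIME_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a runtime path name: {name!r}")
    adapter, channel, target, lun = match.groups()
    return RuntimePath(adapter, int(channel), int(target), int(lun))


if __name__ == "__main__":
    print(parse_runtime_name("vmhba34:C1:T3:L0"))
```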

I'm aware that the EQL box will shuffle connections from time to time, but those events appear in the logs as follows (although vCenter still displays a "Lost path redundancy" event):

INFO 10/06/09 23:54:47 EQL-Array-1

iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.126:59880, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed.

Load balancing request was received on the array.

Should we be concerned, or is it now normal for the ESX iSCSI initiator to drop and re-establish connections?
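To count the events in question, one approach is to separate entries whose reason is an initiator connection failure from those caused by the array's own load balancing. A minimal Python sketch, assuming each EQL event has been exported as a list of text lines in the format quoted above; the function names are hypothetical:

```python
from collections import Counter

# Reason lines copied from the two EQL log entries quoted in this post.
INITIATOR_FAILURE = "iSCSI initiator connection failure."
LOAD_BALANCING = "Load balancing request was received on the array."


def classify(entry: list[str]) -> str:
    """Label one 'iSCSI session ... was closed' entry by its reason text."""
    text = " ".join(line.strip() for line in entry)
    if LOAD_BALANCING in text:
        return "load-balancing"
    if INITIATOR_FAILURE in text:
        return "initiator-drop"
    return "other"


def tally(entries: list[list[str]]) -> Counter:
    """Count how many closed sessions fall into each category."""
    return Counter(classify(entry) for entry in entries)


if __name__ == "__main__":
    # Abridged copies of the two entries above (session lines omitted).
    drop = [
        "INFO 10/06/09 23:50:32 EQL-Array-1",
        "iSCSI initiator connection failure.",
        "Connection was closed by peer.",
    ]
    rebalance = [
        "INFO 10/06/09 23:54:47 EQL-Array-1",
        "Load balancing request was received on the array.",
    ]
    print(tally([drop, rebalance]))
```

Only the "initiator-drop" count reflects the unexpected disconnects; load-balancing closures are the array moving sessions on purpose.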

s1xth
VMware Employee

Yesterday afternoon I reconfigured two of my test servers that connect to a PS4000, setting them up with two separate vSwitches and two vmk ports per vSwitch. So far I have not seen any RR/MPIO drops in my EQL event log, and the iSCSI connection times confirm this. I will continue to monitor.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
johnz333
Contributor

Thanks for the update. I have been considering this option. I have three hosts and 12 VMs in production under the original Dell document. I only see one or two drops during the wee hours of the morning, and I have not noticed any degradation or data loss yet. I have these in an HA cluster, so I am not too keen on changing the config on all three just yet. If you have no issues over the next few weeks, I may consider doing this. We are a K-12 school, so I have a week coming up when I could get this done. I was hoping the patch would be out before I needed to consider this. I would definitely keep my 3:1 but put each physical NIC on a different vSwitch, as suggested. Has anyone tried this new config with 3:1?

John Z

From: s1xth <communities-emailer@vmware.com>

To: <jzolnows@slcr.wnyric.org>

Date: 01/20/2010 01:29 PM

Subject: New message: "ESX4 swiscsi MPIO to Equallogic dropping"

s1xth
VMware Employee

Johnz-

I actually have these two hosts in an HA setup too, running a couple of production VMs. I did one server at a time to make sure there were no issues. I just added another vmk port to each vSwitch, giving 3 vmk ports per vSwitch and a total of 6 active I/O paths, to see whether there are any connection drops. I will keep you posted on my results through the week after more testing and monitoring.
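The path totals in this post follow from the number of iSCSI VMkernel ports on the host. A minimal Python sketch, assuming (as the counts in this thread imply) that each bound vmk port opens one session, and so one path, to each EqualLogic volume; the function name is hypothetical:

```python
def active_paths(vswitches: int, vmk_per_vswitch: int) -> int:
    """Paths per volume from one host, assuming one session per bound vmk port."""
    if vswitches < 1 or vmk_per_vswitch < 1:
        raise ValueError("need at least one vSwitch and one vmk port per vSwitch")
    return vswitches * vmk_per_vswitch


if __name__ == "__main__":
    # Two vSwitches with two vmk ports each (the first reconfiguration).
    print(active_paths(2, 2))
    # A third vmk port added to each vSwitch.
    print(active_paths(2, 3))
```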

johnz333
Contributor

Thank you! That is awesome.

John Z

From: s1xth <communities-emailer@vmware.com>

To: <jzolnows@slcr.wnyric.org>

Date: 01/20/2010 02:23 PM

Subject: New message: "ESX4 swiscsi MPIO to Equallogic dropping"

tWiZzLeR
Enthusiast

s1xth, is the Dell/EqualLogic limit six active paths per volume on the SAN, or six active paths from each ESX host?

I currently have 3 ESX hosts in an HA environment, each with two VMkernel ports in RR, for a total of 6 active paths.

johnz333
Contributor

tWiZzLeR,

I have your setup now. According to the Dell setup doc, 8 was the limit on the VMware side. I want to say that the limit on the EQL side was 256 per LUN. I have 20 connections to one LUN on my box now: 1 for backup (VCB), 18 from VMware hosts (6 paths from each of the three hosts), and 1 from an older ESX4 host with only one NIC for iSCSI.
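The 20-connection count in this post can be reproduced by summing sessions per initiator. A minimal Python sketch; the 256-per-LUN ceiling is only the recalled EQL figure from this post, not a verified specification, and the initiator labels are illustrative:

```python
# Sessions to a single LUN, per initiator, as described in this post.
# The 256-per-LUN ceiling is a recalled figure, not a verified spec.
RECALLED_EQL_LIMIT_PER_LUN = 256

connections = {
    "VCB backup": 1,
    "ESX host 1": 6,
    "ESX host 2": 6,
    "ESX host 3": 6,
    "older ESX4 host (one iSCSI NIC)": 1,
}

total = sum(connections.values())
print(f"{total} connections to the LUN")
print("within recalled limit:", total <= RECALLED_EQL_LIMIT_PER_LUN)
```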

John Z

From: tWiZzLeR <communities-emailer@vmware.com>

To: <jzolnows@slcr.wnyric.org>

Date: 01/20/2010 03:04 PM

Subject: New message: "ESX4 swiscsi MPIO to Equallogic dropping"

dwilliam62
Enthusiast

What limitation are you referring to? The array has a limit of 512 connections per pool, up to 2048 connections using 4 pools. You can add more paths to the storage, but at some point you're not going to go any faster.

Are you talking about the VMware Round-Robin or the Dell/EQL MPIO beta?

-don
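The figures in this reply give a per-pool budget and a group-wide ceiling. A minimal Python sketch that checks planned session counts against them, assuming the 512-connections-per-pool and 4-pool numbers quoted in the reply; the function name, pool names, and counts in the example are hypothetical:

```python
MAX_CONNECTIONS_PER_POOL = 512
MAX_POOLS = 4


def check_pools(connections_per_pool: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits."""
    problems = []
    if len(connections_per_pool) > MAX_POOLS:
        problems.append(f"{len(connections_per_pool)} pools exceeds {MAX_POOLS}")
    for pool, count in connections_per_pool.items():
        if count > MAX_CONNECTIONS_PER_POOL:
            problems.append(f"pool {pool!r}: {count} > {MAX_CONNECTIONS_PER_POOL}")
    return problems


if __name__ == "__main__":
    # Group-wide ceiling: 512 connections x 4 pools.
    print(MAX_CONNECTIONS_PER_POOL * MAX_POOLS)
    # Hypothetical plan: two pools, one of them oversubscribed.
    print(check_pools({"default": 300, "exchange": 600}))
```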

tWiZzLeR
Enthusiast

> What limitation are you referring to? The array has a limit of 512 connections per pool, up to 2048 connections using 4 pools. You can add more paths to the storage, but at some point you're not going to go any faster.
>
> Are you talking about the VMware Round-Robin or the Dell/EQL MPIO beta?
>
> -don

VMware Round Robin. Page 9 of Dell's vSphere Configuration Guide states "VMware vCenter has a maximum of 8 connections to a single volume". Am I misreading this, or what do the 8 connections mean?

I attached the guide.


dwilliam62
Enthusiast

That's a VMware limit, not an array one. All iSCSI vendors have this limit, since it comes from the OS. See page 4 of the attached PDF from VMware.

-don

s1xth
VMware Employee

It would be very hard to even use more than 8 connections unless you have a huge array setup, so I never looked at this as a 'limitation', especially with the advancements in 10 Gb Ethernet.

I am now at more than a day with no disconnects, which is good.

tWiZzLeR
Enthusiast

OK, so if 8 active connections is a VMware limit, then with 3 ESX hosts all accessing the same volume on the SAN I can really only have 2 VMkernel ports for iSCSI on each host, right? (3 hosts x 2 VMK = 6 connections.)

dwilliam62
Enthusiast

That's 8 connections per host, not the total number of connections.

-don


tWiZzLeR
Enthusiast

> That's 8 connections per host, not the total number of connections.
>
> -don

Ahhhh, thanks for the info! I have asked that question before in other threads and did not get a clear answer. So, with my 3 ESX hosts then I can have up to 24 connections to a single shared volume on the SAN (8x3=24). Now that makes more sense!!!
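With the limit applying per host rather than per volume, the check becomes one comparison per host, and the volume-wide ceiling scales with the number of hosts. A minimal Python sketch, assuming the 8-connections-per-host-per-volume figure from the Dell guide discussed above; the function and host names are hypothetical:

```python
PER_HOST_LIMIT = 8  # connections from one host to a single volume


def volume_ceiling(hosts: int) -> int:
    """Most connections a shared volume sees if every host uses the maximum."""
    return hosts * PER_HOST_LIMIT


def over_limit(paths_by_host: dict[str, int]) -> list[str]:
    """Hosts whose path count to the volume exceeds the per-host limit."""
    return [host for host, paths in paths_by_host.items() if paths > PER_HOST_LIMIT]


if __name__ == "__main__":
    print(volume_ceiling(3))
    # Current layout: two VMkernel ports per host, so two paths each.
    current = {"esx1": 2, "esx2": 2, "esx3": 2}
    print(sum(current.values()), over_limit(current))
```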

tWiZzLeR
Enthusiast

I am currently set up 1:1 as in the picture below. I have two physical NICs in each ESX server for iSCSI traffic, and I created one VMkernel port on each vSwitch with jumbo frames enabled. I do see a few paths being dropped (maybe once a day), but no actual connectivity loss, since these are set up in RR MPIO and it just fails over to the other path. Since I created a second VMkernel port and a second vSwitch and moved the second NIC to that vSwitch, the number of dropped paths has been greatly reduced.

BTW, the reason I also have a VM port group connected to each vSwitch is so that I can use guest iSCSI access for the VMs running SQL and Exchange, in order to use the Microsoft iSCSI Initiator and Dell's Auto-Snapshot Manager Microsoft Edition (ASMME) for quiesced snapshots.

[Image: tWiZzLeR's 1:1 iSCSI vSwitch layout (8280_8280.JPG)]
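The 1:1 layout described in this post puts each iSCSI VMkernel port on its own vSwitch with exactly one physical NIC, which matches the one-active-uplink-per-vmk arrangement that ESX4 software iSCSI port binding is generally documented to expect. A minimal Python sketch that checks a layout written out as plain data; the class, function, vSwitch, vmk, vmnic, and port group names are all hypothetical, and nothing here queries ESX:

```python
from dataclasses import dataclass, field


@dataclass
class VSwitch:
    name: str
    uplinks: list[str]  # physical NICs, e.g. "vmnic2"
    vmk_ports: list[str]  # iSCSI VMkernel ports, e.g. "vmk1"
    # Guest-iSCSI VM port groups are allowed and not checked.
    vm_port_groups: list[str] = field(default_factory=list)


def one_to_one_problems(vswitches: list[VSwitch]) -> list[str]:
    """Report vSwitches that break the one-NIC/one-vmk layout."""
    problems = []
    for vs in vswitches:
        if len(vs.uplinks) != 1:
            problems.append(f"{vs.name}: {len(vs.uplinks)} uplinks, expected 1")
        if len(vs.vmk_ports) != 1:
            problems.append(f"{vs.name}: {len(vs.vmk_ports)} vmk ports, expected 1")
    return problems


if __name__ == "__main__":
    layout = [
        VSwitch("vSwitch1", ["vmnic2"], ["vmk1"], ["iSCSI-Guest-1"]),
        VSwitch("vSwitch2", ["vmnic3"], ["vmk2"], ["iSCSI-Guest-2"]),
    ]
    print(one_to_one_problems(layout))
```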

Edificom
Contributor

I have 2 hosts: one running as before, with multiple NICs on one vSwitch, and the other set up as in tWiZzLeR's diagram above, with 2 vSwitches of 1 NIC each.

I will run this in test and see how it goes, and let you guys know what the results are.

Ian78118
Contributor

It's good that this is proving to be a workable workaround. However, I think VMware should at least give us an indication of when they expect to release this patch, given that this bug has existed since vSphere was released.

s1xth
VMware Employee

Well, after about 4.5 days of monitoring, the connections seem slightly more stable in this configuration. I have one vmk port that drops out, which may be because of low I/O on my array; it still shouldn't happen, but it's better. So in the end I am still seeing drops, just much less frequently.

Edificom
Contributor

To be honest, I can't see much difference with and without the fix. I have very little I/O across the SAN since it's not in production, which may be why, but I'm getting quite a lot of drops every day.

I'm running out of patience here, as I'm not getting a response from VMware, and I'm wondering whether I'll have to downgrade everything to 3.5 just to get this working.

johnz333
Contributor

After hearing this, I guess I will keep running my setup the way it is, 3:1 on a single switch. I see one or two drops per night, usually not from the same host. I now have 10 production VMs across three hosts in an HA cluster and have not noticed any performance degradation or data loss yet. I am using Backup Exec 12.5 with the Virtual Infr. Agent using VCB and have full backups to restore... knock on wood.

John Z

From: Edificom <communities-emailer@vmware.com>

To: <jzolnows@slcr.wnyric.org>

Date: 01/25/2010 09:08 AM

Subject: New message: "ESX4 swiscsi MPIO to Equallogic dropping"

s1xth
VMware Employee

I just made a post on my blog about the configuration changes I made. I will be posting my testing results shortly. http://bit.ly/8SYsGD

I agree with the others: this is definitely not a 'fix' or a GOOD 'workaround', but it is what VMware is telling its customers to do. Hopefully we see a patch for this soon, really soon, because this is getting out of hand.
