mikeddib
Enthusiast

Problem with EqualLogic / MPIO / Failover


We have been running some tests in our environment with less than ideal results. Our environment consists of six vSphere 4 hosts in one cluster connected to about 10 VMFS volumes. Each host has a dedicated vSwitch with two pNICs, three vmkernel ports per NIC, and each NIC is connected to a different physical switch. We have set the path policy to Round Robin, so each volume has six active paths listed. On the EqualLogic side, we have three members in a group, one pool for that group, and the controllers are connected to and balanced across those same two physical switches (Nexus 5020). This was based on the guide from the EqualLogic site: http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

Our issue is when we try to test failover / availability by simulating a switch outage. The EqualLogic group sees numerous timeouts / login errors but still has NICs active on the active controller, so there is no controller failover. The hosts lose one of the two NICs on the storage vSwitch, but even with one NIC up and online, all VMFS volumes disappear from every host. When we bring the switch back online, all the VMFS volumes come back without our taking any corrective measures, and the VMs appear as if nothing has happened. When using the VMs, we see performance degradation until we rescan / clean up the storage communication, and then everything returns to normal.

We're at a loss on where to begin. We have followed the recommendations from EqualLogic, and of course, being the VMware guy, I'm pointing at the storage. The hosts still show active paths when this happens, and the errors are login timeouts, which is the other reason I think it would be good to start on the storage side. Is anyone out there with a similar configuration who may have run into something like this?

24 Replies
s1xth
VMware Employee

Well, I will see if I can help. I don't have a large EQL setup like yours, but I have experience setting up EQL and vSphere. You state you have three members in the storage group, so I am assuming you have maybe a PS6000 and two PS4000s? Just curious what hardware you have. That is one heck of a switch, also. How are the physical switches set up? Stacked, LAG, etc.?

Also, on your vSwitches, did you configure your NIC adapters correctly on the vSphere side, specifically the binding? In other words, you should have three vmks bound to one NIC and three vmks bound to the other NIC. It sounds like your vmks may be bound to both NICs instead of three vmks to one NIC and the other three to the other NIC. The vSwitch will show one adapter as unused and one as used per port group, but both are actually used because you bind the vmks to the iSCSI initiator. I usually configure all of this through the vMA console. I know you said you looked at the Dell doc; I'm not sure if that doc covers these specifics. You may have done all of this already; if you have, ignore it!! Just thought it was worth a shot.
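For anyone following along, the binding described above can be sketched from the ESX 4.x service console roughly like this. The vSwitch, vmhba, and IP names are the ones quoted elsewhere in this thread; substitute your own, and note that marking the second vmnic as Unused on each port group still has to be done in the vSphere Client (or via the vMA).

```shell
# Create a port group and a jumbo-frame vmkernel interface on it
esxcfg-vswitch -A iSCSI1 vSwitch2
esxcfg-vmknic -a -i 172.24.17.40 -n 255.255.255.0 -m 9000 iSCSI1

# Bind that vmk to the software iSCSI initiator, then verify the bindings
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic list -d vmhba33
```

Repeat for each vmk, alternating which vmnic is left Active so that three vmks land on each physical NIC.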

Jonathan

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
mikeddib
Enthusiast

We have three PS5000 series arrays right now. The physical switches are independent. They tie back to a pair of 4900Ms to do L3. On the Nexus 5020s, all storage ports are configured on a separate VLAN as access ports and are not trunked. Those cables / ports only carry storage traffic, and since it's all on the same VLAN, nothing should require L3 intervention. We also have jumbo frames enabled and spanning tree disabled on those ports (portfast).
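One quick sanity check for the jumbo-frame path, as a sketch (the target IP below is a placeholder for one of your array interfaces), is a don't-fragment ping at near-MTU size from the ESX console:

```shell
# 8972 = 9000-byte MTU minus 20 bytes IP header and 8 bytes ICMP header.
# If this fails while a default-size vmkping succeeds, some hop in the
# path is not passing jumbo frames end to end.
vmkping -d -s 8972 172.24.17.10
```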

On configuration in vSphere, an esxcfg-vswitch -l shows...

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch2     64         9           64                9000  vmnic3,vmnic5

  PortGroup Name  VLAN ID  Used Ports  Uplinks
  iSCSI6          0        1           vmnic5
  iSCSI5          0        1           vmnic3
  iSCSI4          0        1           vmnic5
  iSCSI3          0        1           vmnic3
  iSCSI2          0        1           vmnic5
  iSCSI1          0        1           vmnic3

Here you see the six port groups (each with an IP on its vmk), and each vmnic is assigned to three port groups, and three only, as you described. Also, here is a trimmed esxcfg-vmknic -l output:

Interface  Port Group/DVPort  IP Family  IP Address    MTU
vmk0       VMotion            IPv4       172.24.18.21  1500
vmk1       iSCSI1             IPv4       172.24.17.40  9000
vmk2       iSCSI2             IPv4       172.24.17.41  9000
vmk3       iSCSI3             IPv4       172.24.17.42  9000
vmk4       iSCSI4             IPv4       172.24.17.43  9000
vmk5       iSCSI5             IPv4       172.24.17.44  9000
vmk6       iSCSI6             IPv4       172.24.17.45  9000

I won't paste it in, but when I run 'esxcli swiscsi nic list -d vmhba33' we do see all the NICs bound appropriately, matching the vSwitch output above. The Dell doc does cover most of those pieces, and from running these commands, each NIC shows packets sent and received. It's bizarre to me why we have this problem. If each NIC is working, then we know both switches are working, not just one; if we were sending half our storage packets into a black hole, I would think we would see serious issues all the time, which we're not. It's only when we induce a failure during testing that we see issues. Of course, the point is to make sure it works so that if we do have a real switch failure, we'll be OK.

s1xth
VMware Employee

"The physical switches are independent." - Out of everything, this is really jumping out at me. From everything I have worked with in EQL, you need to have some sort of connection between these two switches for load balancing to work correctly. When you say independent, is that what you mean? The Dell doc states that the switches either need a LAG / EtherChannel group or need to be stacked to leverage load balancing correctly across the arrays. I only have two PS4000s, and I have them configured with two PowerConnect 5424s with a 4 Gb LAG with no issues. The rest of your configuration looks fine; you shouldn't have any other issues on the VMware side. This is definitely switch-side.

Maybe someone with more high-end Cisco experience can chime in on what kind of configuration you need between the switches. I may have another Dell doc that discusses the required switch configurations.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
mikeddib
Enthusiast

On the switch side, I need to be a bit more clear. We have two Nexus 5020s, and we split connections from the ESX systems across them: two ports from each ESX host for storage traffic, with port 1 going to 5020-A and port 2 going to 5020-B. Each 5020 has an EtherChannel between itself and two 4900Ms. When we fail 5020-A completely, there is still one connection per ESX host and at least one connection per EQL member on 5020-B, and 5020-B hasn't lost any connections to the upstream switches. That makes me think the 4900Ms aren't really in the picture, because all the paths are still there.

We're doing some more controlled testing to try to identify where the breakdown is. If I find anything I will certainly update this thread.

mikeddib
Enthusiast

We continued our testing and it seems we are running up against a number of potential problems. We have opened a support case which is still being worked to determine what the true problem is.

The first potential issue we uncovered was consistent with PR #484220, where the ESX logs show connections starting and stopping with numerous path changes in the vmkernel log. We then changed our configuration from 3:1 to 1:1 but continued to have problems. That led support to look into PR #498740, where hosts with four 10GbE ports running with jumbo frames configured can run into memory issues during periods of congestion / heavy utilization. Initial attempts at validating that as the issue aren't going well, but it's still being worked, and I will post back.

KingFridayXII
Contributor

I'm having the exact same issue. Disabling jumbo frames did nothing for me....

J1mbo
Virtuoso

I just wanted to pick up on a few points:

- there needs to be full Layer 2 connectivity between every connection

- each host should be cabled with one NIC to each switch

- each PS series array should be cabled such that each controller has at least one connection to each switch

- there should be a high-bandwidth ISL between the switches, 4 Gbps minimum

- all ESX sw iSCSI vmkernels should be on the same IP subnet (and all interfaces of the PS series arrays)

- any PS series management interfaces on different subnets must be set as dedicated management interfaces

Reading from the OP, "EqualLogic group sees numerous timeouts / login errors, but still has NICs active on the active controller, so there is no controller failover" - the controller should not fail over when a switch outage occurs, since each controller should have surviving paths on at least one of its interfaces.

Also, it's worth downloading SAN HQ if you haven't already done so.

HTH

Please award points to any useful answer.

s1xth
VMware Employee

I agree with everything you stated below, except I want to clarify for others the comment on the LAG group / ISL configuration. If you have a single array (PS4000), then a 2 Gb LAG is plenty. You should have as many active ports in your ISL as you have active on your SAN controllers. Obviously, stacked switches are best, as they have plenty of bandwidth and port performance is usually better. If you have two PS4000s, then 4 Gb is needed; most switches can only provide up to an 8 Gb LAG, and at that point you are just eating away at your port count and should use stacked switches.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
mikeddib
Enthusiast

Let me see if I can address these bullet points; I'll take the easy ones first:

- Each host has one NIC cabled to each switch

- Each PS series array has been cabled with at least one connection per switch, per the EQL doc

- We are a 10GbE environment; connectivity between switches is a minimum of two 10GbE ports in a port channel

- We have reduced our config to one vmkernel port per host, but even in our original config, all vmkernel ports were in the same L2 network

- There are no mgmt interfaces on different subnets

I also agree with the point that proper controller cabling prevents controller failover during a switch outage; that was more an informational point. The top bullet is where the discussion gets interesting and where most of our research has taken us. Let me draw a rough diagram:

       [4900M]-----[4900M]      <- L3 (HSRP pair)
        /    \     /    \
       (port channels to each)
      [5020-A]     [5020-B]     <- L2 (no link between them)

Everything (hosts, storage, etc.) is connected redundantly to the bottom two L2 switches. Essentially, the L2 storage network is defined on all the switches. There is a path through the network; it just needs to take one extra hop. Apparently, and we're running tests to determine the true impact, we've been told that one extra hop significantly impacts performance, even though each uplink is a port channel with two 10GbE ports. We are still testing, and we may add a link between the two L2 switches to remove the extra hop; either way, I will post back. The frustration comes from the fact that this configuration was in place for some time without problems.

J1mbo
Virtuoso

Could you post up the diagram as an image?

s1xth
VMware Employee

The root of this problem is on the VMware side, not EQL. There is an NDA in place blocking discussion of why we are seeing path drops. The fix has been made; a patch is finished and will be released as patch 5, which should be the next release. If you have a proper switch configuration, you will not have problems. If you are having problems, then you have other issues outside the root problem of the original posting / topic.

-- Sent from my Palm Pre

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
mikeddib
Enthusiast

I'm adding an attachment, before and after to show what the fix seems to have been.

First, on the VMware side: we removed all vmkernel ports but one, set our pathing policy to Fixed, and disabled jumbo frames. We still saw issues, which told us it wasn't just the VMware configuration; once we had made all those VMware changes and still saw issues, that is when we dove in on the network side.
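For reference, the pathing-policy change described above can be made per volume from the console as well as in the vSphere Client. A sketch using the ESX 4 esxcli nmp namespace (the naa device ID below is a placeholder):

```shell
# List devices and their current path selection policy
esxcli nmp device list

# Set one volume to Fixed; use VMW_PSP_RR to return to Round Robin later.
# naa.xxxx is a placeholder for the volume's actual device ID.
esxcli nmp device setpolicy -d naa.xxxx --psp VMW_PSP_FIXED
```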

The picture shows two switches doing L3, and those communicate via HSRP to determine which one holds the default gateway. Each L2 switch below has a port channel to each L3 switch above, but there was no connection between the L2 switches. We ran numerous tests before adding that link, and even with just the one vmkernel port and one active NIC on the port group, we kept running into volumes going offline and back online. We added the link between the L2 switches, and all of a sudden we were able to push almost 8,000 IOPS from one host with latency between 1-5 ms.

We're definitely in the group of people waiting for the patch to be released. VMware specifically told us not to add multiple vmkernel ports per host until the patch is applied, but that once the patch comes out we should be able to add them back in and begin multipathing again.

J1mbo
Virtuoso

From what's been put there, I read that there is L3 routing between ESX and the PS?

Please award points to any useful answer.

s1xth
VMware Employee

What 'patch' are you referring to from VMware?

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
mikeddib
Enthusiast

This is what I received back from VMware support as part of my case...

Also, PR 484220 you've encountered - "ESX host keeps marking Dell EqualLogic connections offline" - is supposed to be fixed in P05 patch.

I was told the tentative date for that is the end of the month.

In regards to the switching / routing, our ESX and PS storage connections are all on the same L2 VLAN; no routing. I was just pointing out the topology: the hosts and storage are all connected to the Nexus 5020s, which do not route, and connect back to the upstream switches any time they need to route off-net.

s1xth
VMware Employee

Hmmm... I am not sure that patch will really fix your specific problem. That patch is discussed in more depth in the thread on the forums here about path drops. If your switches are configured correctly, then this issue SHOULD NOT be a problem for anyone. It's more noise than anything, and yes, VMware needs to fix it, but I don't think it is the same problem you are having. Then again, maybe it is the same issue, and you are experiencing the worst end of it in your configuration.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
dfollis
Enthusiast

I've also had the MPIO issue. I tried to fail a switch, and everything went nuts; simply unplugging one of the two NICs produced similar results. I have three ESX v4 hosts patched to U1. I have a PS6000 with each controller split between two Force10 S25Ns that have two 12 Gb stacking cables. Each PowerEdge R710 has two of its four NICs dedicated to iSCSI storage: one goes to Switch A and the other to Switch B. A third NIC goes to a Force10 S50N for LAN access. From the PS6000 side, I have one NIC from each controller going to the S50N for dedicated management. The other three NICs are split between Switches A and B and are dedicated to storage traffic. As I have an odd number of NICs, this isn't ideal, but I'm not seeing performance issues at this point.

It is important to note there are two Dell documents that describe how to configure MPIO. One is version 1.0 and the other is version 1.1; version 1.1 mentions the VMware Native Multipathing Plugin (NMP) on the front cover. Both share the same document ID of TR1049. 1.0 was released June 2009, and 1.1 was released October 2009 with "Update with new 1:1 Example and Clarifications."

My switches and vSwitches are configured for jumbo frames per the 1.1 document. I'd really like to figure out the MPIO issue and be able to leave the policy set to Round Robin and not Fixed. I set it to Fixed because I had to bring two of these servers into production; I have the third available for config testing. As I only have 1 Gbps links from my ESX hosts to the SAN, does it even make sense to configure 3:1 oversubscription? Others' thoughts on this are welcome. Also, I'm having problems locating the PR numbers mentioned in the above posts.

Message was edited by: dfollis

Just realized that firmware version 4.3.4 for EqualLogic is available. I was running 4.3.2. I will update during Thursday evening and post an update if there are any improvements.

mikeddib
Enthusiast

Good call on the doc versioning; I believe that has been mentioned in a couple of other posts also. Those posts also include the PR references.

http://communities.vmware.com/thread/228054

http://communities.vmware.com/thread/215039

I believe the EQL doc had a table that referenced the number of NICs and the recommended number of vmkernel ports (1:1 or 3:1) based on your config. If you're using 1 GbE NICs, I would say one vmkernel port per physical NIC, which is also the recommendation in the VMware iSCSI Configuration Guide. Support told us to use only one vmkernel port per host until the release of the patch.

We were also running EQL firmware 4.3.2 and upgraded to 4.3.4, but didn't see a change in our case.

s1xth
VMware Employee

I can't stress this enough: this is NOT an EQL problem, this is a VMware problem. The patch has been made, it has passed QA and beta testing, and it is ready for deployment. It will (hopefully) be in the next patch release, as stated directly by VMware.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi