VMware Cloud Community
JDLangdon
Expert

ESX SAN LUN Multipathing

In my environment, each ESX host server is configured with two HBAs, and each HBA is connected to a different fabric to provide redundancy. On the SAN (DS4800) side we have two controllers, each with two connections to a single fabric. In total the SAN (DS4800) has four connections.

The end result of the zoning is that each ESX host server has four paths to every LUN. My question is: when using A/P multipathing, is having four paths per LUN overkill? The reason I'm asking is that when the SAN flips a LUN's preferred path to a different controller, which is on a different HBA, most of the VMs crash.

We spoke to IBM about our zoning, and they said that physical non-ESX servers need only two paths per LUN. I'm wondering if the same principle holds true for ESX servers.

Jason

1 Solution

Accepted Solutions
christianZ
Champion

>The reason I'm asking is that when the SAN flips a LUN's preferred path to a different controller, which is on a different HBA, most of the VMs crash.

Well, looking at page 63 of this guide: http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf

I would say your controller shouldn't change in this case.

And, as posted before, four paths are fine for ESX.

12 Replies
RobBuxton
Enthusiast

That shouldn't happen. I'd check that all your SAN components are on the HCL.

We have an HP EVA SAN, with dual fabrics which results in 4 paths. We've rebooted fabric switches and EVA Controllers with no problems to the ESX Servers or their guests.

The four paths come from dual redundancy at different points in the link. You don't want to remove any of the dual paths anywhere. In the end, four paths is good. ESX should be able to cope, and it does in our configuration.

JDLangdon
Expert

Have you modified your VMs in any way? I agree that ESX should be able to handle the four paths, but in our case it doesn't. I've noticed that some of our VMs have VMware Tools that need to be updated, but some of those with out-of-date VMware Tools are surviving.

Jason

RobBuxton
Enthusiast

Jason,

No, we've made no changes to the VMs. Some of our more unusual guests don't have VMware Tools at all, so the version shouldn't be an issue.

Not sure where the info is for your IBM SAN, but the path policy and whether it's active/active or active/passive may need to be changed.

You might also want to check in the vmkernel and vmkwarning log files to see what errors you're getting.
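As a rough illustration of that log check, here is a minimal Python sketch that filters log lines for storage-related terms. The keyword list is an assumption made for illustration, not a list of real ESX message strings, and the function names are hypothetical; tune the terms to what your own logs contain.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical search terms (an assumption, not actual ESX message text).
KEYWORDS = ("failover", "path", "scsi", "reservation")


def find_suspect_lines(lines: Iterable[str],
                       keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords),
                         re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_file(path: str) -> List[str]:
    """Apply the filter to a log file, tolerating odd bytes in the file."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

On an ESX 3.x service console the two logs are typically found at /var/log/vmkernel and /var/log/vmkwarning, so a call would look like `scan_file("/var/log/vmkwarning")`. Comparing the timestamps of the matches with the time of the controller flip is the useful part.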

JDLangdon
Expert

How do you have your zones configured? In my environment we have eight zones per fabric, which works out to one zone per ESX host per fabric. Our fabrics are not interconnected. Each zone consists of one ESX host HBA and one host port from each SAN controller, so each zone has three members.

Should I scrap this and create one zone that contains ALL ESX hosts and the SAN?
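The zoning scheme described above (one zone per host per fabric, three members each) can be sketched in Python as follows. The host, controller, and fabric names are hypothetical, and the count of eight hosts is inferred from "eight zones per fabric"; this only models the layout and does not talk to a switch.

```python
from typing import Dict, List


def build_zones(hosts: List[str], fabric: str,
                controllers: List[str]) -> Dict[str, List[str]]:
    """Build one zone per host on a fabric.

    Each zone holds the host's HBA on that fabric (the single initiator)
    plus each controller's host port on that fabric (the targets).
    """
    zones = {}
    for host in hosts:
        members = [f"{host}_hba_{fabric}"]
        members += [f"{ctrl}_port_{fabric}" for ctrl in controllers]
        zones[f"z_{host}_{fabric}"] = members
    return zones


def initiators_in(zone_members: List[str]) -> List[str]:
    """Return the members that are host HBAs (by the naming used above)."""
    return [m for m in zone_members if "_hba_" in m]


hosts = [f"esx{i}" for i in range(1, 9)]   # eight hosts (assumed)
controllers = ["ctrlA", "ctrlB"]           # hypothetical controller names
fabric_a = build_zones(hosts, "fabA", controllers)
fabric_b = build_zones(hosts, "fabB", controllers)
```

Each generated zone has exactly one initiator, which is the property to preserve. A single zone containing every host and the array would put eight initiators together in one zone.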

Jason

boydd
Champion

You should have separate zones for each initiator-to-target pair (one port to another, to get an ITL). Having more than one initiator in a zone may cause failover interruptions. The DS requires fixed pathing since it is an A/A rig. Hopefully I read your posting correctly :)

DB

VMware Communities Moderator

JDLangdon
Expert

>You should have separate zones for each initiator-to-target pair (one port to another, to get an ITL). Having more than one initiator in a zone may cause failover interruptions. The DS requires fixed pathing since it is an A/A rig. Hopefully I read your posting correctly :)

>DB

IBM's DS4800 is a FAStT A/P rig. The 8000 series Shark is A/A. The Shark will be implemented in May. ;)

langonej
Enthusiast

"In the SAN (DS4800) side we have two controllers with each controller having two connections to a single fabric. In total the SAN (DS4800) has four connections."

I don't know this particular array, but is there a reason why you aren't criss-crossing?

If your array has two controllers and two host ports per controller it makes more sense to me to do:

Controller 1, Host Port 1 = Fabric A

Controller 1, Host Port 2 = Fabric B

Controller 2, Host Port 1 = Fabric A

Controller 2, Host Port 2 = Fabric B

I guess it's possible this array could be "confused" by such a config if it's an entry level device.

As already mentioned in this thread, what you are doing should work just fine. Four paths per LUN is not overkill; I generally see eight paths per LUN (on arrays with four host ports per controller). Two paths per LUN sounds comical. Be sure to thank IBM for that.
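To show where the four and eight come from, here is a small Python sketch that counts paths for the criss-crossed wiring listed above. It assumes the fabrics are not interconnected and that zoning lets each HBA see every array port on its own fabric; the HBA and port names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def enumerate_paths(hba_fabric: Dict[str, str],
                    port_fabric: Dict[str, str]) -> List[Tuple[str, str]]:
    """List (HBA, array port) pairs that share a fabric.

    A path exists only when the HBA and the array port sit on the same
    fabric, because the fabrics are assumed to be isolated from each other.
    """
    return [(hba, port)
            for hba, port in product(hba_fabric, port_fabric)
            if hba_fabric[hba] == port_fabric[port]]


hbas = {"hba1": "A", "hba2": "B"}

# Two host ports per controller, one on each fabric.
two_ports = {"C1P1": "A", "C1P2": "B", "C2P1": "A", "C2P2": "B"}

# Four host ports per controller, two on each fabric.
four_ports = {f"C{c}P{p}": ("A" if p % 2 else "B")
              for c in (1, 2) for p in (1, 2, 3, 4)}

paths_small = enumerate_paths(hbas, two_ports)    # 4 paths
paths_large = enumerate_paths(hbas, four_ports)   # 8 paths
```

In the two-port case every HBA reaches both controllers, so a controller failover does not by itself require moving to the other HBA.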

When a LUN is trespassed from Controller 1 to Controller 2, your VM crashes (because it loses connectivity)? I'd power off the VMs and test each path to make sure it works. I'd also triple-check your zoning and your "LUN OS type" on the array side, if you have such a setting. Some arrays let you set what kind of host is connecting (e.g. Win2K3, VMware, Unix, et cetera) so that the SCSI commands are tailored for that OS.

Best of luck.

JDLangdon
Expert

>"In the SAN (DS4800) side we have two controllers with each controller having two connections to a single fabric. In total the SAN (DS4800) has four connections."

>I don't know this particular array, but is there a reason why you aren't criss-crossing?

We are criss-crossing our controllers exactly as you've outlined below. What I meant was that the fabrics are not interconnected: Fabric A is not connected to Fabric B.

>If your array has two controllers and two host ports per controller it makes more sense to me to do:

>Controller 1, Host Port 1 = Fabric A

>Controller 1, Host Port 2 = Fabric B

>Controller 2, Host Port 1 = Fabric A

>Controller 2, Host Port 2 = Fabric B

That is exactly how we are currently set up.

>When a LUN is trespassed from Controller 1 to Controller 2 your VM crashes (because it loses connectivity)? I'd power off the VMs and test each path to make sure it works. I'd also triple-check your zoning and your "LUN OS type" on the array side if you have such a setting.

It's strange, isn't it? I have my OS types set to LNXCLVMWARE which, from what I've been reading, is the correct OS type. I'm also in the process of triple-checking our zoning configurations.

A typical zone in my environment would be ESX Host 1 - HBA 1, Controller 1 - Host Port 1, and Controller 2 - Host Port 1. ESX Host 2 has its own zone with the same controller ports, as do ESX Host 3 and ESX Host 4.

Jason

langonej
Enthusiast

Jason,

Then in my opinion it should work perfectly. If you can, turn off the VMs on a specific LUN and start a file copy to that LUN from the SC. If you switch paths, does the copy abort completely? Does the COS freeze? I also agree that the logs may show something specific. You could also check what the HBA queue is doing; I believe it should spike during the failover and then flush out.
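A copy test like that is easier to read if each write is timed. The following is a minimal Python 3 sketch, written as an illustration only: the service console on ESX 3.x ships an older Python, the target path is whatever file you choose on the LUN under test, and the five-second threshold is an arbitrary assumption.

```python
import os
import time
from typing import List, Tuple


def timed_write(path: str, chunks: int = 20,
                chunk_size: int = 1024 * 1024) -> List[float]:
    """Write fixed-size chunks to path, syncing each one to disk.

    Returns the duration of each chunk in seconds. An OSError raised here
    would correspond to the copy aborting rather than stalling.
    """
    payload = b"\0" * chunk_size
    durations = []
    with open(path, "wb") as handle:
        for _ in range(chunks):
            start = time.monotonic()
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
            durations.append(time.monotonic() - start)
    return durations


def find_stalls(durations: List[float],
                threshold_s: float = 5.0) -> List[Tuple[int, float]]:
    """Return (chunk index, seconds) for every chunk slower than the threshold."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold_s]
```

Start `timed_write` against the LUN, trigger the path switch partway through, then pass the result to `find_stalls`. One long chunk followed by normal ones suggests the failover completed; an exception suggests the I/O was failed back to the caller.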

- The other Jason.

mikepodoherty
Expert

Are your paths set to Fixed or Most Recently Used (MRU)? I had some issues with a DS4300; part of the fix was the configuration of Solaris 10, and the other part was to ensure that the paths were set to MRU.

The sole exception is the gatekeeper - fixed but each HBA has a separate connection to the gatekeeper.
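The difference between the two policies can be shown with a toy model. This Python sketch is a simplification of how Fixed and MRU choose a path, not VMware's implementation; the function and path names are hypothetical.

```python
from typing import List


def next_path(policy: str, current: str, preferred: str,
              alive: List[str]) -> str:
    """Choose the path to use given the currently usable paths.

    "fixed" returns to the preferred path whenever it is usable.
    "mru" stays on the path in use until that path fails.
    """
    if not alive:
        raise RuntimeError("no usable paths")
    if policy == "fixed":
        if preferred in alive:
            return preferred
        return current if current in alive else alive[0]
    if policy == "mru":
        return current if current in alive else alive[0]
    raise ValueError(f"unknown policy: {policy}")


# Path p1 (controller A) fails, then comes back.
after_failure = ["p2"]
after_recovery = ["p1", "p2"]

fixed_1 = next_path("fixed", "p1", "p1", after_failure)      # moves to p2
fixed_2 = next_path("fixed", fixed_1, "p1", after_recovery)  # fails back to p1
mru_1 = next_path("mru", "p1", "p1", after_failure)          # moves to p2
mru_2 = next_path("mru", mru_1, "p1", after_recovery)        # stays on p2
```

On an active/passive array each move to a path on the other controller forces a LUN ownership change, so the extra failback under Fixed is the step that can lead to path thrashing. That is why MRU is the usual recommendation for A/P arrays.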

JDLangdon
Expert

>Are your paths set to Fixed or Most Recently Used (MRU)? I had some issues with a DS4300; part of the fix was the configuration of Solaris 10, and the other part was to ensure that the paths were set to MRU.

>The sole exception is the gatekeeper - fixed but each HBA has a separate connection to the gatekeeper.

I'm going to double-check but I'm pretty sure that everything is set to MRU.
