VMware Cloud Community
FStanislawczyk
Contributor

Failover Paths

I am trying to balance load across my SAN by setting the preferred path, but whenever the host server is rebooted the preferred path changes back to an arbitrary path. Is there a config file that can be modified, or an advanced setting in ESX 3.x and 2.5.x, where this can be set?

I am using two Emulex HBAs on HP DL585s connected to a Hitachi 9570 and an AMS500 through McData switches. I have checked with the hardware vendor and there is no way to lock it down on their side.

12 Replies
jonathanp
Expert

Did you set it to Fixed or MRU?

And on the SAN side, did you set it to the same path, like controller A or controller B?

Jon

FStanislawczyk
Contributor

Hi Jon

I've tried both "Fixed" and "Most Recently Used", with the same result each time.

And yes, the path is set the same on the controller.

pcomo
Enthusiast

Hi,

With the HDS 9570 and AMS, the policy must be MRU (they are active/passive arrays). You need to enable "SUN cluster Connection Mode" on the array.

You don't need to set a preferred path with MRU.

I have a similar problem. I have 4 paths to each shared LUN from each ESX host (the VMware recommendation) and the MRU policy, but when I simulate an HBA failure on one ESX host, that host switches to an arbitrary standby path, which can be on the other array controller. When this happens we run into the path thrashing issue.

I would like to know how we can set a preferred list of failover links.

Is it possible?

If you have more information, let me know.

jdvcp
Enthusiast

I just addressed this with a script, which I call from /etc/rc.local.

In my case, I have vmhba1 and vmhba3 on a single ESX server pointing to all the same LUNs. However, upon reboot, everything comes up by default on vmhba1.

Let's say you have VMFS1 on LUN vmhba1:0:1 and VMFS2 on LUN vmhba1:0:2. By default after boot, if you run esxcfg-mpath -l you will see VMFS1 available on path vmhba1:0:1:1 (the trailing 1 is the partition) and VMFS2 available on path vmhba1:0:2:1.

However, you will also see paths to these LUNs as vmhba3:0:1:1 and vmhba3:0:2:1 respectively. My startup script disables the path to VMFS1 via vmhba1, forces a rescan, then re-enables the path via vmhba1 and rescans again. This forces VMFS1 to fail over from path vmhba1:0:1:1 to vmhba3:0:1:1. I leave VMFS2 alone, so I get load balancing: VMFS1 now runs I/O over vmhba3, while VMFS2 still runs I/O over vmhba1.

Example script (CHECK SYNTAX SINCE THIS IS FROM MEMORY)

#!/bin/bash
#
# First, disable the path to VMFS1 on vmhba1
esxcfg-mpath --lun=vmhba1:0:1 --path=vmhba1:0:1:1 --state=off
#
# rescan vmhba1 and vmhba3, forcing failover if it did not occur already
vmkfstools -s vmhba1
vmkfstools -s vmhba3
#
# now that path to VMFS1 is on vmhba3, reenable path via vmhba1
esxcfg-mpath --lun=vmhba1:0:1 --path=vmhba1:0:1:1 --state=on
#
# rescan vmhba1 and vmhba3, just to make sure ESX knows vmhba1 is
# back up. It WILL NOT USE vmhba1 again until reboot due to MRU
# policy. That is why we run the script at each boot
vmkfstools -s vmhba1
vmkfstools -s vmhba3

Copy this script to a file (call it anything you want, e.g. multipath.sh), then make it executable:

chmod 755 multipath.sh

Add /your folder/multipath.sh after the last line in your rc.local (back up rc.local to rc.local.bk first, just in case).

I have 20 ESX servers with 70 VMFS LUNs and it works like a charm.
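
With many LUNs, the per-LUN steps in the script above can be generated in a loop. The following is only a sketch: print_failover_cmds is a hypothetical helper, it prints the commands rather than running them, and the esxcfg-mpath and vmkfstools flags are copied from the from-memory script above, so they should be checked against your ESX version before anything is executed.

```shell
#!/bin/bash
# print_failover_cmds FROM_HBA OTHER_HBA TARGET:LUN...
# Hypothetical helper: prints (does NOT run) the disable / rescan /
# re-enable / rescan sequence for every TARGET:LUN given, e.g. "0:1".
# Command flags are taken from the script above and are unverified.
print_failover_cmds() {
    from_hba="$1"
    other_hba="$2"
    shift 2
    # Step 1: disable the path through FROM_HBA for each listed LUN
    for lun in "$@"; do
        echo "esxcfg-mpath --lun=${from_hba}:${lun} --path=${from_hba}:${lun}:1 --state=off"
    done
    # Step 2: rescan both HBAs so the failover takes place
    echo "vmkfstools -s ${from_hba}"
    echo "vmkfstools -s ${other_hba}"
    # Step 3: re-enable the paths through FROM_HBA
    for lun in "$@"; do
        echo "esxcfg-mpath --lun=${from_hba}:${lun} --path=${from_hba}:${lun}:1 --state=on"
    done
    # Step 4: rescan again so ESX sees FROM_HBA as available
    echo "vmkfstools -s ${from_hba}"
    echo "vmkfstools -s ${other_hba}"
}

# Example: move LUNs 1 and 3 on target 0 away from vmhba1
print_failover_cmds vmhba1 vmhba3 0:1 0:3
```

Listing only every other LUN, as in the example call, leaves the remaining LUNs on the first HBA, which is the same balancing idea as leaving VMFS2 alone.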

pcomo
Enthusiast

Hi,

OK for this script, but if there is a failure on one ESX host (an HBA failure, or a failure of the link between the HBA and the switch), that host could switch to an arbitrary standby path, which may be on a different SP than the one the other ESX servers are using.

This is the problem. My question is: how does ESX choose which standby path becomes active after such a failure?

Thanks.

jdvcp
Enthusiast

Typically, if you have two HBAs, two switches and two SPs, you will have a full mesh. Path names have this form:

vmhba[portnumber]:[target]:[LUN]:[partition]

portnumber is the HBA number

target is the SP (A=0; B=1)

LUN is the LUN number

partition will always be 1
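
To make the naming scheme concrete, here is a small sketch that splits a path name into those four fields. parse_path is a hypothetical helper written for illustration, not an ESX command, and it assumes the four-field form described in this post.

```shell
#!/bin/bash
# parse_path PATHNAME
# Hypothetical helper: splits vmhbaN:target:LUN:partition on ':' and
# prints the four fields by name.
parse_path() {
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose: word splitting on ':' fills $1..$4
    set -- $1
    IFS=$old_ifs
    echo "hba=$1 target=$2 lun=$3 partition=$4"
}

parse_path vmhba1:0:24:1
# prints: hba=vmhba1 target=0 lun=24 partition=1
```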

For LUN 24 on a full mesh fabric you will have:

vmhba1:0:24:1 on active preferred (preferred does not matter for MRU)

vmhba1:1:24:1 standby

vmhba2:0:24:1 on (not actually active though)

vmhba2:1:24:1 standby

So SPA is target 0 and SPB is target 1. Typical full mesh fabric failover works like this: if there is a link failure, ESX will try the next available path to the same target. This prevents LUN thrashing. So if the link to SPA is down on vmhba1, it will try vmhba2:0:24:1 next. If that one does not work (meaning SPA is down), it will try a standby path. Once that happens, the SAN is basically instructed to trespass the LUN to SPB. Not a massive issue, but this will cause all other ESX servers looking for that LUN to change their paths to the new target as well.

So the failover is not quite arbitrary from a target perspective.
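
The ordering described above (other paths to the same target first, standby paths to the other target last) can be illustrated with a short sketch. failover_order is a hypothetical helper that only models the behaviour as described in this post; it is not ESX's actual path selection code.

```shell
#!/bin/bash
# failover_order CURRENT_PATH CANDIDATE_PATH...
# Hypothetical model: prints the candidates in the order they would be
# tried, paths to the same target (SP) as CURRENT_PATH first, then the rest.
failover_order() {
    current="$1"
    shift
    cur_target=$(echo "$current" | cut -d: -f2)
    same=""
    other=""
    for p in "$@"; do
        # The failed path itself is not a candidate
        if [ "$p" = "$current" ]; then
            continue
        fi
        t=$(echo "$p" | cut -d: -f2)
        if [ "$t" = "$cur_target" ]; then
            same="$same $p"
        else
            other="$other $p"
        fi
    done
    # Unquoted on purpose: split the space-separated lists back into paths
    for p in $same $other; do
        echo "$p"
    done
}

failover_order vmhba1:0:24:1 \
    vmhba1:0:24:1 vmhba1:1:24:1 vmhba2:0:24:1 vmhba2:1:24:1
# prints:
#   vmhba2:0:24:1
#   vmhba1:1:24:1
#   vmhba2:1:24:1
```

In this model the LUN would only trespass to SPB if vmhba2:0:24:1 is also unavailable, which matches the LUN 24 example.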

Hope I answered your question.

pcomo
Enthusiast

Hi,

Thanks a lot for your help.

We have this same configuration and it seems that all is OK now.

Our problem was that our zone configuration and the links between the HBAs and SPs were not correct.

That is why we could not see failover working correctly.

One more question, though.

Do the SP ports matter in this configuration?

We have changed from this configuration:

old:

HBA1 -> spAport0

HBA1 -> spBport0

HBA2 -> spAport1

HBA2 -> spBport1

to this one:

new:

HBA1 -> spAport0

HBA1 -> spBport1

HBA2 -> spAport1

HBA2 -> spBport0

Is there a difference between these two configurations?

thanks.

FStanislawczyk
Contributor

Looks like you have a handle on this. I will pass this info up to the powers that be and see if we can test and implement it if it works as well for us as it does for you.

jdvcp
Enthusiast

There is no difference. HBA failure will not cause LUN trespass in either case.

pcomo
Enthusiast

Hi,

We have seen something else strange.

We have 3 ESX hosts. On two of them, each SP shows up as the same target on both HBAs.

But on the third, each HBA sees a different target number for the same SP.

And if I understand correctly, ESX multipathing checks the target number for failover.

Is that true?

And how can we force an HBA to discover an SP as the same target?

Thanks.

pcomo
Enthusiast

Hi,

OK, now each SP has the same target number on each HBA.

We have now tested some link failures, and this is what we see when we simulate them:

1: esx1 HBA1 failure -> the active path changes to the second port of the same SP on this host. All other ESX hosts continue to use HBA1 SPA0.

2: SPA1 failure without reactivating esx1 HBA1 -> on esx1 the active path changes to the second SP (SPB1), but the other ESX hosts do not change to SPB.

Normally, if one ESX host needs to access the LUN through the non-owner SP, it should send the other ESX hosts a SCSI command to change their active path too.

Should I set something in esx->Advanced settings->Disk->SANdeviceWithAPfailover?

Thanks.

dclark
Enthusiast

Hello

I am seeing the same problem: one ESX host fails over to a different controller, but the other two seem to keep accessing the LUN through the original CU. Did you ever find a solution?

Thanks
