ESXi does not follow iSCSI ALUA path selection for...

bensanmina · ‎09-05-2019

Hi,

When using an iSCSI Active/Passive (standby, ALUA access state 0x2) dual-node, ALUA-compatible storage array, ESXi always treats the secondary, standby path as an Active/Optimized path, where I would have expected the path to show as inactive/dead/non-optimized. The iSCSI devices get claimed by VMW_SATP_ALUA in ESXi (which is correct), and ESXi defaults to VMW_PSP_MRU for its path selection policy, as the array doesn't match any custom rules from "esxcli storage nmp satp rule list", which is also correct. However, ESXi connects to whichever path comes up first and marks it as "current/preferred", even if this is the standby path. As a result, all IOs to that device fail and the storage adapter rescan operation takes several minutes to complete.

When I check from a RHEL 7.5 machine I can see that the paths are set correctly:

Active path:

sg_rtpg /dev/sdb

Report target port groups:

target port group id : 0x0 , Pref=1

target port group asymmetric access state : 0x00

T_SUP : 1, O_SUP : 1, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1

status code : 0x00

vendor unique status : 0x00

target port count : 02

Relative target port ids:

0x01

0x02

Passive path:

sg_rtpg /dev/sdc

Report target port groups:

target port group id : 0x0 , Pref=0

target port group asymmetric access state : 0x02

T_SUP : 1, O_SUP : 1, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1

status code : 0x02

vendor unique status : 0x00

target port count : 01

Relative target port ids:

0x01

Access state 0x00 -> Active, optimized

Access state 0x02 -> standby

Pref=1 -> Preferred bit True

Pref=0 -> Preferred bit False
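For readers who want to check the same fields programmatically, here is a minimal sketch that decodes sg_rtpg-style text into readable ALUA states. The state table follows the SPC REPORT TARGET PORT GROUPS codes; the function name `decode_rtpg` and the regular expressions are assumptions written against the output format pasted above, not part of sg3_utils.

```python
import re

# ALUA asymmetric access state codes (SPC, REPORT TARGET PORT GROUPS).
ALUA_STATES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0x4: "logical block dependent",
    0xE: "offline",
    0xF: "transitioning",
}

# Patterns assume the sg_rtpg layout shown in this post (hypothetical helper).
TPG_RE = re.compile(r"target port group id\s*:\s*(0x[0-9a-fA-F]+)\s*,\s*Pref=(\d)")
STATE_RE = re.compile(r"asymmetric access state\s*:\s*(0x[0-9a-fA-F]+)")

def decode_rtpg(text):
    """Return a list of (tpg_id, preferred, state_name) tuples."""
    groups = []
    tpg_id, pref = None, None
    for line in text.splitlines():
        m = TPG_RE.search(line)
        if m:
            tpg_id, pref = int(m.group(1), 16), m.group(2) == "1"
            continue
        m = STATE_RE.search(line)
        if m and tpg_id is not None:
            state = int(m.group(1), 16)
            groups.append((tpg_id, pref, ALUA_STATES.get(state, "reserved")))
            tpg_id, pref = None, None
    return groups
```

Fed the two outputs above, this would report one group per device: TPG 0 preferred and active/optimized on /dev/sdb, TPG 0 not preferred and standby on /dev/sdc. Note that both ports report the same target port group id.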

However, when discovering my LUNs from ESXi, I end up with a mixed bag of multipath devices, configured either right or wrong (there's about a 50% chance either way, depending on whether the primary or the secondary path comes up first):

Invalid config (all the "passive" LUNs are exported as LUN2, active LUNs are exported as LUN1):

naa.600140501d2d1e4b812019e700000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)

Storage Array Type: VMW_SATP_ALUA

Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}

Path Selection Policy: VMW_PSP_MRU

Path Selection Policy Device Config: Current Path=vmhba64:C0:T1:L2

Path Selection Policy Device Custom Config:

Working Paths: vmhba64:C0:T1:L2

Is USB: false

=> LUN2 (passive) is detected as Active/Optimized and used as a working path, even though all IOs to it fail.

iqn.1998-01.com.vmware:host63-32892c1c-00023d000003,iqn.2004-04.com.mpstor:ctrla:esxia1,t,1-naa.600140501d2d1e4b812019e700000000

Runtime Name: vmhba64:C0:T0:L1

Device: naa.600140501d2d1e4b812019e700000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)

Group State: active

Array Priority: 1

Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}

Path Selection Policy Path Config: {non-current path; rank: 0}

iqn.1998-01.com.vmware:host63-32892c1c-00023d000004,iqn.2004-04.com.mpstor:ctrlb:esxia1,t,1-naa.600140501d2d1e4b812019e700000000

Runtime Name: vmhba64:C0:T1:L2

Device: naa.600140501d2d1e4b812019e700000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)

Group State: active

Array Priority: 1

Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}

Path Selection Policy Path Config: {current path; rank: 0}

=> Both are detected as Active, Optimized with the same rank, even though one is actually standby and the other has the "preferred" bit set.

And another one that worked fine (just luckier):

naa.600140501d2d1e4b812019ca00000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)

Storage Array Type: VMW_SATP_ALUA

Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}

Path Selection Policy: VMW_PSP_MRU

Path Selection Policy Device Config: Current Path=vmhba64:C0:T14:L1

Path Selection Policy Device Custom Config:

Working Paths: vmhba64:C0:T14:L1

Is USB: false

iqn.1998-01.com.vmware:host63-32892c1c-00023d000003,iqn.2004-04.com.mpstor:ctrla:esxia0,t,1-naa.600140501d2d1e4b812019ca00000000

Runtime Name: vmhba64:C0:T14:L1

Device: naa.600140501d2d1e4b812019ca00000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)

Group State: active

Array Priority: 1

Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}

Path Selection Policy Path Config: {current path; rank: 0}

iqn.1998-01.com.vmware:host63-32892c1c-00023d000004,iqn.2004-04.com.mpstor:ctrlb:esxia0,t,1-naa.600140501d2d1e4b812019ca00000000

Runtime Name: vmhba64:C0:T15:L2

Device: naa.600140501d2d1e4b812019ca00000000

Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)

Group State: active

Array Priority: 1

Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}

Path Selection Policy Path Config: {non-current path; rank: 0}

I have to resort to making those LUNs using a VMW_PSP_FIXED policy and manually setting up which path is primary and which is secondary. This is time consuming, but doable, though it leads to other problems where ESXi will "invert" the paths or attempt to connect to the wrong path on reboot, making rebooting the ESXi node a challenge.

What can I do to make ESXi follow my Active/Standby ALUA LUNs correctly, that is, always select the path that is advertising itself as "Active/Optimized" and never attempt READ or WRITE IOs on the path advertising itself as "Standby"?

Thanks in advance for your help!

vXav · ‎09-05-2019

If all paths show up as active-optimized in vSphere, I would suggest looking at the SPs of the array, but your Red Hat servers show the right information, so that's probably not relevant.

Silly question (the annoying one), have you checked the firmware/drivers of your HBAs?

What's your array?

Blog - Linkedin

bensanmina · ‎09-05-2019

Thanks for your reply,

I am using Mellanox ConnectX-3 Pro HCAs (40G Ethernet adapters) with a fairly recent firmware image on them (I'm having trouble printing this info from ESXi). I did not install any third-party drivers; ESXi automatically loaded the "nmlx4_en" driver for them, so this would be the "out of the box" ESXi 6.7 Mellanox Ethernet driver. But since I'm not using hardware iSCSI, the driver/firmware should be oblivious to SCSI-related traffic.

bensanmina · ‎09-05-2019

Forgot to answer your question: the array is made by ourselves, Sanmina. It's a Viking Enterprise Solutions fx60-hd-hp all flash array. As such, I have the ability to reconfigure the array in any way I want (provided the code allows it). So far I've tried a few things: I've set the "passive" path to always use LUN2 (versus LUN1 for the active), tried changing a few other parameters but with no success.

I see that Nimble arrays (which I think also use "Standby" paths) seem to require users to install their custom SATP/PSP driver in ESXi. Could it be that ESXi is not properly aware of "Standby" paths (ALUA access state 0x02) out of the box and is only able to deal with "Active/Optimized" and "Active/Non-Optimized" paths?

vXav · ‎09-05-2019

ESXi is aware of standby paths. See this.

I don't know about Nimble, but it could be because their SATP rule wasn't included in the base ISO. Pure Storage got theirs added after a while.

Also, about "active" and "passive" LUNs: a LUN is owned by one storage controller on the array. From an ESXi point of view you speak of active or "passive" paths, not LUNs.

I worked with a couple of Dell ALUA arrays and never had the issue you are facing.

Does the documentation of the product you use include configuration for ESXi? Can you get support from them?


bensanmina · ‎09-05-2019

Thanks for that link. Good to know that Standby paths are mentioned in the documentation. Yes, sorry, I meant "path"; I'm still adjusting to the ESXi terms. Essentially I have two paths for the same volume (same serial number). One is active (active/optimized) and can handle IOs; the other is passive (standby) and cannot.

I am currently working on making this product compatible with ESXi, so the problem hasn't been solved and I would be the point of contact for support questions unfortunately. Were the Dell ALUA arrays you used Active/Standby? If so, would you remember their model names by any chance? Thanks in advance!

vXav · ‎09-05-2019

They used AO / ANO and not AO / standby. Why do you want to use standby?

Can you try to configure it to ANO instead and see if it makes a difference?

Can you find relevant information in the vmkernel log of the host? Sense codes may be useful.

Also, I never dug that deep into ALUA (never needed to), but I believe the two paths of a LUN should have different TPG_IDs; yours have the same one.

If all the paths to all the LUNs report a TPG_ID of 0, then you do indeed have a 50% chance of ending up on the correct path.

Red Hat shows TPG_IDs of 0 as well with a "Pref" suffix which I don't see in your ESXi command output.
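The TPG_ID observation can be checked mechanically. In ALUA the access state belongs to the target port group, so ports that report the same group id are expected to share one state. The sketch below flags any TPG id that is reported through more than one target; the helper name `tpg_collisions` and the tuple layout are assumptions for illustration, with the config strings taken from the "Storage Array Type Path Config" lines in the first post. Whether this collision is the actual root cause here is still unconfirmed.

```python
import re
from collections import defaultdict

def tpg_collisions(paths):
    """paths: iterable of (runtime_name, target_iqn, satp_path_config).

    Returns {tpg_id: sorted list of target IQNs} for every TPG id that is
    reported through more than one target (hypothetical helper).
    """
    by_tpg = defaultdict(set)
    for _runtime_name, target, config in paths:
        m = re.search(r"TPG_id=(\d+)", config)
        if m:
            by_tpg[int(m.group(1))].add(target)
    return {tpg: sorted(t) for tpg, t in by_tpg.items() if len(t) > 1}

# Values copied from the esxcli output earlier in the thread.
paths = [
    ("vmhba64:C0:T0:L1", "iqn.2004-04.com.mpstor:ctrla:esxia1",
     "{TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}"),
    ("vmhba64:C0:T1:L2", "iqn.2004-04.com.mpstor:ctrlb:esxia1",
     "{TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}"),
]
collisions = tpg_collisions(paths)
```

With the two paths above, `collisions` contains TPG 0 mapped to both the ctrla and ctrlb targets, which matches the symptom of ESXi showing a single AO state for both controllers.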


bensanmina · ‎09-06-2019

The problem we have with ANO is that in that mode the target should still be able to accept IOs. We don't want any IOs to land on the secondary path. I'll try and modify the target to not accept IOs while in ANO, see if that works. About the TPG ID, I did think it was going to be an issue, so when I changed the LUN ID to be 1 for Active and 2 for Passive, I also changed the TPG ID to 1 for Active and 2 for Passive at the target level and thought that was done. But you're right: something isn't correct as the initiators see the same TPG (0) for both devices, I'll look into that also.

Thanks a lot for these suggestions, these will help quite a lot!

vXav · ‎09-06-2019

When using AO / ANO, ESXi will not send IOs down the ANO path unless all AO paths are dead.

For example if you have 4 AO paths and 4 ANO paths and the LUN configured with Round Robin, it will switch every 1000 IOs (by default) on AO paths only.

In vCenter the AO paths show up as "Active (I/O)" and the ANO paths as "Active". It works fine that way.
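As a rough illustration of the behaviour described above (not ESXi's actual implementation), the following sketch distributes IOs round-robin in bursts of 1000 across AO paths only, and falls back to ANO paths only when no AO path is left. The function name, the state labels and the path names are all assumptions made for the example.

```python
from itertools import cycle

def round_robin_schedule(paths, total_ios, iops_limit=1000):
    """paths: dict of path name -> state ("AO", "ANO", "standby", "dead").

    Returns a dict of path name -> number of IOs sent down that path.
    Simplified model: switch path every `iops_limit` IOs.
    """
    ao = [p for p, s in paths.items() if s == "AO"]
    ano = [p for p, s in paths.items() if s == "ANO"]
    usable = ao or ano  # ANO paths are used only when no AO path remains
    counts = {p: 0 for p in paths}
    if not usable:
        return counts
    remaining = total_ios
    for path in cycle(usable):
        if remaining <= 0:
            break
        burst = min(iops_limit, remaining)
        counts[path] += burst
        remaining -= burst
    return counts

# Hypothetical path names for illustration.
healthy = {"ao-1": "AO", "ao-2": "AO", "ano-1": "ANO", "ano-2": "ANO"}
print(round_robin_schedule(healthy, 5000))
```

In this model standby paths never receive IO, which is the behaviour the original poster wants; the open question in the thread is why ESXi does not see the standby state in the first place.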

PS: If you have one iSCSI NIC you're fine; if you have two, you should see four paths per LUN.


bensanmina · ‎09-06-2019

The problem we have with our array is that we cannot accept any IOs down the ANO path, even if the AO path is dead. This is why we have to rely on something like Standby until the failover (on the controller side) is fully completed and we can accept IOs again. For this particular configuration, the hosts are connected point to point to the two controllers, and we only use one path per controller, so we see exactly two paths per LUN. I'll try changing the path to ANO but responding with an error to all IOs (until the controller failover has completed) and see if that works. I'll also look into the TPG ID; I also thought this could be a problem.

vXav · ‎09-06-2019

Ok gotcha. I assume the connection directly to the controllers is only for the lab.

Let us know how it goes, this is an interesting issue!


benmpstor · ‎09-06-2019

It will take me a few days to get this done, but I'll let you know what I find. Thanks again for the help!
