VMware Cloud Community
Ravindra01
Enthusiast

Will Round Robin NMP use all the paths on an Active-Active array?

Hi All,

  I have a query regarding the VMware NMP Round Robin (RR) policy. Per VMware, if a device is set to use the RR policy, by default it rotates through all active paths, switching paths after every 1000 IOPS.

  In our case, we have an active-active storage array and 4 paths for each LUN. As I understand it, I/O is sent only on the preferred (optimized) paths (only 2 paths); when those paths go down, I/O fails over to the other paths based on the SCSI sense code.

My query is: with an active-active array and NMP set to Round Robin, will it use all 4 paths or only the 2 preferred (optimized) paths?

Thank you in advance.

6 Replies
vijayrana968
Virtuoso

For Active/Active storage arrays, all paths will be used in the Round Robin policy.

Please refer to the notes below from Setting a Path Selection Policy.

Round Robin (VMware)

The host uses an automatic path selection algorithm rotating through all active paths when connecting to active-passive arrays, or through all available paths when connecting to active-active arrays. RR is the default for a number of arrays and can be used with both active-active and active-passive arrays to implement load balancing across paths for different LUNs.
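The rotation described above can be sketched as a toy model. This is only an illustration of the scheduling idea, with hypothetical path names and a simplified per-I/O counter; the real logic lives in the ESXi VMkernel and also reacts to path state changes.

```python
# Toy sketch of Round Robin path rotation: move to the next active path
# after a fixed number of I/Os (the default IOPS limit is 1000).
# Path names below are hypothetical examples, not from a real host.

def round_robin(paths, total_ios, iops_limit=1000):
    """Return the path chosen for each I/O, rotating every iops_limit I/Os."""
    schedule = []
    for i in range(total_ios):
        schedule.append(paths[(i // iops_limit) % len(paths)])
    return schedule

paths = ["vmhba1:C0:T0:L1", "vmhba1:C0:T1:L1",
         "vmhba2:C0:T0:L1", "vmhba2:C0:T1:L1"]
sched = round_robin(paths, 4000)
# On a true active-active array all four paths share the load equally:
# each path in this sketch carries 1000 of the 4000 I/Os.
```

On an ALUA array the same rotation would only run across the optimized paths, which is the distinction discussed in the replies below.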

Finikiez
Champion

Hello!

Optimized and non-optimized paths exist only with ALUA arrays. NMP is ALUA-aware, so with RR the rotation happens only across the optimized paths while they are up. If all optimized paths fail, then it switches to the non-optimized paths.

If you have a true active-active array, then you don't need to worry about this at all. All available paths will be used.

Ravindra01
Enthusiast

Thank you for your reply.

Recently we had an issue with one of the LUNs, and the VMs provisioned from that datastore were impacted. When we started investigating, we found in the logs that paths went down because of an issue with one of the controllers.

The question is: why didn't it fail over the paths immediately? And if it did, why did we see disk issues at the VM layer? How can we check how long the path failover took?

We even logged a case with VMware; per their analysis, the respective LUN did not receive the 0x1 SCSI sense code needed to fail over the path. In our case, the storage controller had a power issue and went down abruptly. So who sends the SCSI sense code to the ESXi host to fail over the path?

VMware sticks to the position that the path will not fail over until the 0x1 code is received; the SAN vendor says the ESXi host should take care of the failover, and that from the SAN end the other two paths are active.

Now I'm really not able to conclude anything on this.

Finikiez
Champion

Can you attach a vm-support bundle here, or the vmkernel and vobd logs for the time when you had the issue?

"LUN has not received 0x1 SCSI sense code to failover the path"

"then who will send the SCSI sense code to the ESXi host to fail over the path"

H:0x1 is a host status code logged when there is no _physical_ connection to storage on the affected path. This code doesn't come from the storage array; it's the ESXi host itself that detects this status.

I can help to check the logs and see if you had path failover and how long it took to fail over.

Also, you may see no H:0x1 messages in vmkernel.log when there is no I/O to storage, but the ESXi host still detects the path failure and logs it in vobd.log.
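If it helps when reading the logs: vmkernel.log reports command failures with an H/D/P status triple (H = host status, D = device status, P = plugin status). A quick sketch for picking that triple out of a log line follows; the sample line is hypothetical, but the H:0x1 D:0x0 P:0x0 pattern is the one discussed above.

```python
import re

# Sketch: extract the H/D/P status triple from a vmkernel.log line.
# H = host status (0x1 = no physical connect, set by the ESXi host itself),
# D = device status (from the array), P = plugin status.
# The sample log line below is a hypothetical illustration.

STATUS_RE = re.compile(r"H:0x([0-9a-f]+) D:0x([0-9a-f]+) P:0x([0-9a-f]+)")

def parse_status(line):
    m = STATUS_RE.search(line)
    if not m:
        return None
    host, device, plugin = (int(x, 16) for x in m.groups())
    return {"host": host, "device": device, "plugin": plugin}

line = "ScsiDeviceIO: Cmd 0x28 to dev naa.600... failed H:0x1 D:0x0 P:0x0"
status = parse_status(line)
# status["host"] == 1 here: the host saw no physical connect on that path,
# which matches the point that H:0x1 does not come from the array.
```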

Ravindra01
Enthusiast

Thank you for your comments.

I may not be able to share the vm-support logs, per our organization's policy. Could you please guide me on what to look for in the logs? Should I check whether the path failed over, or whether the same path became active again?

Below are log snippets from vobd.log.

2017-10-12T11:48:45.551Z: [scsiCorrelator] 2761378460761us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device <LUN Number> degraded. Path vmhba1:C0:T4:L65 is down. Affected datastores: " Data store name".

2017-10-12T11:48:45.553Z: [scsiCorrelator] 2761378462355us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device <LUN Number>  degraded. Path vmhba2:C0:T9:L65 is down. Affected datastores: "Data store name".

2017-10-12T11:50:01.578Z: [scsiCorrelator] 2761454487021us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device <LUN Number> (Datastores: "< Data store name") restored. Path vmhba1:C0:T4:L65 is active again.
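To answer the "how long did it take" question from these snippets, you can diff the degraded and restored timestamps at the start of each vobd.log line. A small sketch using the exact timestamp format shown above (the log text after the timestamp is abbreviated here):

```python
from datetime import datetime

# Sketch: compute the gap between a path-down and path-restored event
# using the ISO-8601 timestamps that start each vobd.log line.

def vobd_time(line):
    # e.g. "2017-10-12T11:48:45.551Z: [scsiCorrelator] ..."
    stamp = line.split(": ", 1)[0].rstrip("Z")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")

down = vobd_time("2017-10-12T11:48:45.551Z: [scsiCorrelator] "
                 "... Path vmhba1:C0:T4:L65 is down.")
restored = vobd_time("2017-10-12T11:50:01.578Z: [scsiCorrelator] "
                     "... Path vmhba1:C0:T4:L65 is active again.")
print(restored - down)  # roughly 76 seconds between down and restored
```

So for the snippets above, vmhba1:C0:T4:L65 was down for about 76 seconds before being reported active again.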

Finikiez
Champion

I see two path failures: vmhba1:C0:T4:L65 and vmhba2:C0:T9:L65. It's not necessary to see the H:0x1 status in vmkernel.log at the same time.

If the host wasn't using those paths at the time of failure, you won't see any failover events.

You may also want to check vmkernel.log for the same time window, and see the path status with:

esxcli storage nmp path list -d <naa.id of your device>

If you can, post the vmkernel log here. You can also grep vmkernel.log for the naa.id of the affected device.