VMware Cloud Community
EagleB5
VMware Employee

vSphere 6 U2, PDL errors on VVOL PE device

Hi guys

Since we updated our hosts to vSphere 6.0 U2 with the latest HPE image, we have noticed massive lags on the hosts. vmkernel.log shows thousands of PDL errors on a VVOL PE from our 3PAR array, but we don't use VVOLs at this time. Is there a known issue, or a way to disable the VVOL functions? Drivers and firmware are up to date.

[root@vesxvdi01:~] esxcli storage core adapter list

HBA Name  Driver         Link State  UID                                   Capabilities         Description

--------  -------------  ----------  ------------------------------------  -------------------  --------------------------------------------------------------------------------

vmhba0    ata_piix       link-n/a    sata.vmhba0                                                (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller

vmhba1    hpsa           link-n/a    sas.50123456789abcde                                       (0000:04:00.0) Hewlett-Packard Company Smart Array P410i

vmhba2    lpfc           link-up     fc.20000090fa56bb24:10000090fa56bb24  Second Level Lun ID  (0000:07:00.0) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter

vmhba3    lpfc           link-up     fc.20000090fa56bb25:10000090fa56bb25  Second Level Lun ID  (0000:07:00.1) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter

vmhba32   bnx2i          unbound     iscsi.vmhba32                                              Broadcom NetXtreme II iSCSI Adapter

vmhba33   bnx2i          unbound     iscsi.vmhba33                                              Broadcom NetXtreme II iSCSI Adapter

vmhba34   bnx2i          unbound     iscsi.vmhba34                                              Broadcom NetXtreme II iSCSI Adapter

vmhba35   bnx2i          unbound     iscsi.vmhba35                                              Broadcom NetXtreme II iSCSI Adapter

vmhba36   ata_piix       link-n/a    sata.vmhba36                                               (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller

[root@vesxvdi01:~]

[root@vesxvdi01:~] vmkload_mod -s lpfc | grep Version

Version: 10.4.236.0-1OEM.600.0.0.2159203

[root@vesxvdi01:~]

vmkernel.log

2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

2016-04-05T14:27:06.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover

2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

2016-04-05T14:27:07.577Z cpu12:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

2016-04-05T14:27:07.802Z cpu0:33356)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43a5c0694e40, 53705) to dev "naa.600508b4000af00d0000500001bc0000" on path "vmhba37:C0:T0:L1" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE

2016-04-05T14:27:07.802Z cpu0:33356)ScsiDeviceIO: 2613: Cmd(0x43a5c0694e40) 0x2a, CmdSN 0x80000070 from world 53705 to dev "naa.600508b4000af00d0000500001bc0000" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

2016-04-05T14:27:08.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover

2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

2016-04-05T14:27:09.577Z cpu9:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover

2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...

2016-04-05T14:27:10.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400

2016-04-05T14:27:10.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

Accepted Solutions
EagleB5
VMware Employee

Hi guys

Finally, there is a patch available for the 3PAR arrays that fixes this issue by answering the SCSI requests. There are two patches, one for 3.2.1 (P37) and one for 3.2.2 (P24).

They work correctly.

vijayrana968
Virtuoso

Yes, this is a known issue. The logs show the sense code 'failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0', which means the LUN is no longer available or has been unmapped.

Failover is triggered, but no next available path is found.

This table outlines possible SCSI sense codes that determine if a device is in a PDL state:

SCSI sense code                                   Description
------------------------------------------------  --------------------------------------
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0  LOGICAL UNIT NOT SUPPORTED
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x4c 0x0  LOGICAL UNIT FAILED SELF-CONFIGURATION
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x3  LOGICAL UNIT FAILED SELF-TEST
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x1  LOGICAL UNIT FAILURE

VMware KB: Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x

VMware KB: SCSI events that can trigger ESX server to fail a LUN over to another path

You need to request a hotfix from VMware; these are not public and are available only on request. Before that, validate that multipathing is correct.
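
To see which devices and paths the PDL warnings actually come from, something like the following may help. It is only a rough sketch: the function name pdl_summary is made up for illustration, and it assumes the "PDL error (...) - path ... device ..." message format shown in the vmkernel.log excerpts in this thread.

```shell
#!/bin/sh
# pdl_summary (hypothetical helper): count PDL warnings per device and path.
# Output columns: count, device, path, sense key/ASC/ASCQ.
pdl_summary() {
    # Keep only lines matching the NMP PDL message and reorder the fields.
    sed -n 's/.*PDL error (\([^)]*\)) - path \([^ ]*\) device \([^ ]*\) .*/\3 \2 \1/p' "$1" |
        sort | uniq -c
}

# Usage on an ESXi host (log path assumed to be the default):
#   pdl_summary /var/log/vmkernel.log
```

If every hit is on the same LUN ID across several targets, as with L256 in the original post, that points at one device rather than a general multipathing fault.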

kenthhjerpe
Contributor

Hi!

Did you get this sorted out? We upgraded our test/dev environment and ran into the exact same issue.

Regards

Kenth

DPfromHE
Contributor

Experiencing a similar thing on a newly installed host. Any resolution?

2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x85) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T6:L14 device naa.514f0c535640000f - triggering path failover

2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.514f0c535640000f": awaiting fast path state update before retrying failed command again...

2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.

2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: fcoe_ctlr_vlan_request() is done

2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.

2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: fcoe_ctlr_vlan_request() is done

2016-04-27T18:49:04.902Z cpu43:55091)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.514f0c535640000f" - issuing command 0x439e538367c0

2016-04-27T18:49:04.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x85 (0x439e538367c0) to dev "naa.514f0c535640000f" failed on path "vmhba2:C0:T6:L14" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

DPfromHE
Contributor

I found this kb: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21332...

If I stop smartd, the PDL log entries do stop, but I still have this in the vmkernel.log:

2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.

2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: fcoe_ctlr_vlan_request() is done

2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.

2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: fcoe_ctlr_vlan_request() is done

kenthhjerpe
Contributor

Are you running a 3PAR array as well?

I tried stopping the smart daemon as well, but that didn't help us; a storage refresh on the ESX host took forever even with smartd stopped.

We got this issue sorted out with this KB:

VMware KB: Changing the Disk.MaxLUN parameter on ESXi Hosts

I turned it down to 180 (the right value depends on how many LUNs you have in your environment). In my case this makes the ESX host see only the LUN IDs below 180, which means the vVol PE (LUN ID 256) is no longer presented to the host. If you try it, don't forget to reboot the host after you change the value.
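
For anyone who wants to try this from the ESXi shell, here is a minimal sketch. It assumes the host scans LUN IDs from 0 up to Disk.MaxLUN minus 1, which matches what we saw (any value up to 256 hides LUN 256). The helper names lun_hidden_by and set_max_lun are made up for illustration; the esxcli calls are the standard advanced-settings commands.

```shell
#!/bin/sh
# lun_hidden_by (hypothetical helper): succeeds if a LUN ID would no longer be
# scanned, assuming the host scans LUN IDs 0 .. (Disk.MaxLUN - 1).
lun_hidden_by() {    # usage: lun_hidden_by <Disk.MaxLUN value> <LUN ID>
    [ "$2" -ge "$1" ]
}

# set_max_lun (hypothetical wrapper): show the current value, then change it.
# Run on the ESXi host and reboot afterwards.
set_max_lun() {      # usage: set_max_lun <value>
    esxcli system settings advanced list -o /Disk/MaxLUN
    esxcli system settings advanced set -o /Disk/MaxLUN -i "$1"
}

# Sanity check before changing anything: 180 keeps LUN IDs 0-179 and hides the PE.
lun_hidden_by 180 256 && echo "LUN 256 is hidden with Disk.MaxLUN=180"
```

Pick a value that is still above your highest data LUN ID, otherwise those LUNs disappear along with the PE.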

Regards Kenth

DPfromHE
Contributor

No 3PAR, but all FC storage with EMC XtremIO, EMC VNX, and NetApp. 

kenthhjerpe
Contributor

Did you try changing the Disk.MaxLUN value to something below 256? We tried 180 and that did the trick!

KB

VMware KB: Changing the Disk.MaxLUN parameter on ESXi Hosts

kenthhjerpe
Contributor

Ah okay. Since the vVol PE on a 3PAR is LUN ID 256, I could use the Disk.MaxLUN value to make it disappear.

The LUN you seem to have a problem with is ID 14, so the workaround I used is probably not the way to go for you.

EagleB5
VMware Employee

Hi guys

Meanwhile, I have two cases open with HPE and VMware regarding this issue. The only workaround is to limit Disk.MaxLUN to 256 so that the PE device (LUN ID 256) is filtered out. Unfortunately, you cannot use VVOLs while this limit is in place. Disabling the smartd service has no impact on this. I'll let you know if there's a solution for that...

AllanKjaer
Enthusiast

Hi

I have also seen this during an upgrade from 5.5 to 6.0 Update 2. I had to change Disk.MaxLUN to 255 before upgrading; if it is not set, the upgrade stops at the start, after loading all the modules.

This is also on HPE 3PAR.

rtait
Contributor

@EagleB5 - any idea what MU level for P37? Not able to find anything on it...

EagleB5
VMware Employee

Sorry for the delay; I need to set up notifications somehow.

You need 3.2.1 MU3 for P37 or 3.2.2 MU2 for P24.
