Hi guys
Since we updated our hosts to vSphere 6.0 U2 with the latest HPE image, we have noticed massive lags on the hosts. vmkernel.log shows thousands of PDL errors on a VVOL PE (protocol endpoint) from our 3PAR array, but we don't use VVOLs at this time. Is there a known issue, or a way to disable the VVOL functions? Drivers and firmware are up to date.
[root@vesxvdi01:~] esxcli storage core adapter list
HBA Name Driver Link State UID Capabilities Description
-------- ------------- ---------- ------------------------------------ ------------------- --------------------------------------------------------------------------------
vmhba0 ata_piix link-n/a sata.vmhba0 (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller
vmhba1 hpsa link-n/a sas.50123456789abcde (0000:04:00.0) Hewlett-Packard Company Smart Array P410i
vmhba2 lpfc link-up fc.20000090fa56bb24:10000090fa56bb24 Second Level Lun ID (0000:07:00.0) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter
vmhba3 lpfc link-up fc.20000090fa56bb25:10000090fa56bb25 Second Level Lun ID (0000:07:00.1) Emulex Corporation Emulex LPe12000 8Gb PCIe Fibre Channel Adapter
vmhba32 bnx2i unbound iscsi.vmhba32 Broadcom NetXtreme II iSCSI Adapter
vmhba33 bnx2i unbound iscsi.vmhba33 Broadcom NetXtreme II iSCSI Adapter
vmhba34 bnx2i unbound iscsi.vmhba34 Broadcom NetXtreme II iSCSI Adapter
vmhba35 bnx2i unbound iscsi.vmhba35 Broadcom NetXtreme II iSCSI Adapter
vmhba36 ata_piix link-n/a sata.vmhba36 (0000:00:1f.2) Intel Corporation ICH10 4 port SATA IDE Controller
[root@vesxvdi01:~]
[root@vesxvdi01:~] vmkload_mod -s lpfc | grep Version
Version: 10.4.236.0-1OEM.600.0.0.2159203
[root@vesxvdi01:~]
vmkernel.log
2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover
2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...
2016-04-05T14:27:06.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400
2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover
2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...
2016-04-05T14:27:07.577Z cpu12:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400
2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover
2016-04-05T14:27:07.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...
2016-04-05T14:27:07.802Z cpu0:33356)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43a5c0694e40, 53705) to dev "naa.600508b4000af00d0000500001bc0000" on path "vmhba37:C0:T0:L1" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE
2016-04-05T14:27:07.802Z cpu0:33356)ScsiDeviceIO: 2613: Cmd(0x43a5c0694e40) 0x2a, CmdSN 0x80000070 from world 53705 to dev "naa.600508b4000af00d0000500001bc0000" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2016-04-05T14:27:08.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400
2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover
2016-04-05T14:27:08.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...
2016-04-05T14:27:09.577Z cpu9:33266)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400
2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover
2016-04-05T14:27:09.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...
2016-04-05T14:27:10.577Z cpu9:57333)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.2ff70002ac014e9d" - issuing command 0x43a5cc84e400
2016-04-05T14:27:10.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
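To get a feel for how many of these warnings hit which device and path, the log can be tallied with a few lines of Python. This is only a rough sketch: the regular expression is derived from the "PDL error" lines pasted above, and the function name count_pdl_errors is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the nmp_PathDetermineFailure warnings in the excerpt above.
PDL_RE = re.compile(
    r"PDL error \((?P<sense>0x[0-9a-f]+/0x[0-9a-f]+/0x[0-9a-f]+)\) - "
    r"path (?P<path>\S+) device (?P<device>\S+)",
    re.IGNORECASE,
)


def count_pdl_errors(lines):
    """Count PDL warnings per (device, path) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = PDL_RE.search(line)
        if match:
            counts[(match.group("device"), match.group("path"))] += 1
    return counts


# Three lines copied from the log above; only two of them are PDL warnings.
sample = [
    '2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T2:L256 device naa.2ff70002ac014e9d - triggering path failover',
    '2016-04-05T14:27:05.577Z cpu6:33386)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.2ff70002ac014e9d": awaiting fast path state update before retrying failed command again...',
    '2016-04-05T14:27:06.577Z cpu6:33386)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x28) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T3:L256 device naa.2ff70002ac014e9d - triggering path failover',
]

for (device, path), n in sorted(count_pdl_errors(sample).items()):
    print(device, path, n)
```

On a host, the same function could be fed an open file handle for /var/log/vmkernel.log instead of the sample list.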
Yes, this is a known issue. The logs show this sense code: failed on path "vmhba2:C0:T2:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0. It means the LUN is no longer available or has been unmapped.
Failover is triggered, but no other available path is found.
This table outlines possible SCSI sense codes that determine if a device is in a PDL state:
SCSI sense code | Description |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 | LOGICAL UNIT NOT SUPPORTED |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x4c 0x0 | LOGICAL UNIT FAILED SELF-CONFIGURATION |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x3 | LOGICAL UNIT FAILED SELF-TEST |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x3e 0x1 | LOGICAL UNIT FAILURE |
VMware KB: Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
VMware KB: SCSI events that can trigger ESX server to fail a LUN over to another path
You need to request a hotfix from VMware; these are not public and are only available on request. Before that, validate that multipathing is configured correctly.
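As a rough illustration of how the table above can be applied to a log line, here is a small Python sketch. The lookup table is copied from the four rows above; the regular expression and the function name classify are assumptions made for this example, and only lines with "Valid sense data" and device status D:0x2 (CHECK CONDITION) are checked against the table.

```python
import re

# (sense key, ASC, ASCQ) triples from the table above that indicate PDL.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x0): "LOGICAL UNIT NOT SUPPORTED",
    (0x4, 0x4C, 0x0): "LOGICAL UNIT FAILED SELF-CONFIGURATION",
    (0x4, 0x3E, 0x3): "LOGICAL UNIT FAILED SELF-TEST",
    (0x4, 0x3E, 0x1): "LOGICAL UNIT FAILURE",
}

# Pattern assumed from the vmkernel.log lines quoted in this thread.
SENSE_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"(Valid|Possible) sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)",
    re.IGNORECASE,
)


def classify(line):
    """Return the PDL description for a log line, or None if it is not PDL."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    device_status = int(match.group(2), 16)
    sense = tuple(int(match.group(i), 16) for i in (5, 6, 7))
    # Only trust the triple when the sense data is marked valid and the
    # device returned CHECK CONDITION (0x2).
    if match.group(4).lower() == "valid" and device_status == 0x2:
        return PDL_SENSE_CODES.get(sense)
    return None


pdl_line = 'Retry cmd 0x28 (0x43a5cc84e400) to dev "naa.2ff70002ac014e9d" failed on path "vmhba2:C0:T3:L256" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.'
busy_line = 'Cmd 0x2a (0x43a5c0694e40, 53705) to dev "naa.600508b4000af00d0000500001bc0000" on path "vmhba37:C0:T0:L1" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE'

print(classify(pdl_line))
print(classify(busy_line))
```

The second sample line (D:0x8, sense data all zero) comes from a different device in the original log and is deliberately not treated as PDL.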
Hi!
Did you get this sorted out? We upgraded our test/dev environment and ran into the exact same issue.
Regards
Kenth
Experiencing a similar thing on a newly installed host. Any resolution?
2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmp_PathDetermineFailure:2973: Cmd (0x85) PDL error (0x5/0x25/0x0) - path vmhba2:C0:T6:L14 device naa.514f0c535640000f - triggering path failover
2016-04-27T18:49:03.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.514f0c535640000f": awaiting fast path state update before retrying failed command again...
2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.
2016-04-27T18:49:04.860Z cpu32:33475)<6>host14: fip: fcoe_ctlr_vlan_request() is done
2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.
2016-04-27T18:49:04.890Z cpu10:33481)<6>host15: fip: fcoe_ctlr_vlan_request() is done
2016-04-27T18:49:04.902Z cpu43:55091)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.514f0c535640000f" - issuing command 0x439e538367c0
2016-04-27T18:49:04.902Z cpu48:33639)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x85 (0x439e538367c0) to dev "naa.514f0c535640000f" failed on path "vmhba2:C0:T6:L14" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
I found this kb: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21332...
If I stop smartd, the PDL log entries do stop, but I still have this in vmkernel.log:
2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: host15: FIP VLAN ID unavail. Retry VLAN discovery.
2016-04-27T18:57:43.407Z cpu18:33464)<6>host15: fip: fcoe_ctlr_vlan_request() is done
2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: host14: FIP VLAN ID unavail. Retry VLAN discovery.
2016-04-27T18:57:45.371Z cpu38:33489)<6>host14: fip: fcoe_ctlr_vlan_request() is done
Are you running a 3PAR array as well?
I tried stopping the smart daemon as well, but that didn't help us; a storage refresh on the ESX host took forever even with smartd stopped.
We got this issue sorted out with this KB:
VMware KB: Changing the Disk.MaxLUN parameter on ESXi Hosts
I turned it down to 180 (the right value depends on how many LUNs you have in your environment). In my case this makes the ESX host see only LUN IDs below 180, which means the vVol PE (LUN ID 256) is no longer visible to the host. If you try it, don't forget to reboot the host after you change the value.
Regards Kenth
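The Disk.MaxLUN workaround described above can be sketched in a few lines of Python. This is a hedged illustration, not an official tool: it assumes the host scans LUN IDs 0 through Disk.MaxLUN - 1, the helper names are made up, and the esxcli argument list is only built, never executed.

```python
def lun_id_from_path(path):
    """Extract the LUN ID from a runtime path name such as 'vmhba2:C0:T2:L256'."""
    return int(path.rsplit(":L", 1)[1])


def is_scanned(lun_id, disk_max_lun):
    """Assumption: the host scans LUN IDs 0 .. Disk.MaxLUN - 1 and skips the rest."""
    return 0 <= lun_id < disk_max_lun


def set_max_lun_command(value):
    """Build (but do not run) the esxcli call that sets Disk.MaxLUN."""
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", "/Disk/MaxLUN", "-i", str(value)]


pe_lun = lun_id_from_path("vmhba2:C0:T2:L256")      # the 3PAR vVol PE from the first post
other_lun = lun_id_from_path("vmhba2:C0:T6:L14")    # the LUN from the non-3PAR host

for max_lun in (180, 256, 1024):
    print(max_lun, is_scanned(pe_lun, max_lun), is_scanned(other_lun, max_lun))

print(" ".join(set_max_lun_command(180)))
```

Under that assumption, any value of 256 or lower hides the PE at LUN ID 256, while a LUN with ID 14 stays visible for any realistic value, so this workaround cannot help with it.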
No 3PAR, but all FC storage with EMC XtremIO, EMC VNX, and NetApp.
Did you try changing the Disk.MaxLUN value to something below 256? We tried 180 and that did the trick!
KB
Ah okay. Since the vVol PE on a 3PAR is LUN ID 256, I could use the Disk.MaxLUN value to make it disappear.
The LUN you seem to have a problem with is ID 14, so the workaround I used is probably not the way to go for you.
Hi guys
Meanwhile I have two cases open, with HPE and VMware, regarding this issue. The only workaround so far is to set Disk.MaxLUN=256 to filter out the PE device, but unfortunately you cannot use VVOLs with that setting. Disabling the smartd service doesn't have any impact on this. I'll let you know if there's a solution...
Hi
I have also seen this during an upgrade from 5.5 to 6.0 Update 2. I had to change Disk.MaxLUN to 255 before upgrading; if it's not set, the upgrade stops right at the start, after loading all the modules.
This is also on HPE 3PAR.
Hi guys
Finally, there is a patch available for the 3PAR arrays which fixes this issue by answering the SCSI requests. There are two patches: P37 for 3.2.1 and P24 for 3.2.2.
They work correctly.
@EagleB5 - any idea what MU level for P37? Not able to find anything on it...
Sorry for the delay, I still need to set up notifications somehow.
You need 3.2.1 MU3 for P37 or 3.2.2 MU2 for P24.