VMware Cloud Community
gr99
Enthusiast

Failed: H:0x0 D:0x2 P:0x0 Valid sense data

On my ESXi 5.5 U3 servers, I'm seeing regular messages in /var/log/vmkernel.log about failed commands with valid sense data:

2017-02-23T14:08:16.307Z cpu2:34366)NMP: nmp_ThrottleLogForDevice:2458: Cmd 0x85 (0x412e86932dc0, 34608) to dev "naa.6c81f660d81ddf001a52e77d094d4447" on path "vmhba0:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2017-02-23T14:08:16.307Z cpu2:34366)ScsiDeviceIO: 2369: Cmd(0x412e86932dc0) 0x4d, CmdSN 0x105a from world 34608 to dev "naa.6c81f660d81ddf001a52e77d094d4447" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2017-02-23T14:08:16.307Z cpu2:34366)ScsiDeviceIO: 2369: Cmd(0x412e86932dc0) 0x1a, CmdSN 0x105b from world 34608 to dev "naa.6c81f660d81ddf001a52e77d094d4447" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.


I thought this was due to the change to ATS for VMFS heartbeats (as per KB 2113956), so I applied the workaround:

# esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5
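
For reference, the value can be confirmed afterwards with the matching list command (same option path as in the set command above):

# esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5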


But I'm still seeing the errors. The disk is a virtual drive on the onboard RAID controller:

# esxcli storage nmp device list

naa.6c81f660d81ddf001a52e77d094d4447

   Device Display Name: Local DELL Disk (naa.6c81f660d81ddf001a52e77d094d4447)

   Storage Array Type: VMW_SATP_LOCAL

   Storage Array Type Device Config: SATP VMW_SATP_LOCAL does not support device configuration.

   Path Selection Policy: VMW_PSP_FIXED

   Path Selection Policy Device Config: {preferred=vmhba0:C2:T0:L0;current=vmhba0:C2:T0:L0}

   Path Selection Policy Device Custom Config:

   Working Paths: vmhba0:C2:T0:L0

   Is Local SAS Device: false

   Is USB: false

   Is Boot USB Device: false


The hardware is a PowerEdge R720xd server with a PERC H710P Mini controller.

4 Replies
admin
Immortal

It looks like harmless log spew related to rescans of local devices. Do the alerts come at regular intervals, e.g. every 5 minutes or every 30 minutes?

Brief mention of this here: Filtering logs in VMware vSphere ESXi (2118562) | VMware KB

Example:

Some rescan commands for local storage devices will report a SCSI log expression in the vmkernel.log that can be safely filtered.

Below is an example of the logfilters file including these expressions:

0 | vmkernel | 0x1a.* H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x2[04] 0x0
0 | vmkernel | 0x85.* H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
0 | vmkernel | 0x12.* H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0
0 | vmkernel | 0x9e.* H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
0 | vmkernel | bad CDB .* scsi_op=0x9e
0 | vmkernel | 0x4d.* H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
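
If I'm reading that KB right, the rough workflow to apply these is: add the lines to /etc/vmware/logfilters, enable filtering in /etc/vmsyslog.conf, then reload syslog. Something like:

# vi /etc/vmware/logfilters          (add the filter lines above)
# vi /etc/vmsyslog.conf              (set: enable_logfilters = true)
# esxcli system syslog reload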

RajeevVCP4
Expert

Here I am assuming vmhba0:C2:T0:L0 is your local/boot storage.

path "vmhba0:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2017-02-23T14:08:16.307Z cpu2:34366)ScsiDeviceIO: 2369: Cmd(0x412e86932dc0) 0x4d, CmdSN 0x105a from world 34608 to dev


These are not errors but information returned from the SAN/LUNs.

 


Here is the decoding of the status:

Host Status [0x0] OK: Returned when there is no error on the host side. This is when you will see a status for the Device or Plugin, and "Valid sense data" instead of "Possible sense data".

Device Status [0x2] CHECK CONDITION: Returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack sends SCSI command 0x3 (REQUEST SENSE) to retrieve the sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after "Valid sense data" in the order Sense Key, Additional Sense Code, ASC Qualifier.

Plugin Status [0x0] GOOD: No error. (ESXi 5.x / 6.x only)

Sense Key [0x5]: ILLEGAL REQUEST

Additional Sense Data 20/00: INVALID COMMAND OPERATION CODE

Cmd 0x4d is the SCSI command LOG SENSE (see https://en.wikipedia.org/wiki/SCSI_command ).
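
Applied to the log lines in the original post, the decode works out like this (opcode names from the SCSI command list linked above; worth double-checking against the KB):

Cmd 0x85 (ATA PASS-THROUGH 16) with 0x5 0x20 0x0 = ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE
Cmd 0x4d (LOG SENSE) with 0x5 0x20 0x0 = ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE
Cmd 0x1a (MODE SENSE 6) with 0x5 0x24 0x0 = ILLEGAL REQUEST / INVALID FIELD IN CDB

In other words, the PERC virtual disk appears to simply reject commands (mostly drive health/SMART style queries) that it does not implement.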

 

 

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark helpful or correct if my answer is useful for you
sarikrizvi
Enthusiast

Check the KBs below to understand SCSI check conditions in VMkernel logs:

VMware Knowledge Base

VMware Knowledge Base

Type | Code | Name | Description
Host Status | 0x0 | OK | Returned when there is no error on the host side. You will then see a status for the Device or Plugin, and Valid sense data instead of Possible sense data.
Device Status | 0x2 | CHECK_CONDITION | Returned when a command fails for a specific reason. On a CHECK CONDITION, the ESX storage stack sends SCSI command 0x3 (REQUEST SENSE) to get the sense data, which is listed after "Valid sense data" as Sense Key, Additional Sense Code, ASC Qualifier.
Plugin Status | 0x0 | GOOD | No error. (ESXi 5.x / 6.x only)
Sense Key | 0x5 | ILLEGAL REQUEST |
Additional Sense Data | 24/00 | INVALID FIELD IN CDB |
Regards,
SARIK (Infrastructure Architect)
vExpert 2018-2020 | vExpert - Pro | NSX | Security
vCAP-DCD 6.5 | vCP-DCV 5.0 | 5.5 | 6.0 | vCA-DCV 5 | vCA-Cloud 5 | RHCSA & RHCE 6 | A+ (HW & NW)
__________________
Please mark "Helpful" or "Correct" if it helps you
_____________________________________
@Follow:
Blog# https://vmwarevtech.com
vExpert# https://vexpert.vmware.com/directory/1997
Badge# https://www.youracclaim.com/users/sarik
DougAWig
Contributor

I have experienced this "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0" error across many clusters over the last few years.
The typical scenario: a host becomes unresponsive, we call VMware, they see these errors and tell us to reboot the host.

In my environment this was always an issue with the qlnativefc driver. If you are not using qlnativefc, this likely does not apply to you. The bug stretches across multiple versions of the driver, affecting all versions of ESXi from 6.5 to 7, where RDPROTECT is enabled. RDPROTECT is the bit which indicates whether T10 DIF is in use, and it should return zero when an array does not support it. I've seen that 3PAR 7400s do not support T10 DIF, and I'm just now coming to realize that it also seems to be unsupported on the HPE MSA 2050 platform. Because the driver returns a non-zero value, it occasionally triggers a lockup of the LUN in rare but repeatable circumstances.
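
A quick way to check whether this driver is in play on a host (standard esxcli commands; the exact VIB name can vary by OEM image):

# esxcli software vib list | grep qlnativefc
# esxcli storage core adapter list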

I tried for over a year to get HPE to issue a fix, but ultimately the solution was to decommission all storage arrays that do not support T10 DIF, after which the lockup problem went away. It took a very long time to decommission that many arrays, so we developed this procedure to work around the host disconnect issue and minimize the number of VMs that needed to be rebooted.

When a host lockup occurs:
1. Check vRealize Log Insight under Interactive Analysis and filter for "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0"
   (Alternatively, check your syslog server, or if SSH still works, grep the above string from vmkernel.log.)
2. Extract the naa.xxx ID from step 1.
3. SSH to any host in the same cluster and run "esxcli storage vmfs extent list | grep naa.xxx", where xxx = the ID from step 2 (see the example commands after this list).
4. You now have the name of the datastore that triggered the lockup on that host. From here you need to Storage vMotion all VMs off that datastore.
5. Once all VMs are vacated from that datastore (except the ones on the unresponsive host), un-present the LUN/datastore from the cluster at the SAN level.
6. If you wait a minute or five, you'll see the host reconnect. If not, you can manually reconnect it.
7. vMotion all VMs off the previously disconnected host and reboot it.
8. While the host is rebooting, re-present the LUN/datastore to the cluster.
9. When the host comes back, recover the VMs that were on the disconnected host and the affected LUN/datastore.
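
For steps 1-3 from an SSH session, roughly (naa.xxx is just a placeholder for whichever device ID step 2 gives you; adjust the log path if you pull it from a syslog server instead):

# grep "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0" /var/log/vmkernel.log
# esxcli storage vmfs extent list | grep naa.xxx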

This prevented us from having to reboot any VMs that were not on the affected datastore.
Hope this saves someone else a lot of time.