Has anyone seen this? It's the first time I've run into an issue like this; it showed up out of nowhere and has been getting worse over the last 3 weeks. I looked up the sense data and have included the decoded codes and the errors I see in the logs below. Basically, some VMs are getting random disk errors. We were finally able to reproduce it: while watching esxtop, the VM's I/O stats drop to zero, and about 2 minutes later this error message shows up in the vmkernel log. We have a support case open, but wanted to get a second opinion since they aren't moving all that fast. If I move the VM to two newly created datastores, this doesn't happen. These are all on the same NetApp array, so I'm not sure why the new datastores behave better. If there is a reference for better understanding these errors, I'd appreciate it.
Type | Code | Name | Description |
Host Status | [0x8] | RESET | This status is returned when the HBA driver has aborted the I/O. It can also occur if the HBA does a reset of the target. |
Device Status | [0x0] | GOOD | This status is returned when there is no error from the device or target side. This is when you will see if there is a status for Host or Plugin. |
Plugin Status | [0x0] | GOOD | No error. (ESXi 5.x / 6.x only) |
Sense Key | [0x0] | NO SENSE | |
Additional Sense Data | 00/00 | NO ADDITIONAL SENSE INFORMATION | |
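If you want to decode these fields straight from the vmkernel log instead of pasting them into a decoder page, a rough Python sketch like the one below pulls the `H:`/`D:`/`P:` status codes and the three sense bytes (sense key, ASC, ASCQ) out of an NMP failure line. The lookup tables only contain the codes from the table above; anything else is reported as UNKNOWN rather than guessed, and the `decode` name is just for illustration.

```python
import re

# Status and sense fields as they appear in ESXi NMP / ScsiDeviceIO failure lines.
STATUS_RE = re.compile(r"H:0x([0-9a-fA-F]+) D:0x([0-9a-fA-F]+) P:0x([0-9a-fA-F]+)")
SENSE_RE = re.compile(
    r"\b(Valid|Invalid) sense data: 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)"
)

# Only the codes listed in the decoded table above; extend as needed.
HOST_STATUS = {0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD"}
PLUGIN_STATUS = {0x0: "GOOD"}
SENSE_KEY = {0x0: "NO SENSE"}


def name(table, code):
    """Look up a code, falling back to UNKNOWN(0x..) instead of guessing."""
    return table.get(code, f"UNKNOWN(0x{code:x})")


def decode(line):
    """Decode host/device/plugin status and sense bytes from one log line."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    h, d, p = (int(g, 16) for g in m.groups())
    result = {
        "host": name(HOST_STATUS, h),
        "device": name(DEVICE_STATUS, d),
        "plugin": name(PLUGIN_STATUS, p),
    }
    s = SENSE_RE.search(line)
    if s is not None:
        key, asc, ascq = (int(g, 16) for g in s.group(2, 3, 4))
        result["sense_valid"] = s.group(1) == "Valid"
        result["sense_key"] = name(SENSE_KEY, key)
        result["asc_ascq"] = f"{asc:02x}/{ascq:02x}"
    return result


if __name__ == "__main__":
    line = "Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL"
    print(decode(line))
```

Run against the NMP line in the log below, this reports host RESET, device GOOD, plugin GOOD and sense 00/00 flagged as not valid, which matches the table: the device itself never returned an error; the I/O was aborted on the host/HBA side.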
2019-07-12T15:34:53.926Z cpu20:13707492)PVSCSI: 2642: scsi0:0: ABORT ctx=0xf3
2019-07-12T15:34:53.926Z cpu20:13707492)<7>fnic : 3 :: Abort Cmd called Cmd=0x0x4395574ed240 CmdSn=0x3521b6614 FCID 0x3a0081, LUN 0xd TAG e7 Op=0x2a flags 3
2019-07-12T15:34:53.926Z cpu20:13707492)<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 180880 msec
2019-07-12T15:34:53.926Z cpu23:7830720)<6>fnic : 3 :: icmnd_cmpl abts pending hdr status = FCPIO_ABORTED tag = 0xe7 sc = 0x0x439550eb0880scsi_status = 0 residual = 0
2019-07-12T15:34:53.926Z cpu23:7830720)<6>fnic : 3 :: abort reject recd. id 231
2019-07-12T15:34:53.926Z cpu23:7830720)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_ITMF_REJECTED
2019-07-12T15:34:53.926Z cpu20:13707492)<7>fnic : 3 :: Returning from abort cmd type 2 FAILED
2019-07-12T15:34:53.926Z cpu20:13707492)WARNING: LinScsi: SCSILinuxAbortCommands:1909: Failed, Driver fnic, for vmhba3
2019-07-12T15:34:55.968Z cpu1:65940)<7>fnic : 3 :: Abort Cmd called Cmd=0x0x4395574ed240 CmdSn=0x3521b6614 FCID 0x3a0081, LUN 0xd TAG e7 Op=0x2a flags 277
2019-07-12T15:34:55.968Z cpu1:65940)<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 182930 msec
2019-07-12T15:34:55.968Z cpu23:9574089)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_IO_NOT_FOUND
2019-07-12T15:34:55.968Z cpu1:65940)<7>fnic : 3 :: Returning from abort cmd type 2 SUCCESS
2019-07-12T15:34:55.968Z cpu25:168130)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x2a (0x4395574ed240, 13707488) to dev "naa.600a098038303835593f4a566b78526f" on path "vmhba3:C0:T2:L13" Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
2019-07-12T15:34:55.968Z cpu25:168130)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600a098038303835593f4a566b78526f" state in doubt; requested fast path state update...
2019-07-12T15:34:55.969Z cpu25:168130)ScsiDeviceIO: 2965: Cmd(0x4395574ed240) 0x2a, CmdSN 0xf3 from world 13707488 to dev "naa.600a098038303835593f4a566b78526f" failed H:0x8 D:0x0 P:0x0
2019-07-12T15:34:55.969Z cpu25:168130)PVSCSI: 1640: ctx=0xf3 cdb0=0x2a on scsi0:0 completed during reset (0x8)
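To line the fnic abort sequence up with what esxtop showed, a small sketch like this extracts the "Abort issued time" values and the ABTS completion statuses. It assumes, without confirmation from this thread, that "Abort issued time: N msec" is how long the command had been outstanding when the driver sent the abort; the function name is hypothetical.

```python
import re

# fnic's "Abort issued time" field (assumed to be the command's age at abort),
# e.g. "CBD Opcode: 2a Abort issued time: 180880 msec" (spelling as logged).
AGE_RE = re.compile(r"Abort issued time: (\d+) msec")
# ABTS completion status returned for the abort, e.g. FCPIO_ITMF_REJECTED.
ABTS_RE = re.compile(r"abts cmpl recd\. id (\d+) status (FCPIO_\w+)")


def summarize_aborts(lines):
    """Return (abort ages in ms, [(abort id, ABTS status), ...]) from fnic lines."""
    ages_ms, statuses = [], []
    for line in lines:
        m = AGE_RE.search(line)
        if m:
            ages_ms.append(int(m.group(1)))
        m = ABTS_RE.search(line)
        if m:
            statuses.append((int(m.group(1)), m.group(2)))
    return ages_ms, statuses


if __name__ == "__main__":
    sample = [
        "<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 180880 msec",
        "<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_ITMF_REJECTED",
        "<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 182930 msec",
        "<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_IO_NOT_FOUND",
    ]
    ages, statuses = summarize_aborts(sample)
    for ms in ages:
        print(f"abort sent {ms / 1000:.1f} s after the command was issued")
    print(statuses)
```

On these lines the first abort goes out at roughly 181 s and is rejected by the target (FCPIO_ITMF_REJECTED); the retry about 2 s later gets FCPIO_IO_NOT_FOUND. Abort id 231 is just tag 0xe7 in decimal, so both aborts refer to the same WRITE(10) (opcode 0x2a).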
Turned out to be a bad SFP. The optical signal the storage was receiving was outside its normal operating thresholds.
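Optical receive power is usually reported in mW or dBm, where dBm = 10 * log10(P / 1 mW). A minimal sketch for checking a reading against a transceiver's alarm limits is below; the threshold values and readings are hypothetical, and in practice you would take the real limits from the module's own digital diagnostics as reported by the switch or array.

```python
import math


def mw_to_dbm(power_mw):
    """Convert optical power in milliwatts to dBm: 10 * log10(P / 1 mW)."""
    if power_mw <= 0:
        return float("-inf")  # no light at all
    return 10 * math.log10(power_mw)


def rx_power_status(rx_mw, low_alarm_dbm, high_alarm_dbm):
    """Classify a receive-power reading against the module's alarm limits.

    The limits passed in are assumptions here; real values come from the
    transceiver's digital diagnostics thresholds.
    """
    rx_dbm = mw_to_dbm(rx_mw)
    if rx_dbm < low_alarm_dbm:
        return "LOW", rx_dbm
    if rx_dbm > high_alarm_dbm:
        return "HIGH", rx_dbm
    return "OK", rx_dbm


if __name__ == "__main__":
    # Hypothetical limits and readings, for illustration only.
    for reading_mw in (0.5, 0.02):
        status, dbm = rx_power_status(reading_mw, low_alarm_dbm=-14.0, high_alarm_dbm=0.0)
        print(f"{reading_mw} mW = {dbm:.1f} dBm -> {status}")
```

A weak but non-zero signal like the second reading is exactly the kind of fault that lets the link stay up while frames are intermittently lost, which fits I/O stalling and then being aborted rather than the path going cleanly dead.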
I also found a few errors like these, which look like the ATS miscompares covered in this article:
https://cormachogan.com/2017/08/24/ats-miscompare-revisited-vsphere-6-5/
2019-07-12T16:03:12.049Z cpu16:11788692)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x89 (0x4395564082c0, 67734) to dev "naa.600a098038303841452b4a5556764177" on path "vmhba3:C0:T0:L102" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
2019-07-12T16:03:12.049Z cpu16:11788692)ScsiDeviceIO: 2980: Cmd(0x439541ac7540) 0xfe, CmdSN 0x18bfcb9 from world 67734 to dev "naa.600a098038303841452b4a5556764177" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
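The 0x89 lines are a different failure from the RESET aborts above. Opcode 0x89 is the SCSI COMPARE AND WRITE command used for VAAI ATS locking, and sense data 0xe 0x1d 0x0 is sense key MISCOMPARE with ASC/ASCQ 1D/00 (MISCOMPARE DURING VERIFY OPERATION). To see whether they cluster on particular devices, a rough sketch like this counts them per naa ID; the regex assumes the NMP line format shown here, and the function name is illustrative.

```python
import re
from collections import Counter

COMPARE_AND_WRITE = 0x89  # SCSI opcode used for ATS
MISCOMPARE = 0xE          # sense key
ASC_ASCQ_MISCOMPARE_VERIFY = (0x1D, 0x00)

# Matches NMP lines such as:
#   Cmd 0x89 (...) to dev "naa...." ... Valid sense data: 0xe 0x1d 0x0.
# Lines reporting "Invalid sense data" are deliberately not matched.
NMP_RE = re.compile(
    r'Cmd 0x([0-9a-fA-F]+) .*?to dev "([^"]+)".*?'
    r" Valid sense data: 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)"
)


def count_ats_miscompares(lines):
    """Count COMPARE AND WRITE miscompares per device ID."""
    counts = Counter()
    for line in lines:
        m = NMP_RE.search(line)
        if m is None:
            continue
        opcode, device = int(m.group(1), 16), m.group(2)
        key, asc, ascq = (int(g, 16) for g in m.group(3, 4, 5))
        if opcode == COMPARE_AND_WRITE and key == MISCOMPARE \
                and (asc, ascq) == ASC_ASCQ_MISCOMPARE_VERIFY:
            counts[device] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python count_ats.py < vmkernel.log
    for device, n in count_ats_miscompares(sys.stdin).most_common():
        print(f"{n:6d}  {device}")
```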