Solved: Random VM storage issues (FCPIO_ITMF_REJECTED)

sjesse · ‎07-15-2019

Has anyone seen this? It's the first time I've run into an issue like this; it showed up out of nowhere and has been getting worse over the last 3 weeks. I looked up the sense data, and I'll include the decoded values and the errors I see in the logs below. Basically, some VMs are getting random disk errors. We were finally able to reproduce it: while watching esxtop, the VM's I/O stats go to zero, and about 2 minutes later this error message shows up in the vmkernel log. We have a support case open, but I wanted to get a second opinion since they aren't moving all that fast. If I move the VM to one of two newly created datastores, this doesn't seem to happen. These are all on the same NetApp array, so I'm not sure why the new datastores work better. If there is a reference for better understanding these errors, I'd appreciate it.

Type	Code	Name	Description
Host Status	[0x8]	RESET	This status is returned when the HBA driver has aborted the I/O. It can also occur if the HBA does a reset of the target.
Device Status	[0x0]	GOOD	This status is returned when there is no error from the device or target side. This is when you will see if there is a status for Host or Plugin.
Plugin Status	[0x0]	GOOD	No error. (ESXi 5.x / 6.x only)
Sense Key	[0x0]	NO SENSE
Additional Sense Data	00/00	NO ADDITIONAL SENSE INFORMATION
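For anyone who wants to pull these codes out of a vmkernel log without decoding them by hand, here is a minimal sketch. It assumes the `H:0x.. D:0x.. P:0x..` layout seen in the NMP and ScsiDeviceIO lines in this thread. The lookup tables are deliberately partial (only the codes from the table above plus a few standard SCSI device statuses), and the function name `decode_status` is just an illustrative choice.

```python
import re

# Partial lookup tables (assumption: not exhaustive; extend as needed).
HOST_STATUS = {0x0: "OK", 0x8: "RESET"}
DEVICE_STATUS = {
    0x0: "GOOD",
    0x2: "CHECK CONDITION",
    0x8: "BUSY",
    0x18: "RESERVATION CONFLICT",
}
PLUGIN_STATUS = {0x0: "GOOD"}

# Matches the "H:0x8 D:0x0 P:0x0" triple found in NMP / ScsiDeviceIO log lines.
STATUS_RE = re.compile(r"H:0x([0-9a-fA-F]+) D:0x([0-9a-fA-F]+) P:0x([0-9a-fA-F]+)")


def decode_status(line):
    """Return the host/device/plugin codes of a log line with names, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(group, 16) for group in match.groups())
    return {
        "host": (host, HOST_STATUS.get(host, "unknown")),
        "device": (device, DEVICE_STATUS.get(device, "unknown")),
        "plugin": (plugin, PLUGIN_STATUS.get(plugin, "unknown")),
    }


if __name__ == "__main__":
    sample = 'Cmd 0x2a to dev "naa.x" Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.'
    print(decode_status(sample))
```

Codes missing from the tables come back as "unknown" rather than a guessed name.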

2019-07-12T15:34:53.926Z cpu20:13707492)PVSCSI: 2642: scsi0:0: ABORT ctx=0xf3

2019-07-12T15:34:53.926Z cpu20:13707492)<7>fnic : 3 :: Abort Cmd called Cmd=0x0x4395574ed240 CmdSn=0x3521b6614 FCID 0x3a0081, LUN 0xd TAG e7 Op=0x2a flags 3

2019-07-12T15:34:53.926Z cpu20:13707492)<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 180880 msec

2019-07-12T15:34:53.926Z cpu23:7830720)<6>fnic : 3 :: icmnd_cmpl abts pending hdr status = FCPIO_ABORTED tag = 0xe7 sc = 0x0x439550eb0880scsi_status = 0 residual = 0

2019-07-12T15:34:53.926Z cpu23:7830720)<6>fnic : 3 :: abort reject recd. id 231

2019-07-12T15:34:53.926Z cpu23:7830720)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_ITMF_REJECTED

2019-07-12T15:34:53.926Z cpu20:13707492)<7>fnic : 3 :: Returning from abort cmd type 2 FAILED

2019-07-12T15:34:53.926Z cpu20:13707492)WARNING: LinScsi: SCSILinuxAbortCommands:1909: Failed, Driver fnic, for vmhba3

2019-07-12T15:34:55.968Z cpu1:65940)<7>fnic : 3 :: Abort Cmd called Cmd=0x0x4395574ed240 CmdSn=0x3521b6614 FCID 0x3a0081, LUN 0xd TAG e7 Op=0x2a flags 277

2019-07-12T15:34:55.968Z cpu1:65940)<6>fnic : 3 :: CBD Opcode: 2a Abort issued time: 182930 msec

2019-07-12T15:34:55.968Z cpu23:9574089)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_IO_NOT_FOUND

2019-07-12T15:34:55.968Z cpu1:65940)<7>fnic : 3 :: Returning from abort cmd type 2 SUCCESS

2019-07-12T15:34:55.968Z cpu25:168130)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x2a (0x4395574ed240, 13707488) to dev "naa.600a098038303835593f4a566b78526f" on path "vmhba3:C0:T2:L13" Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL

2019-07-12T15:34:55.968Z cpu25:168130)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600a098038303835593f4a566b78526f" state in doubt; requested fast path state update...

2019-07-12T15:34:55.969Z cpu25:168130)ScsiDeviceIO: 2965: Cmd(0x4395574ed240) 0x2a, CmdSN 0xf3 from world 13707488 to dev "naa.600a098038303835593f4a566b78526f" failed H:0x8 D:0x0 P:0x0

2019-07-12T15:34:55.969Z cpu25:168130)PVSCSI: 1640: ctx=0xf3 cdb0=0x2a on scsi0:0 completed during reset (0x8)
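To see how often aborts are being rejected, the fnic abort completions can be grouped by tag. This is a rough sketch that assumes the `abts cmpl recd. id <n> status <FCPIO_...>` wording shown in the lines above; other fnic driver versions may log this differently. The `id` appears to be the decimal form of the hex `TAG` in the "Abort Cmd called" lines (0xe7 = 231 here). The helper name `abort_statuses` is only illustrative.

```python
import re

# Matches e.g. "abts cmpl recd. id 231 status FCPIO_ITMF_REJECTED"
ABTS_RE = re.compile(r"abts cmpl recd\. id (\d+) status (FCPIO_\w+)")


def abort_statuses(lines):
    """Map each abort tag id to the list of completion statuses, in log order."""
    result = {}
    for line in lines:
        match = ABTS_RE.search(line)
        if match:
            tag_id = int(match.group(1))
            result.setdefault(tag_id, []).append(match.group(2))
    return result


if __name__ == "__main__":
    sample = [
        "2019-07-12T15:34:53.926Z cpu23:7830720)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_ITMF_REJECTED",
        "2019-07-12T15:34:55.968Z cpu23:9574089)<7>fnic : 3 :: abts cmpl recd. id 231 status FCPIO_IO_NOT_FOUND",
    ]
    for tag_id, statuses in abort_statuses(sample).items():
        # Print the tag in hex too, so it can be matched against "TAG e7".
        print(f"id {tag_id} (tag 0x{tag_id:x}): {statuses}")
```

In the excerpt above, the same tag is first rejected (FCPIO_ITMF_REJECTED) and then, on the retry about two seconds later, reported as FCPIO_IO_NOT_FOUND.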

sjesse · ‎07-18-2019

Turned out to be a bad SFP. The optical signal that the storage was receiving was outside normal operating thresholds.


sjesse · ‎07-15-2019

I also found a few of these errors:

https://cormachogan.com/2017/08/24/ats-miscompare-revisited-vsphere-6-5/

2019-07-12T16:03:12.049Z cpu16:11788692)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x89 (0x4395564082c0, 67734) to dev "naa.600a098038303841452b4a5556764177" on path "vmhba3:C0:T0:L102" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE

2019-07-12T16:03:12.049Z cpu16:11788692)ScsiDeviceIO: 2980: Cmd(0x439541ac7540) 0xfe, CmdSN 0x18bfcb9 from world 67734 to dev "naa.600a098038303841452b4a5556764177" failed H:0x0 D:0x2 P:0x5 Invalid sense data: 0x80 0x41 0x0.
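The first of these two lines can be decoded with the standard SCSI tables: opcode 0x89 is COMPARE AND WRITE (the command behind ATS), sense key 0xe is MISCOMPARE, and ASC/ASCQ 0x1d/0x00 is MISCOMPARE DURING VERIFY OPERATION. Below is a small sketch of that lookup, assuming the NMP line layout shown above. The tables only cover the codes that appear in this thread, and `decode_nmp_line` is a made-up helper name.

```python
import re

# Partial tables (assumption: only the codes seen in this thread).
OPCODES = {0x2A: "WRITE(10)", 0x89: "COMPARE AND WRITE"}
SENSE_KEYS = {0x0: "NO SENSE", 0xE: "MISCOMPARE"}
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x1D, 0x00): "MISCOMPARE DURING VERIFY OPERATION",
}

CMD_RE = re.compile(r"Cmd 0x([0-9a-fA-F]+) \(")
# \bValid (capital V) skips lines that say "Invalid sense data".
SENSE_RE = re.compile(
    r"\bValid sense data: 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)"
)


def decode_nmp_line(line):
    """Decode opcode and valid sense data from an NMP log line, else None."""
    cmd = CMD_RE.search(line)
    sense = SENSE_RE.search(line)
    if cmd is None or sense is None:
        return None
    opcode = int(cmd.group(1), 16)
    key, asc, ascq = (int(group, 16) for group in sense.groups())
    return {
        "opcode": OPCODES.get(opcode, "unknown"),
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "additional_sense": ASC_ASCQ.get((asc, ascq), "unknown"),
    }


if __name__ == "__main__":
    line = (
        'NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x89 (0x4395564082c0, 67734) '
        'to dev "naa.600a098038303841452b4a5556764177" on path "vmhba3:C0:T0:L102" '
        'Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE'
    )
    print(decode_nmp_line(line))
```

Lines flagged "Invalid sense data" (like the second one above) are skipped on purpose, since the bytes after that label should not be interpreted as real sense data.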
