nickbroon
Contributor
Contributor

ESXi Kernel Error: 'PVSCSI: 956: PROCESS: failed to pin: I/O error'

I have a VM (running debian bullseye) that is reporting errors like these, and will eventually fail and remounts it's rootfs read-only:

$ sudo journalctl -b |grep "Unknown completion status"
Aug 01 13:26:35 vm3 kernel: sd 0:0:0:0: [sda] tag#749 Unknown completion status: 0x1a
Aug 01 13:26:35 vm3 kernel: sd 0:0:0:0: [sda] tag#897 Unknown completion status: 0x1a
Aug 01 13:26:48 vm3 kernel: sd 0:0:0:0: [sda] tag#414 Unknown completion status: 0x1a
Aug 01 13:26:48 vm3 kernel: sd 0:0:0:0: [sda] tag#346 Unknown completion status: 0x1a
Aug 01 13:26:49 vm3 kernel: sd 0:0:0:0: [sda] tag#723 Unknown completion status: 0x1a
Aug 01 13:26:49 vm3 kernel: sd 0:0:0:0: [sda] tag#750 Unknown completion status: 0x1a

0x1a in the linux vmw_pvsci driver is: BTSTAT_INVPARAM
https://sources.debian.org/src/linux/5.10.127-2/drivers/scsi/vmw_pvscsi.h/?hl=50#L52
 
Looking at my ESXi 7 hypervisor kernel logs for the corresponding time period I see these errors:

[root@esxi1:/var/log] grep "2022-08-01T12:26:48" *.log
vmkernel.log:2022-08-01T12:26:48.640Z cpu1:2101701)VSCSI: 481: Trying to DMA to 0x1ffffffff000:4096
vmkernel.log:2022-08-01T12:26:48.640Z cpu1:2101701)PVSCSI: 956: PROCESS: failed to pin: I/O error
vmkernel.log:2022-08-01T12:26:48.674Z cpu1:2101701)VSCSI: 481: Trying to DMA to 0x1ffffffff000:4096
vmkernel.log:2022-08-01T12:26:48.674Z cpu1:2101701)PVSCSI: 956: PROCESS: failed to pin: I/O error

 I'm not sure specifically how these errors all relate, and what further steps I can take to root cause and fix this. Any suggestions or pointers are welcome?

A few details of the storage on my hypervisor device:

[root@esxi1:~] esxcfg-scsidevs -c
Device UID Device Type Console Device Size Multipath PluginDisplay Name
mpx.vmhba32:C0:T0:L0 CD-ROM /vmfs/devices/cdrom/mpx.vmhba32:C0:T0:L0 395MB NMP Local USB CD-ROM (mpx.vmhba32:C0:T0:L0)
naa.5003048029e8793d Enclosure Svc Dev/vmfs/devices/genscsi/naa.5003048029e8793d 0MB HPP Local SMC Enclosure Svc Dev (naa.5003048029e8793d)
naa.600508e000000000530e3c89f101f602 Direct-Access /vmfs/devices/disks/naa.600508e000000000530e3c89f101f602 1830101MB HPP Local LSI Disk (naa.600508e000000000530e3c89f101f602)
naa.600508e000000000841a22e7baeef900 Direct-Access /vmfs/devices/disks/naa.600508e000000000841a22e7baeef900 914573MB HPP Local LSI Disk (naa.600508e000000000841a22e7baeef900)
[root@esxi1:~] esxcfg-scsidevs -a
vmhba0 vmw_ahci link-n/a sata.vmhba0 (0000:00:11.5) Intel Corporation Lewisburg SATA AHCI Controller
vmhba1 vmw_ahci link-n/a sata.vmhba1 (0000:00:17.0) Intel Corporation Lewisburg SATA AHCI Controller
vmhba2 lsi_msgpt3 link-n/a sas.5003048024be3002 (0000:18:00.0) Avago (LSI Logic) Fusion-MPT 12GSAS SAS3008 PCI-Express
vmhba32 vmkusb link-n/a usb.vmhba32 () USB
[root@esxi1:~] esxcfg-scsidevs -m
naa.600508e000000000841a22e7baeef900:8 /vmfs/devices/disks/naa.600508e000000000841a22e7baeef900:8 62b0e6e9-eeebaaac-19a3-3cecef0ec72e 0 datastore1
naa.600508e000000000530e3c89f101f602:3 /vmfs/devices/disks/naa.600508e000000000530e3c89f101f602:3 62dac754-2f675d66-a3f0-3cecef0ec72e 0 datastore2
naa.600508e000000000841a22e7baeef900:7 /vmfs/devices/disks/naa.600508e000000000841a22e7baeef900:7 62b0e6e9-dbba9ec8-b56a-3cecef0ec72e 0 OSDATA-62b0e6e9-dbba9ec8-b56a-3cecef0ec72e
[root@esxi1:~] esxcfg-scsidevs -A
vmhba2 naa.600508e000000000841a22e7baeef900
vmhba2 naa.600508e000000000530e3c89f101f602
vmhba2 naa.5003048029e8793d
vmhba32 mpx.vmhba32:C0:T0:L0

 

0 Kudos
2 Replies
vbondzio
VMware Employee
VMware Employee

Thanks for the nice write up / formatting!

Can you check whether you saw any errors in the vmware.log ? Also, is this VM running off a snapshot? Are you using any 3rd party IO filters? 0x1ffffffff000 is an odd address too ...

0 Kudos
nickbroon
Contributor
Contributor

Some of the related additional logs:

vmkwarning.log

2022-08-03T14:46:31.338Z cpu1:2098154)WARNING: lsi_msgpt3: _scsih_probe_raid:9949: Failed to rescan SCSI adapter Invalid, Invalid name.
2022-08-03T14:46:34.358Z cpu8:2097846)WARNING: FBFT not enabled
2022-08-03T14:46:35.231Z cpu48:2098223)WARNING: ScsiDeviceIO: 4067: ModeSense 0x1a request failed - blocking page:0x1c subpage:0x2 naa.60050
8e000000000530e3c89f101f602
2022-08-03T14:46:35.233Z cpu48:2098223)WARNING: ScsiDeviceIO: 4067: ModeSense 0x1a request failed - blocking page:0xa subpage:0x0 naa.500304
8029e8793d
2022-08-03T14:46:35.234Z cpu48:2098223)WARNING: ScsiDeviceIO: 4067: ModeSense 0x1a request failed - blocking page:0x1c subpage:0x2 naa.60050
8e000000000841a22e7baeef900

vmkernel.log

2022-08-04T16:35:37.407Z cpu12:2101523)PVSCSI: 956: PROCESS: failed to pin: I/O error
2022-08-04T16:37:09.377Z cpu22:2101523)PVSCSI: 956: PROCESS: failed to pin: I/O error
2022-08-04T16:39:17.231Z cpu2:2101523)PVSCSI: 956: PROCESS: failed to pin: I/O error
2022-08-04T16:39:24.173Z cpu28:2101523)PVSCSI: 956: PROCESS: failed to pin: I/O error
2022-08-04T16:39:28.707Z cpu6:2101523)PVSCSI: 956: PROCESS: failed to pin: I/O error

vmware.log

2022-08-03T14:51:15.045Z In(05) vcpu-12 - CPU reset: hard (mode Emulation)

And from the VM linux kernel:

Aug 04 17:39:17 uk-gitlab-runner-3 kernel: sd 0:0:0:0: [sda] tag#79 Unknown completion status: 0x1a
Aug 04 17:39:24 uk-gitlab-runner-3 kernel: sd 0:0:0:0: [sda] tag#872 Unknown completion status: 0x1a
Aug 04 17:39:28 uk-gitlab-runner-3 kernel: sd 0:0:0:0: [sda] tag#655 Unknown completion status: 0x1a


And details of the storage adapter driver (which I've confirmed is the most recent), are showing that it has "Failed Commands: 23" counters that's increasing.

$ localcli storage core adapter list

HBA Name Driver Link State UID Capabilities Description
-------- ---------- ---------- -------------------- ------------ -----------
vmhba0 vmw_ahci link-n/a sata.vmhba0 (0000:00:11.5) Intel Corporation Lewisburg SATA AHCI Controller
vmhba1 vmw_ahci link-n/a sata.vmhba1 (0000:00:17.0) Intel Corporation Lewisburg SATA AHCI Controller
vmhba2 lsi_msgpt3 link-n/a sas.5003048024be3002 (0000:18:00.0) Avago (LSI Logic) Fusion-MPT 12GSAS SAS3008 PCI-Express

$ localcli software vib list |grep -i lsi

lsi-mr3 7.718.02.00-1vmw.703.0.20.19193900 VMW VMwareCertified 2022-06-20
lsi-msgpt2 20.00.06.00-4vmw.703.0.20.19193900 VMW VMwareCertified 2022-06-20
lsi-msgpt35 19.00.02.00-1vmw.703.0.20.19193900 VMW VMwareCertified 2022-06-20
lsi-msgpt3 17.00.12.00-1vmw.703.0.20.19193900 VMW VMwareCertified 2022-06-20
lsuv2-lsiv2-drivers-plugin 1.0.0-12vmw.703.0.50.20036589 VMware VMwareCertified 2022-07-29

$ localcli storage core adapter stats get

vmhba2:

Successful Commands: 97856910
Blocks Read: 194183762
Blocks Written: 15161559415
Read Operations: 2687352
Write Operations: 95066120
Reserve Operations: 1184
Reservation Conflicts: 0
Failed Commands: 23
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
Total Splits: 0
PAE Commands: 0

 

 

0 Kudos