VMware Cloud Community
roeland1
Contributor

Intel NVMe kernel panic on ESXi 7.0.3

Hi.

We are having some issues after upgrading our ESXi nodes to 7.0.3.

With our Intel NVMe SSDs we had no issues before (version 7.0.2 with driver version 1.8.0), but after the upgrade we had a few kernel panics (see attached logs).

At first we used intel-nvme-vmd driver version 2.6.0, but that gave frequent issues. After downgrading to version 2.5.0 it seems to be a bit more stable, but I still see a lot of messages in vmkernel.log.

On one node I see these entries:

2024-03-12T10:12:49.066Z cpu21:2097233)ScsiDeviceIO: 4176: Cmd(0x45d910ebf008) 0x42, CmdSN 0xfb66 from world 2103679 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0x3 D:0x0 P:0x0
2024-03-12T10:12:49.066Z cpu21:2097233)ScsiDeviceIO: 4176: Cmd(0x45d9117d3dc8) 0x42, CmdSN 0xfb65 from world 2103679 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0x3 D:0x0 P:0x0
2024-03-12T10:12:49.962Z cpu26:2099884 opID=26017416)World: 12077: VC opID lto5htoc-1337-auto-116-h5:70000574-6d-01-01-90-caef maps to vmkernel opID 26017416
2024-03-12T10:12:51.073Z cpu37:2097243)ScsiDeviceIO: 4163: Cmd(0x45d910edb208) 0x42, cmdId.initiator=0x4306b73a79c0 CmdSN 0xfb6a from world 2103745 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed
2024-03-12T10:12:51.073Z cpu37:2097243)H:0x5 D:0x0 P:0x0 . Cmd count Active:3 Queued:0
2024-03-12T10:12:51.154Z cpu36:2097232)ScsiDeviceIO: 4124: Cmd(0x45d90fe38608) 0x42, CmdSN 0xfc6d from world 2101802 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0xc D:0x0 P:0x0
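
To get a feel for how often each status combination shows up, the entries can be tallied by their H/D/P triple. This is only a minimal sketch, assuming the log keeps the `H:0x.. D:0x.. P:0x..` layout seen above; `tally_statuses` is a hypothetical helper name, not part of any VMware tool.

```python
import re
from collections import Counter

# Status triple printed by ScsiDeviceIO, e.g. "H:0x3 D:0x0 P:0x0"
# (host, device, and plugin status).
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def tally_statuses(lines):
    """Count how often each (H, D, P) triple occurs (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Status portions copied from the entries above. The third one sits on its
# own log line because vmkernel wrapped that message.
sample = [
    "failed H:0x3 D:0x0 P:0x0",
    "failed H:0x3 D:0x0 P:0x0",
    "H:0x5 D:0x0 P:0x0 . Cmd count Active:3 Queued:0",
    "failed H:0xc D:0x0 P:0x0",
]

for (host, device, plugin), count in tally_statuses(sample).most_common():
    print(f"H:{host} D:{device} P:{plugin} seen {count} time(s)")
```

The same function accepts an open file object, so it could be pointed at a copy of vmkernel.log instead of the sample list.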

On another I see the following:

2024-03-12T10:11:03.118Z cpu28:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:36.104Z cpu5:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:48.743Z cpu30:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:58.760Z cpu36:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:12:36.103Z cpu16:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:12:53.743Z cpu38:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:13:34.521Z cpu5:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1

In between these messages there are A LOT of these:

2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
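
To put a number on "a lot", the repeated messages can be counted after stripping the timestamp and cpu/world prefix. Again this is just a rough sketch, assuming every line starts with `<timestamp> cpuN:WORLD)` as in the excerpts above; `tally_messages` is a hypothetical helper name.

```python
import re
from collections import Counter

# Leading "<timestamp> cpuN:WORLD)" prefix, with an optional " opID=..." part.
PREFIX_RE = re.compile(r"^\S+ cpu\d+:\d+(?: opID=\w+)?\)")


def tally_messages(lines):
    """Count identical message texts, ignoring timestamp and cpu/world."""
    counts = Counter()
    for line in lines:
        text, replaced = PREFIX_RE.subn("", line.strip(), count=1)
        if replaced:  # skip lines that lack the expected prefix
            counts[text] += 1
    return counts


# Lines copied from the excerpts above.
sample = [
    "2024-03-12T10:11:03.118Z cpu28:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1",
    "2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request",
    "2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request",
]

for text, count in tally_messages(sample).most_common():
    print(f"{count:6d}  {text}")
```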

Can anybody help me figure out what the issue might be and how to resolve this?

Many thanks in advance.
