Hi.
We have been having issues since upgrading our ESXi nodes to 7.0.3.
With Intel NVMe SSDs we had no issues before (version 7.0.2 with driver version 1.8.0), but after the upgrade we had a few kernel panics (see the attached logs).
At first we used the intel-nvme-vmd driver version 2.6.0, but that gave frequent issues. After downgrading to version 2.5.0 it seems to be a bit more stable, but I still see a lot of messages in vmkernel.log.
On one node I see these entries:
2024-03-12T10:12:49.066Z cpu21:2097233)ScsiDeviceIO: 4176: Cmd(0x45d910ebf008) 0x42, CmdSN 0xfb66 from world 2103679 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0x3 D:0x0 P:0x0
2024-03-12T10:12:49.066Z cpu21:2097233)ScsiDeviceIO: 4176: Cmd(0x45d9117d3dc8) 0x42, CmdSN 0xfb65 from world 2103679 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0x3 D:0x0 P:0x0
2024-03-12T10:12:49.962Z cpu26:2099884 opID=26017416)World: 12077: VC opID lto5htoc-1337-auto-116-h5:70000574-6d-01-01-90-caef maps to vmkernel opID 26017416
2024-03-12T10:12:51.073Z cpu37:2097243)ScsiDeviceIO: 4163: Cmd(0x45d910edb208) 0x42, cmdId.initiator=0x4306b73a79c0 CmdSN 0xfb6a from world 2103745 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed
2024-03-12T10:12:51.073Z cpu37:2097243)H:0x5 D:0x0 P:0x0 . Cmd count Active:3 Queued:0
2024-03-12T10:12:51.154Z cpu36:2097232)ScsiDeviceIO: 4124: Cmd(0x45d90fe38608) 0x42, CmdSN 0xfc6d from world 2101802 to dev "t10.NVMe____INTEL_SSDPE2KX040T8_____________________PHLJ948005Z74P0DGN__00000001" failed H:0xc D:0x0 P:0x0
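To make entries like these easier to compare across nodes, here is a minimal sketch that tallies the ScsiDeviceIO failures by device, SCSI opcode and host status (the H: value). It assumes the log has been copied off the host as plain text. The function name and regular expressions are my own and only cover the two line shapes shown above: status on the same line, or status on the following line. For reference, 0x42 is the SCSI UNMAP opcode.

```python
import re
from collections import Counter

# A ScsiDeviceIO failure line: captures the SCSI opcode and the device name.
DEV_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-fA-F]+\) (0x[0-9a-fA-F]+),.* to dev "([^"]+)" failed'
)
# The host/device/plugin status triple, e.g. "H:0x3 D:0x0 P:0x0".
STATUS_RE = re.compile(r'H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)')


def tally_scsi_failures(lines):
    """Count failures keyed by (device, opcode, host_status).

    Some entries put the status on the line after the "failed" line,
    so the last seen device/opcode is remembered until a status appears.
    """
    counts = Counter()
    pending = None  # (opcode, device) still waiting for its status triple
    for line in lines:
        dev = DEV_RE.search(line)
        status = STATUS_RE.search(line)
        if dev:
            pending = (dev.group(1), dev.group(2))
        if status and pending:
            opcode, device = pending
            counts[(device, opcode, status.group(1))] += 1
            pending = None
    return counts
```

Passing the lines of a vmkernel.log copy (for example `tally_scsi_failures(open("vmkernel.log"))`) returns a Counter whose `most_common()` shows which device and host status dominate.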
On another node I see the following:
2024-03-12T10:11:03.118Z cpu28:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:36.104Z cpu5:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:48.743Z cpu30:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:11:58.760Z cpu36:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:12:36.103Z cpu16:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:12:53.743Z cpu38:2101810)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
2024-03-12T10:13:34.521Z cpu5:2101319)VNVME: 350: Error status: Not supported converted to: 0x80:0x1
In between these messages there are A LOT of these:
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
2024-03-12T10:54:44.109Z cpu23:2097197)nvme_ScsiCommand Failed Dsm Request
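To see whether the bursts of these two message types line up in time, a rough sketch like the one below counts them per minute. It relies only on the timestamp prefix and the message text exactly as they appear in the excerpts above. The labels and the function name are made up for illustration.

```python
from collections import Counter

# Substrings copied from the log excerpts; the labels are arbitrary.
PATTERNS = {
    "vnvme_not_supported": "VNVME: 350: Error status: Not supported",
    "dsm_request_failed": "nvme_ScsiCommand Failed Dsm Request",
}


def count_per_minute(lines):
    """Count matching messages keyed by (minute, label)."""
    counts = Counter()
    for line in lines:
        minute = line[:16]  # e.g. "2024-03-12T10:54"
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(minute, label)] += 1
    return counts
```

Sorting the result by key gives a simple timeline that can be compared between the node on 2.5.0 and any node still on another driver version.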
Can anybody help me figure out what the issue might be and how to resolve this?
Many thanks in advance.