cynar
Enthusiast

NVMe timeouts on very fast SSD (VMware Workstation 17.0.2 on Windows, guest Linux)

I very regularly see NVMe timeouts in a guest, for instance:

[ 14.310896] NET: Registered PF_QIPCRTR protocol family
[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled
[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled
[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled
[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled
[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled
[ 70.904742] vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 9 vectors allocated

(from `dmesg`; to reproduce, simply run `dmesg --level warn --follow`)
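To see which queues are affected, the timeout lines can be summarized per queue ID. This is a minimal sketch, assuming the kernel messages keep the exact wording shown above; `count_nvme_timeouts` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: count NVMe timeout messages per queue ID (QID).
# Reads kernel log text on stdin and prints one "QID <n>: <count>" line per queue.
# Assumes the message format "nvme nvmeN: I/O <tag> QID <n> timeout, ...".
count_nvme_timeouts() {
    grep -E 'nvme[0-9]+: I/O [0-9]+ QID [0-9]+ timeout' |
        sed -E 's/.*QID ([0-9]+) timeout.*/\1/' |
        sort -n | uniq -c |
        awk '{ print "QID " $2 ": " $1 }'
}
```

Usage: `dmesg | count_nvme_timeouts` (reading `dmesg` may require root on systems that restrict the kernel log).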

This adds a delay of 30 seconds to file system operations.

This seems to have started when I attempted to maximize I/O performance by

  • using the VMware NVMe virtual controller
  • with independent disks
  • and having that on a very fast SSD 

This is somewhat reproducible for me in two scenarios:

  • boot Linux guest (see above for example)
  • run the very fast ripgrep (`rg`) over the full depth of the file system (e.g. `cd / ; rg x > /dev/null`) to hit plenty of "things" for heavy I/O
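The second scenario can be wrapped so that kernel warnings are followed while the load runs. This is only a sketch: `watch_while` is a hypothetical helper name, and it assumes `dmesg` is readable by the current user (otherwise run it as root).

```shell
#!/bin/sh
# Hypothetical helper: run a command while following kernel warnings.
# Usage: watch_while <command> [args...]
# Kernel warnings go to stderr; a one-line summary goes to stdout.
watch_while() {
    # Follow warn-level kernel messages in the background.
    dmesg --level warn --follow >&2 &
    follower=$!
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    # Stop the follower; ignore the error if it already exited.
    kill "$follower" 2>/dev/null || true
    echo "elapsed: $((end - start)) s, exit status: $status"
    return "$status"
}
```

Usage: `watch_while sh -c 'cd / && rg x > /dev/null'`. An elapsed time that jumps by roughly 30 seconds alongside a timeout message would point at the stall described here.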

Each of these timeouts means waiting for 30 seconds. That is not at all good.
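The 30-second figure appears to match the default I/O timeout of the Linux `nvme_core` module, which is exposed as the `io_timeout` parameter under `/sys/module`. The sketch below only reads that value to confirm what the guest is using; `nvme_io_timeout` is a hypothetical helper name, and nothing here is claimed to fix the underlying problem.

```shell
#!/bin/sh
# Hypothetical helper: print the NVMe I/O timeout (in seconds) the guest kernel uses.
# The optional argument overrides the sysfs path (handy for testing).
nvme_io_timeout() {
    param=${1:-/sys/module/nvme_core/parameters/io_timeout}
    if [ -r "$param" ]; then
        echo "nvme_core.io_timeout = $(cat "$param") s"
    else
        echo "not available: $param" >&2
        return 1
    fi
}
```

Usage: `nvme_io_timeout` inside the guest.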

The host operating system never reports any warnings or errors.

Hardware (host):

  • Dell Inspiron 7610 notebook == Tiger Lake, 8 cores / 16 threads
  • 64 GB of memory
  • moved OEM SSD to secondary PCIe 3.0 slot
  • installed 2 TB WD SN850X PCIe 4.0 SSD into primary PCIe 4.0 slot - this combo of SSD and PCIe slot is about the fastest you can get in a simple laptop

Software (host):

  • Windows 11 Pro (fully up-to-date)
  • VMware Workstation 17.0.2

Hardware (guest):

  • 32 GB of memory
  • 16 cores
  • independent NVMe pointing to the fast WD SN850X disk

Software (guest):

  • Fedora Linux 38 (kernel 6.2.13-300.fc38.x86_64, but this happened with earlier kernels, too)
  • ... anything that puts load onto the I/O subsystem, e.g. starting the KDE desktop, running ripgrep ...

How can I fix the timeouts?
