NVMe timeouts on very fast SSD (VMware Workstation 17.0.2 on Windows, Linux guest)
I very regularly see NVMe timeouts in a guest, for instance:
[ 14.310896] NET: Registered PF_QIPCRTR protocol family
[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled
[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled
[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled
[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled
[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled
[ 70.904742] vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 9 vectors allocated
(from `dmesg`; to watch for these while reproducing, simply run `dmesg --level warn --follow`)
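To put numbers on this while reproducing, the timeouts can be tallied per queue with a small script. The following is only a rough sketch: the regular expression is written against the log lines quoted above, and the names `TIMEOUT_RE`, `count_timeouts` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<dev>nvme\d+): "
    r"I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout"
)

def count_timeouts(lines):
    """Return a Counter mapping (device, queue id) to the number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("qid")))] += 1
    return counts

# Sample input copied from the dmesg output in this post.
SAMPLE = [
    "[ 14.310896] NET: Registered PF_QIPCRTR protocol family",
    "[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled",
    "[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled",
    "[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled",
    "[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled",
    "[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled",
]

for (dev, qid), n in sorted(count_timeouts(SAMPLE).items()):
    print(f"{dev} QID {qid}: {n} timeout(s)")
```

For real use, the `SAMPLE` list would be replaced by the lines of the actual `dmesg` output, for example read from a saved log file.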
Each such timeout adds a delay of 30 seconds to file system operations.
This seems to have started when I attempted to maximize I/O performance by
- using the VMware NVMe virtual controller
- with independent disks
- and having that on a very fast SSD
This is somewhat reproducible for me in two scenarios:
- boot the Linux guest (see above for an example)
- run the very fast ripgrep (`rg`) over the full depth of the file system (e.g. `cd / ; rg x > /dev/null`) to hit plenty of "things" for heavy I/O
Each of these timeouts means waiting for 30 seconds. That is not at all good.
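The 30 seconds match the default of the Linux NVMe driver's I/O timeout (module parameter `nvme_core.io_timeout`), so I assume that is where the wait comes from. A minimal sketch for checking the value inside the guest, assuming the in-kernel driver exposes the parameter at the usual sysfs path; the function name `read_io_timeout` is made up for illustration:

```python
from pathlib import Path

# Assumption: the guest uses the in-kernel nvme driver, which exposes its
# I/O timeout (in seconds) at this sysfs path.
IO_TIMEOUT_PATH = Path("/sys/module/nvme_core/parameters/io_timeout")

def read_io_timeout(path=IO_TIMEOUT_PATH):
    """Return the timeout in seconds, or None if the file cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

print("nvme_core.io_timeout:", read_io_timeout())
```

This only reads the setting. Changing it would merely alter how long each stall lasts, not why the completion goes unnoticed in the first place.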
The host operating system never reports any warnings or errors.
Hardware (host):
- Dell Inspiron 7610 notebook (Tiger Lake, 8 cores / 16 threads)
- 64 GB of memory
- moved OEM SSD to secondary PCIe 3.0 slot
- installed 2 TB WD SN850X PCIe 4.0 SSD into primary PCIe 4.0 slot; this combination of SSD and slot is about the fastest you can get in a simple laptop
Software (host):
- Windows 11 Pro (fully up-to-date)
- VMware Workstation 17.0.2
Hardware (guest):
- 32 GB of memory
- 16 cores
- independent NVMe pointing to the fast WD SN850X disk
Software (guest):
- Fedora Linux 38 (kernel 6.2.13-300.fc38.x86_64, but this also happened with earlier kernels)
- ... anything that puts load onto the I/O subsystem, e.g. starting the KDE desktop, running ripgrep ...
How can I fix the timeouts?