I very regularly have NVMe timeouts in a guest, for instance:
[ 14.310896] NET: Registered PF_QIPCRTR protocol family
[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled
[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled
[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled
[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled
[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled
[ 70.904742] vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 9 vectors allocated
(from `dmesg`; when reproducing, simply run `dmesg --level warn --follow`)
This adds a delay of 30 seconds to file system operations.
This seems to be new ever since I attempted to maximize I/O performance, by
This is somewhat reproducible for me in two scenarios.
Each of these timeouts means waiting for 30 seconds, which is not acceptable.
The host operating system issues no warnings or errors at any point.
Hardware (host):
Software (host):
Hardware (guest):
Software (guest):
How can I fix the timeouts?
I have exactly the same issue with a Windows 10 host, VMware® Workstation 17 Pro 17.0.2 build-21581411, and a CentOS 8 guest.
I have now established a work-around on the VMware Workstation 17.0.2 (Windows 11) host and applied additional tuning to my existing Fedora Linux 38 guest installation:
a) enable the Fedora 38 init ramdisk to boot from more than just NVMe
cat << EOF > /etc/dracut.conf.d/scsi.conf
add_drivers+=" vmw_pvscsi mptbase mptscsih mptspi mptsas "
EOF
dracut -f -v
halt -p
b) switch controller for existing disk in VMware Workstation
- remove the existing NVMe hard disk (this only unlinks it; the backing store remains, so your data is safe)
- add new hard disk of type SCSI, pick the original backing store VMDK
- start VM
c) tune for access (optional)
add "noatime" and "ssd" to the btrfs mounts in /etc/fstab
reboot
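For illustration, a btrfs entry in /etc/fstab with those options could look like the following - the UUID and subvolume name are placeholders, not values from this thread; keep your own:

```
# btrfs root with access-time updates disabled (noatime) and SSD mode (ssd)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  btrfs  subvol=root,noatime,ssd  0 0
```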
d) validate
Have plenty of software and data on your virtual hard disk
cd /
rg 1 | wc -l
//exp: no system hangs, obviously no NVMe timeouts
//act: ... exactly as expected, returns a number in the millions
This works much better than before. Incidentally, a (software CI) task which previously took 55 seconds to complete now seems to take about 5% less time (a non-scientific measurement) - but the important part for me: no more 30-second pauses / stalls due to NVMe controller timeouts.
Based on the above, I can only suggest ignoring the VMware recommendation to pick NVMe and going for SCSI as a fast and robust choice.
FYI, VMware Workstation 17.5, with its new hardware version 21 and refreshed NVMe support, exhibits the same unwanted behaviour.
On Fedora 39 (beta) with "Linux fedora-gnome 6.5.6-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 6 19:57:21 UTC 2023 x86_64 GNU/Linux", the commands
cd /
rg 1 | wc -l
yield plenty of timeouts - and with them come massive delays. Run this after a fresh boot with very cold caches.
[ 182.238025] nvme nvme0: I/O 192 QID 5 timeout, completion polled
[ 222.686266] nvme nvme0: I/O 224 QID 6 timeout, completion polled
[ 252.894103] nvme nvme0: I/O 0 QID 3 timeout, completion polled
[ 283.614594] nvme nvme0: I/O 1 QID 3 timeout, completion polled
[ 283.614607] nvme nvme0: I/O 192 QID 8 timeout, completion polled
[ 283.614626] nvme nvme0: I/O 128 QID 11 timeout, completion polled
[ 314.589824] nvme nvme0: I/O 227 QID 6 timeout, completion polled
[ 344.990885] nvme nvme0: I/O 128 QID 1 timeout, completion polled
[ 345.024297] nvme nvme0: I/O 193 QID 8 timeout, completion polled
[ 376.286125] nvme nvme0: I/O 129 QID 11 timeout, completion polled
[ 432.094158] nvme nvme0: I/O 92 QID 4 timeout, completion polled
[ 432.094170] nvme nvme0: I/O 0 QID 6 timeout, completion polled
[ 462.302211] nvme nvme0: I/O 248 QID 3 timeout, completion polled
[ 462.302237] nvme nvme0: I/O 32 QID 6 timeout, completion polled
[ 462.302242] nvme nvme0: I/O 96 QID 7 timeout, completion polled
[ 462.302247] nvme nvme0: I/O 130 QID 11 timeout, completion polled
[ 493.021506] nvme nvme0: I/O 225 QID 3 timeout, completion polled
[ 526.301435] nvme nvme0: I/O 32 QID 5 timeout, completion polled
It would seem that tuning the Linux kernel's NVMe parameters may help alleviate the problem.
I had the opportunity to set up another virtual machine on that specific WD_BLACK SN850X SSD - basically booting a fresh and _natively_ installed Fedora Linux 39 from its three physical partitions (EFI, boot, data) via VMware Workstation's physical drive access, using the virtual NVMe controller.
Initially, this setup was also suffering from occasionally massively degraded performance (see above).
One kernel tuning parameter seems to make The Difference:
nvme.poll_queues=64
Let's look at one of the results of the exploratory probing:
Fedora 38, many "20 GB split virtual disk" files, virtual SCSI controller
read: IOPS=39.9k, BW=156MiB/s (164MB/s)(9361MiB/60004msec)
Fedora 39, "Physical Drive partitions", virtual NVMe controller (virtual hardware rev 21)
read: IOPS=170k, BW=665MiB/s (697MB/s)(38.9GiB/60002msec)
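For reference, a random-read fio job of the kind that produces result lines like the above might look as follows; the file name, size, and queue depth here are my assumptions, not the exact parameters used for these numbers:

```shell
# 60-second 4 KiB random-read benchmark against a scratch file
# (filename, size, and iodepth are illustrative; adjust for your disk)
fio --name=randread --filename=/tmp/fio-test --size=1G \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=64 \
    --direct=1 --runtime=60 --time_based --group_reporting
rm -f /tmp/fio-test
```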
The performance difference is substantial, all the while
dmesg --follow --level warn --time-format iso
does not show any of the NVMe timeout problems.
So, for the time being, I am running this virtualized physical Fedora Linux 39 with
sudo grubby --update-kernel=ALL --args="nvme.poll_queues=64"
sudo grubby --info=ALL
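After rebooting, one way to confirm the parameter actually took effect is to read it back - /proc/cmdline shows what the kernel booted with, and module parameters are exposed under /sys/module (the value printed should match what you set, e.g. 64):

```shell
# kernel command line actually used at boot (should contain nvme.poll_queues=64)
cat /proc/cmdline
# current value of the nvme module parameter
cat /sys/module/nvme/parameters/poll_queues
```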
Random notes:
What a lovely rabbit hole to fall into ...
Nothing comes for free - NVMe polling consumes more CPU. Does it matter?
The optimal poll queue count is not known, nor is it clear whether split read and write poll queues are beneficial - see https://elixir.bootlin.com/linux/latest/source/drivers/nvme/host/pci.c (or rather the version applying to your Linux kernel) for All Of The Truth (because I was unable to find any useful documentation).
What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance St... is an interesting article explaining a great many details about I/O performance in Linux.
And finally, for stress-testing and exploration, Benchmark persistent disk performance on a Linux VM | Compute Engine Documentation | Google Clou... is a useful resource with pre-cooked "fio" commands.
Thanks for sharing your invaluable findings @cynar!
I hit the same NVMe timeout issue with a Fedora 38 guest, Windows 10 host, and VMware Player 17.0. As you also mentioned, upgrading to Fedora 39 and VMware 17.5 didn't fix it, and I had no luck finding a solution anywhere else.
Tweaking the nvme.poll_queues kernel parameter (I went with 32 for now) seems to have completely eliminated the NVMe timeouts (even under stress-testing with fio), and improved performance.
