VMware Communities
cynar
Enthusiast

nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)

I very regularly get NVMe timeouts in a guest, for instance with

[ 14.310896] NET: Registered PF_QIPCRTR protocol family
[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled
[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled
[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled
[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled
[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled
[ 70.904742] vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 9 vectors allocated

(from `dmesg`; when reproducing, simply run `dmesg --level warn --follow`)

This adds a delay of 30 seconds to each affected file system operation.

This seems to be new; it started after I attempted to maximize I/O performance by

  • using the VMware NVMe virtual controller
  • with independent disks
  • and having that on a very fast SSD 

This is somewhat reproducible for me in two scenarios:

  • boot the Linux guest (see above for an example)
  • run the very fast ripgrep (`rg`) over the full depth of the file system (e.g. `cd / ; rg x > /dev/null`) to hit plenty of files for heavy I/O

Each of these timeouts means waiting for 30 seconds. That is not at all good.
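The 30 seconds matches the Linux NVMe driver's default I/O timeout. As a quick check (a workaround sketch, not a fix - raising the timeout only hides the stall), the value can be inspected and overridden via the standard nvme_core module parameter:

```shell
# Current NVMe I/O timeout in seconds (kernel default: 30)
cat /sys/module/nvme_core/parameters/io_timeout

# Raise it for all installed kernels (grubby is Fedora's boot-entry tool)
sudo grubby --update-kernel=ALL --args="nvme_core.io_timeout=120"
```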

The host operating system never issues any warnings or errors.

Hardware (host):

  • Dell Inspiron 7610 notebook == Tiger Lake, 8 cores / 16 threads
  • 64 GB of memory
  • moved the OEM SSD to the secondary PCIe 3.0 slot
  • installed a 2 TB WD SN850X PCIe 4.0 SSD into the primary PCIe 4.0 slot - this combination of SSD and slot is about the fastest you can get in a simple laptop

Software (host):

  • Windows 11 Pro (fully up-to-date)
  • VMware Workstation 17.0.2

Hardware (guest):

  • 32 GB of memory
  • 16 cores
  • independent NVMe pointing to the fast WD SN850X disk

Software (guest):

  • Fedora Linux 38 (== 6.2.13-300.fc38.x86_64, but this happened with earlier kernels, too)
  • ... anything that puts load onto the I/O subsystem, e.g. starting the KDE desktop, running ripgrep ...

How can I fix the timeouts?

5 Replies
mdudnik
Contributor

I have exactly the same issue: host Windows 10, VMware® Workstation 17 Pro 17.0.2 build-21581411, guest CentOS 8.

cynar
Enthusiast

I have now established a work-around on the VMware Workstation 17.0.2 (Windows 11) host and applied additional tuning to my existing Fedora Linux 38 guest installation:

a) rebuild the Fedora 38 init ramdisk so it can boot from more than just NVMe

cat << EOF > /etc/dracut.conf.d/scsi.conf
add_drivers+=" vmw_pvscsi mptbase mptscsih mptspi mptsas "
EOF

dracut -f -v

halt -p
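Before shutting down, it may be worth confirming that the SCSI drivers actually landed in the freshly built initramfs (`lsinitrd` ships with dracut; the grep pattern is just my choice):

```shell
# List modules inside the current initramfs, filter for the added SCSI drivers
lsinitrd | grep -E 'vmw_pvscsi|mpt(spi|sas|scsih)'
```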

b) switch controller for existing disk in VMware Workstation

- remove the existing NVMe hard disk (this only unlinks it; the backing store remains, your data is safe)
- add new hard disk of type SCSI, pick the original backing store VMDK

- start VM

c) tune for access (optional)

add "noatime" and "ssd" to the btrfs mount options in /etc/fstab
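For illustration, a resulting /etc/fstab entry could look like this - the UUID and subvolume name are placeholders for your own btrfs root:

```shell
# /etc/fstab - placeholder UUID; "noatime" skips access-time writes, "ssd" enables btrfs SSD heuristics
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  btrfs  subvol=root,noatime,ssd  0 0
```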

reboot

d) validate

Have plenty of software and data on your virtual hard disk, then run:

cd /
rg 1 | wc -l

//exp: no system hangs, obviously no NVMe timeouts
//act: ... exactly as expected, returns a number in the millions

This works much better than before. Incidentally, a (software CI) task that previously took 55 seconds now seems to complete about 5% faster (non-scientific measurement) - but the important part for me: no more 30-second pauses / stalls due to NVMe controller timeouts.

Based on the above, I can only suggest ignoring the VMware recommendation to pick NVMe and going with SCSI as a fast and robust choice.

cynar
Enthusiast

FYI, VMware Workstation 17.5, with its new hardware version 21 and refreshed NVMe support, exhibits the same unwanted behaviour.

On Fedora 39 (beta) with "Linux fedora-gnome 6.5.6-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 6 19:57:21 UTC 2023 x86_64 GNU/Linux", the commands

cd /
rg 1 | wc -l

yield plenty of timeouts - and with them come massive delays. Run this after a fresh boot with very cold caches.

[ 182.238025] nvme nvme0: I/O 192 QID 5 timeout, completion polled
[ 222.686266] nvme nvme0: I/O 224 QID 6 timeout, completion polled
[ 252.894103] nvme nvme0: I/O 0 QID 3 timeout, completion polled
[ 283.614594] nvme nvme0: I/O 1 QID 3 timeout, completion polled
[ 283.614607] nvme nvme0: I/O 192 QID 8 timeout, completion polled
[ 283.614626] nvme nvme0: I/O 128 QID 11 timeout, completion polled
[ 314.589824] nvme nvme0: I/O 227 QID 6 timeout, completion polled
[ 344.990885] nvme nvme0: I/O 128 QID 1 timeout, completion polled
[ 345.024297] nvme nvme0: I/O 193 QID 8 timeout, completion polled
[ 376.286125] nvme nvme0: I/O 129 QID 11 timeout, completion polled
[ 432.094158] nvme nvme0: I/O 92 QID 4 timeout, completion polled
[ 432.094170] nvme nvme0: I/O 0 QID 6 timeout, completion polled
[ 462.302211] nvme nvme0: I/O 248 QID 3 timeout, completion polled
[ 462.302237] nvme nvme0: I/O 32 QID 6 timeout, completion polled
[ 462.302242] nvme nvme0: I/O 96 QID 7 timeout, completion polled
[ 462.302247] nvme nvme0: I/O 130 QID 11 timeout, completion polled
[ 493.021506] nvme nvme0: I/O 225 QID 3 timeout, completion polled
[ 526.301435] nvme nvme0: I/O 32 QID 5 timeout, completion polled


cynar
Enthusiast

It seems that tuning the Linux kernel's NVMe parameters can alleviate the problem.

I had the opportunity to set up another virtual machine on that same WD_BLACK SN850X SSD - basically booting a fresh, _natively_ installed Fedora Linux 39 from its three physical partitions (EFI, boot, data) via VMware Workstation's physical drive access, using the virtual NVMe controller.

Initially, this setup also suffered occasional, massively degraded performance (see above).

One kernel tuning parameter seems to make The Difference:

nvme.poll_queues=64

Let's look at one of the results of the exploratory probing:

Fedora 38, many "20 GB split virtual disk" files, virtual SCSI controller
    read: IOPS=39.9k, BW=156MiB/s (164MB/s)(9361MiB/60004msec)

Fedora 39, "Physical Drive partitions", virtual NVMe controller (virtual hardware rev 21)
   read: IOPS=170k, BW=665MiB/s (697MB/s)(38.9GiB/60002msec)

The performance difference is substantial, all the while

dmesg --follow --level warn --time-format iso

does not show any of the NVMe timeout problems.

So, for the time being, I am running this virtualized physical Fedora Linux 39 with

sudo grubby --update-kernel=ALL --args="nvme.poll_queues=64"
sudo grubby --info=ALL
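After a reboot, whether the parameter took effect can be read back from sysfs (standard paths; nvme0n1 is my device name - adjust to yours):

```shell
# Number of poll queues the nvme driver was configured with
cat /sys/module/nvme/parameters/poll_queues

# 1 if polling is enabled for the device's queues
cat /sys/block/nvme0n1/queue/io_poll
```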


Random notes:

What a lovely rabbit hole to fall into ...

Nothing comes for free - NVMe polling consumes more CPU. Does it matter?

The optimal poll queue count is not known, and neither is it clear whether split read and write poll queues are beneficial - see https://elixir.bootlin.com/linux/latest/source/drivers/nvme/host/pci.c (or rather the version matching your kernel) for All Of The Truth, because I was unable to find any useful documentation.

"What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance St..." is an interesting article explaining a great many details about I/O performance in Linux.

And finally, for stress-testing and exploration, "Benchmark persistent disk performance on a Linux VM | Compute Engine Documentation | Google Clou..." is a useful resource with pre-cooked `fio` commands.
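For the record, a fio run in the spirit of that guide - the parameters are my own picks, and `--filename` should point at a scratch file (or a raw device only if you really mean to):

```shell
# Random 4 KiB reads for 60 s at queue depth 64 - enough pressure to surface the timeouts
fio --name=randread --filename=/tmp/fio.test --size=4G \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=64 \
    --direct=1 --runtime=60 --time_based --group_reporting
```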

agostonbarna
Contributor

Thanks for sharing your invaluable findings @cynar!
I hit the same NVMe timeout issue on a Fedora 38 guest, Windows 10 host, VMware Player 17.0, and as you also mentioned, upgrading to Fedora 39 and VMware 17.5 didn't fix it; I had no luck finding a solution anywhere else.
Tweaking the nvme.poll_queues kernel parameter (I went with 32 for now) seems to have completely eliminated the NVMe timeouts (even under stress-testing with fio) and improved performance.
