<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux) in VMware Workstation Pro Discussions</title>
    <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2994199#M183553</link>
    <description>Discussion thread in VMware Workstation Pro Discussions: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux).</description>
    <pubDate>Sat, 04 Nov 2023 07:53:42 GMT</pubDate>
    <dc:creator>cynar</dc:creator>
    <dc:date>2023-11-04T07:53:42Z</dc:date>
    <item>
      <title>nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2966271#M181099</link>
      <description>&lt;P&gt;I very regularly get NVMe timeouts in a guest, for instance:&lt;/P&gt;&lt;PRE&gt;[ 14.310896] NET: Registered PF_QIPCRTR protocol family&lt;BR /&gt;[ 40.388311] nvme nvme0: I/O 128 QID 4 timeout, completion polled&lt;BR /&gt;[ 40.388343] nvme nvme0: I/O 96 QID 5 timeout, completion polled&lt;BR /&gt;[ 40.388400] nvme nvme0: I/O 32 QID 15 timeout, completion polled&lt;BR /&gt;[ 70.595937] nvme nvme0: I/O 96 QID 1 timeout, completion polled&lt;BR /&gt;[ 70.595969] nvme nvme0: I/O 129 QID 4 timeout, completion polled&lt;BR /&gt;[ 70.904742] vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 9 vectors allocated&lt;/PRE&gt;&lt;P&gt;(from `dmesg`; when reproducing, simply run `dmesg --level warn --follow`)&lt;/P&gt;&lt;P&gt;Each timeout adds a delay of 30 seconds to file system operations.&lt;/P&gt;&lt;P&gt;This seems to be new since I attempted to maximize I/O performance by&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;using the VMware NVMe virtual controller&lt;/LI&gt;&lt;LI&gt;with independent disks&lt;/LI&gt;&lt;LI&gt;and having that on a very fast SSD&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This is somewhat reproducible for me in two scenarios:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;boot the Linux guest (see the example above)&lt;/LI&gt;&lt;LI&gt;run the very fast ripgrep (`rg`) on the full depth of the file system (e.g. `cd / ; rg x &amp;gt; /dev/null`) to hit plenty of "things" for heavy I/O&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Each of these timeouts means waiting for 30 seconds. That is not at all good.&lt;/P&gt;&lt;P&gt;The host operating system at no time issues any warnings or errors.&lt;/P&gt;&lt;P&gt;Hardware (host):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Dell Inspiron 7610 notebook == Tiger Lake, 8 cores / 16 threads&lt;/LI&gt;&lt;LI&gt;64 GB of memory&lt;/LI&gt;&lt;LI&gt;moved the OEM SSD to the secondary PCIe 3.0 slot&lt;/LI&gt;&lt;LI&gt;installed a 2 TB WD SN850X PCIe 4.0 SSD into the primary PCIe 4.0 slot - this combination of SSD and slot is about the fastest you can get in a simple laptop&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Software (host):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Windows 11 Pro (fully up-to-date)&lt;/LI&gt;&lt;LI&gt;VMware Workstation 17.0.2&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Hardware (guest):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;32 GB of memory&lt;/LI&gt;&lt;LI&gt;16 cores&lt;/LI&gt;&lt;LI&gt;independent NVMe virtual disk residing on the fast WD SN850X&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Software (guest):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Fedora Linux 38 (== 6.2.13-300.fc38.x86_64, but this happened with earlier kernels, too)&lt;/LI&gt;&lt;LI&gt;... anything that puts load onto the I/O subsystem, e.g. starting the KDE desktop, running ripgrep ...&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;How can I fix the timeouts?&lt;/P&gt;</description>
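      <!-- A sketch, not part of the original post: commands to inspect the guest-side timeout
           behind these 30-second stalls, assuming a stock Fedora guest kernel.
           # NVMe I/O timeout in seconds; the default of 30 matches the observed delays.
           cat /sys/module/nvme_core/parameters/io_timeout
           # Follow kernel warnings live (short-option form of the dmesg invocation above).
           dmesg -w -l warn
      -->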
      <pubDate>Mon, 01 May 2023 08:04:38 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2966271#M181099</guid>
      <dc:creator>cynar</dc:creator>
      <dc:date>2023-05-01T08:04:38Z</dc:date>
    </item>
    <item>
      <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2966815#M181153</link>
      <description>&lt;P&gt;I have exactly the same issue with host Windows 10, VMware® Workstation 17 Pro 17.0.2 build-21581411, and guest CentOS 8.&lt;/P&gt;</description>
      <pubDate>Thu, 04 May 2023 14:48:45 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2966815#M181153</guid>
      <dc:creator>mdudnik</dc:creator>
      <dc:date>2023-05-04T14:48:45Z</dc:date>
    </item>
    <item>
      <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2969369#M181349</link>
      <description>&lt;P&gt;I have now established a workaround on the VMware Workstation 17.0.2 (Windows 11) host and applied additional tuning to my existing Fedora Linux 38 guest installation:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;a) enable the Fedora 38 initial ramdisk to boot from more than just NVMe&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;cat &amp;lt;&amp;lt; EOF &amp;gt; /etc/dracut.conf.d/scsi.conf&lt;BR /&gt;add_drivers+=" vmw_pvscsi mptbase mptscsih mptspi mptsas "&lt;BR /&gt;EOF&lt;BR /&gt;dracut -f -v&lt;BR /&gt;halt -p&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;b) switch the controller for the existing disk in VMware Workstation&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;- remove the existing NVMe hard disk (this only unlinks it; the backing store remains, so your data is safe)&lt;BR /&gt;- add a new hard disk of type SCSI, picking the original backing store VMDK&lt;BR /&gt;- start the VM&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;c) tune for access (optional)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Add "noatime" and "ssd" to the btrfs mounts in /etc/fstab, then reboot.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;d) validate&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Have plenty of software and data on your virtual hard disk, then:&lt;/P&gt;&lt;PRE&gt;cd /&lt;BR /&gt;rg 1 | wc -l&lt;/PRE&gt;&lt;P&gt;//exp: no system hangs, obviously no NVMe timeouts&lt;BR /&gt;//act: ... exactly as expected, returns a number in the millions&lt;/P&gt;&lt;P&gt;This works much &lt;EM&gt;much&lt;/EM&gt; better than before. Incidentally, it seems as if a (software CI) task which previously took 55 seconds to complete now takes about 5% less time (a non-scientific measurement) - but really, the important part for me: no 30-second pauses / stalls due to NVMe controller timeouts.&lt;/P&gt;&lt;P&gt;Based on the above, I can only suggest ignoring the VMware recommendation to pick NVMe, and going with SCSI as a fast and &lt;EM&gt;robust&lt;/EM&gt; choice.&lt;/P&gt;</description>
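      <!-- A sketch, not part of the original post: one way to confirm from inside the guest
           that the disk really moved off the virtual NVMe controller after step b).
           # The TRAN column should no longer report nvme for the system disk.
           lsblk -o NAME,TRAN,MODEL
           # The paravirtual SCSI controller should now appear in the PCI listing.
           lspci | grep -i scsi
      -->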
      <pubDate>Fri, 19 May 2023 15:00:37 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2969369#M181349</guid>
      <dc:creator>cynar</dc:creator>
      <dc:date>2023-05-19T15:00:37Z</dc:date>
    </item>
    <item>
      <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2992076#M183201</link>
      <description>&lt;P&gt;FYI, VMware Workstation 17.5, with its new hardware version 21 and refreshed NVMe support, exhibits the same unwanted behaviour.&lt;/P&gt;&lt;P&gt;On Fedora 39 (beta) with "Linux fedora-gnome 6.5.6-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 6 19:57:21 UTC 2023 x86_64 GNU/Linux", the commands&lt;/P&gt;&lt;PRE&gt;cd /&lt;BR /&gt;rg 1 | wc -l&lt;/PRE&gt;&lt;P&gt;yield plenty of timeouts - and with them come massive delays. Run this after a fresh boot, with very cold caches.&lt;/P&gt;&lt;PRE&gt;[ 182.238025] nvme nvme0: I/O 192 QID 5 timeout, completion polled&lt;BR /&gt;[ 222.686266] nvme nvme0: I/O 224 QID 6 timeout, completion polled&lt;BR /&gt;[ 252.894103] nvme nvme0: I/O 0 QID 3 timeout, completion polled&lt;BR /&gt;[ 283.614594] nvme nvme0: I/O 1 QID 3 timeout, completion polled&lt;BR /&gt;[ 283.614607] nvme nvme0: I/O 192 QID 8 timeout, completion polled&lt;BR /&gt;[ 283.614626] nvme nvme0: I/O 128 QID 11 timeout, completion polled&lt;BR /&gt;[ 314.589824] nvme nvme0: I/O 227 QID 6 timeout, completion polled&lt;BR /&gt;[ 344.990885] nvme nvme0: I/O 128 QID 1 timeout, completion polled&lt;BR /&gt;[ 345.024297] nvme nvme0: I/O 193 QID 8 timeout, completion polled&lt;BR /&gt;[ 376.286125] nvme nvme0: I/O 129 QID 11 timeout, completion polled&lt;BR /&gt;[ 432.094158] nvme nvme0: I/O 92 QID 4 timeout, completion polled&lt;BR /&gt;[ 432.094170] nvme nvme0: I/O 0 QID 6 timeout, completion polled&lt;BR /&gt;[ 462.302211] nvme nvme0: I/O 248 QID 3 timeout, completion polled&lt;BR /&gt;[ 462.302237] nvme nvme0: I/O 32 QID 6 timeout, completion polled&lt;BR /&gt;[ 462.302242] nvme nvme0: I/O 96 QID 7 timeout, completion polled&lt;BR /&gt;[ 462.302247] nvme nvme0: I/O 130 QID 11 timeout, completion polled&lt;BR /&gt;[ 493.021506] nvme nvme0: I/O 225 QID 3 timeout, completion polled&lt;BR /&gt;[ 526.301435] nvme nvme0: I/O 32 QID 5 timeout, completion polled&lt;/PRE&gt;</description>
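      <!-- A sketch, not part of the original post: to re-test without a full reboot, cold
           caches can be approximated by dropping the guest's page cache first.
           sync
           echo 3 | sudo tee /proc/sys/vm/drop_caches
           cd / ; rg 1 | wc -l
      -->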
      <pubDate>Sat, 04 Nov 2023 19:45:18 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2992076#M183201</guid>
      <dc:creator>cynar</dc:creator>
      <dc:date>2023-11-04T19:45:18Z</dc:date>
    </item>
    <item>
      <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2994199#M183553</link>
      <description>&lt;P&gt;It would seem that tuning the Linux kernel's NVMe parameters may help alleviate the problem.&lt;/P&gt;&lt;P&gt;I had the opportunity to set up another virtual machine on that specific WD_BLACK SN850X SSD - basically booting a fresh, &lt;EM&gt;natively&lt;/EM&gt; installed Fedora Linux 39 from its three physical partitions (EFI, boot, data) via VMware Workstation physical drive access, using the virtual NVMe controller.&lt;/P&gt;&lt;P&gt;Initially, this setup also suffered from occasionally massively degraded performance (see above).&lt;/P&gt;&lt;P&gt;One kernel tuning parameter seems to make &lt;EM&gt;The Difference&lt;/EM&gt;:&lt;/P&gt;&lt;PRE&gt;nvme.poll_queues=64&lt;/PRE&gt;&lt;P&gt;Let's look at one of the results of the exploratory probing:&lt;/P&gt;&lt;PRE&gt;Fedora 38, many "20 GB split virtual disk" files, virtual SCSI controller&lt;BR /&gt;    read: IOPS=39.9k, BW=156MiB/s (164MB/s)(9361MiB/60004msec)&lt;BR /&gt;&lt;BR /&gt;Fedora 39, "Physical Drive partitions", virtual NVMe controller (virtual hardware rev 21)&lt;BR /&gt;    read: IOPS=170k, BW=665MiB/s (697MB/s)(38.9GiB/60002msec)&lt;/PRE&gt;&lt;P&gt;The performance difference is substantial, all the while&lt;/P&gt;&lt;PRE&gt;dmesg --follow --level warn --time-format iso&lt;/PRE&gt;&lt;P&gt;does not show any of the NVMe timeout problems.&lt;/P&gt;&lt;P&gt;So, for the time being, I am running this virtualized physical Fedora Linux 39 with:&lt;/P&gt;&lt;PRE&gt;sudo grubby --update-kernel=ALL --args="nvme.poll_queues=64"&lt;BR /&gt;sudo grubby --info=ALL&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Random notes:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;What a lovely rabbit hole to fall into ...&lt;/P&gt;&lt;P&gt;Nothing comes for free - NVMe polling consumes more CPU. Does it matter?&lt;/P&gt;&lt;P&gt;The optimal poll queue count is not known, nor is it clear whether split read and write poll queues are beneficial - see &lt;A href="https://elixir.bootlin.com/linux/latest/source/drivers/nvme/host/pci.c" target="_blank"&gt;https://elixir.bootlin.com/linux/latest/source/drivers/nvme/host/pci.c&lt;/A&gt; (or rather the version applying to your Linux kernel) for &lt;EM&gt;All Of The Truth&lt;/EM&gt; (because I was unable to find any useful &lt;EM&gt;documentation&lt;/EM&gt;).&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.vldb.org/pvldb/vol16/p2090-haas.pdf" target="_blank"&gt;What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines (vldb.org)&lt;/A&gt; is an interesting article explaining a great many details about I/O performance in Linux.&lt;/P&gt;&lt;P&gt;And finally, for stress-testing and exploration, &lt;A href="https://cloud.google.com/compute/docs/disks/benchmarking-pd-performance" target="_blank"&gt;Benchmark persistent disk performance on a Linux VM | Compute Engine Documentation | Google Cloud&lt;/A&gt; is a useful resource with pre-cooked "fio" commands.&lt;/P&gt;</description>
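      <!-- A sketch, not part of the original post: after rebooting with the grubby change,
           verify that the parameter actually took effect in the guest.
           # The boot command line should contain nvme.poll_queues=64 ...
           cat /proc/cmdline
           # ... and the nvme module should report the applied value.
           cat /sys/module/nvme/parameters/poll_queues
      -->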
      <pubDate>Sat, 04 Nov 2023 07:53:42 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2994199#M183553</guid>
      <dc:creator>cynar</dc:creator>
      <dc:date>2023-11-04T07:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: nvme timeouts on very fast SSD (Windows Workstation 17.0.2, guest Linux)</title>
      <link>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2994237#M183560</link>
      <description>&lt;P&gt;Thanks for sharing your invaluable findings, &lt;a href="https://communities.vmware.com/t5/user/viewprofilepage/user-id/681006"&gt;@cynar&lt;/a&gt;!&lt;BR /&gt;I hit the same nvme timeout issue on a Fedora 38 guest, Windows 10 host, VMware Player 17.0, and - as you also mentioned - upgrading to Fedora 39 and VMware 17.5 didn't fix it; I had no luck finding a solution anywhere else.&lt;BR /&gt;Tweaking the nvme.poll_queues kernel parameter (I went with 32 for now) seems to have completely eliminated the nvme timeouts (even under stress-testing with fio), and improved performance.&lt;/P&gt;</description>
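      <!-- A sketch, not part of the original post: a minimal fio job exercising polled I/O,
           assuming fio with io_uring support in the guest; save as poll.fio and run "fio poll.fio".
           The filename and size below are placeholders.
           [poll-randread]
           ioengine=io_uring
           hipri=1
           direct=1
           rw=randread
           bs=4k
           iodepth=32
           size=1g
           runtime=60
           time_based=1
           filename=/var/tmp/fio-testfile
      -->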
      <pubDate>Sat, 04 Nov 2023 21:33:13 GMT</pubDate>
      <guid>https://communities.vmware.com/t5/VMware-Workstation-Pro/nvme-timeouts-on-very-fast-SSD-Windows-Workstation-17-0-2-guest/m-p/2994237#M183560</guid>
      <dc:creator>agostonbarna</dc:creator>
      <dc:date>2023-11-04T21:33:13Z</dc:date>
    </item>
  </channel>
</rss>

