17 Replies Latest reply on Mar 16, 2019 8:29 PM by Testerhood

    Disk write cache broken in Workstation 14

    superciliousdude Novice

      Hi,

       

      I have upgraded from 12.x to 14 and despite not changing anything in my configuration, the write caching is not working anymore.

       

      I'm running 14.1.1 on a Windows 10 x64 host.

      I have tried any and all of the following options in my config.ini:

      hard-disk.hostBuffer = "enabled"

      hard-disk.useUnbuffered = "FALSE"

      hard-disk.synchronous = "FALSE"

      aiomgr.buffered = "TRUE"

      aiomgr.unbuf = "FALSE"

      Still, despite my entire VMDK being cached in RAM, the vmware-vmx process waits for I/O to complete before acknowledging it inside the VM. How can I fix this and get the old behaviour?

       

      I depend on the I/O performance inside the VM, and many of my workloads are broken without it. My system uses ECC RAM, and the entire workstation is powered by a 3 kVA UPS and runs a 12-disk RAID 6 array, so I'm not at all worried about data loss. I just desperately need to disable the synchronous I/O (that is, get write caching back) so I can upgrade to 14.x again. In the meantime, I have been forced back to Workstation 12.

        • 1. Re: Disk write cache broken in Workstation 14
          superciliousdude Novice

          I know that this does not constitute a viable solution to this problem for most people, but I managed to find a workaround that works for my purposes. I monkey patched the vmware-vmx.exe file to simply remove the sync request from the file handle at creation time.

           

          All I did was inject a single AND instruction before each call to CreateFileW() - the equivalent of:

          dwFlagsAndAttributes &= ~(FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH)

           

          This simply ANDs the register containing dwFlagsAndAttributes with 0x5FFFFFFF, so the Windows cache is no longer bypassed.
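For anyone checking the arithmetic, here is a minimal sketch of where 0x5FFFFFFF comes from. The two flag values are the ones defined for CreateFileW in the Windows SDK; the helper name `strip_cache_bypass` is made up for illustration and is not part of the patch itself.

```python
# CreateFileW flag values as defined in the Windows SDK headers.
FILE_FLAG_WRITE_THROUGH = 0x80000000
FILE_FLAG_NO_BUFFERING = 0x20000000

# Python ints are unbounded, so clamp the complement to 32 bits (a DWORD).
CACHE_BYPASS_MASK = ~(FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH) & 0xFFFFFFFF


def strip_cache_bypass(flags_and_attributes: int) -> int:
    """Clear the two cache-bypass bits and leave every other bit untouched.

    Hypothetical helper: it mirrors the single AND described in the post.
    """
    return flags_and_attributes & CACHE_BYPASS_MASK


if __name__ == "__main__":
    FILE_ATTRIBUTE_NORMAL = 0x00000080
    requested = FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH
    print(hex(CACHE_BYPASS_MASK))              # the mask quoted in the post
    print(hex(strip_cache_bypass(requested)))  # only the attribute bit survives
```

Any other flag passed in dwFlagsAndAttributes is left as it was, since only bits 29 and 31 are cleared.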

           

          With this simple tweak I am back to having >1 million IOPS in my virtual machines again, as I did with Workstation 12.5.x.

           

          Hope this helps others who have been burned by this problem and VMware's increasingly hostile attitude towards Workstation users.

          2 people found this helpful
          • 2. Re: Disk write cache broken in Workstation 14
            mackpt1 Novice

            Here is how to enable write caching in Windows 10 to enhance performance:

            1:- Press Windows+R to open the Run dialog. Type devmgmt.msc and hit the Enter key.

            2:- In the Device Manager window that pops up, scroll down to Disk drives and expand it. Right-click the device of your choice and choose Properties from the menu.

            3:- From the tabs at the top, click Policies. Tick the check box next to Enable write caching on the device and press the OK button.

            4:- Open any of the complex apps that used to run slowly to see that they now take much less time.

            Source:- https://merabheja.com/enable-write-caching-in-windows-10/

            • 3. Re: Disk write cache broken in Workstation 14
              superciliousdude Novice

              No, I am not talking about caching at the disk layer - this has an immeasurably small effect on performance for my workload. I am talking about caching at the file-system layer on a Windows host OS. My ugly solution is a literal 2 orders of magnitude (100x) increase in performance, and is not limited to tiny disk caches of a few megabytes in size, but the entire 128GB of RAM in my host system can be used to cache reads and writes from VMs. Going from around 10K IOPS in a virtual machine (random 4K, single FIFO) to around 1000000 IOPS is a huge difference in performance.
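To put those two figures side by side, a rough sketch of the arithmetic follows. It assumes "random 4K" means 4 KiB (4096-byte) requests and uses the rounded IOPS numbers from the paragraph above; the function name is only illustrative.

```python
BLOCK_SIZE_BYTES = 4 * 1024  # assumption: "4K" means 4 KiB requests


def iops_to_mib_per_s(iops: float, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Convert an IOPS figure to throughput in MiB/s for a fixed request size."""
    return iops * block_size / (1024 * 1024)


before_iops = 10_000      # roughly what the unpatched Workstation 14 delivered
after_iops = 1_000_000    # roughly what the host file cache delivers

speedup = after_iops / before_iops

if __name__ == "__main__":
    print(f"before: {iops_to_mib_per_s(before_iops):.2f} MiB/s")
    print(f"after:  {iops_to_mib_per_s(after_iops):.2f} MiB/s")
    print(f"speedup: {speedup:.0f}x")
```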

               

              The fact that it is no longer supported in VMware has forced me to start migrating to another hypervisor. With a Linux host OS and KVM, I get approximately 20% better performance than VMware, but worse virtual GPU support in Windows VMs, so it's a difficult trade-off.

              1 person found this helpful
              • 4. Re: Disk write cache broken in Workstation 14
                bonnie201110141 Expert
                VMware Employees

                Can you please try the option below?

                hard-disk.useUnbuffered = "TRUE"

                 

                Let me know if it works for you.

                • 5. Re: Disk write cache broken in Workstation 14
                  bonnie201110141 Expert
                  VMware Employees

                  All other options should be deleted.

                  • 6. Re: Disk write cache broken in Workstation 14
                    superciliousdude Novice

                    No, the performance is back to being terrible with that option. It looks like there is no "official" way to disable sync I/O with Workstation 14.

                    • 7. Re: Disk write cache broken in Workstation 14
                      bonnie201110141 Expert
                      VMware Employees

                      Sorry, we misunderstood your issue. There seems to be a bug in our product, and we are trying to fix it. Meanwhile, please try adding the option below to the config file as a workaround.

                       

                      aiomgr.simple="Generic"

                       

                       

                      1 person found this helpful
                      • 8. Re: Disk write cache broken in Workstation 14
                        superciliousdude Novice

                        Thanks for your response, but I am not in a position to test it right now as I have replaced Workstation 14 with Workstation 12 on all the machines here because we discovered a severe bug with Workstation 14.1.1 giving (non-deterministic) incorrect results for a deterministic computation.

                         

                        I cannot overstate how incredibly frustrating it has been over the past week or so trying to figure out what the problem was as we never suspected a hypervisor bug. The error occurs only inside VMs and only when those exact same VMs are running under Workstation 14.1.1 - no problem with those same VMs running under Workstation 12.5 or natively on the host (we use Server 2016 on a variety of machines, mainly HP DL360 and some Lenovo). There are no ECC errors logged and the problem occurs on Sandy Bridge CPUs through all generations to Broadwell CPUs - we don't have any Xeon Golds or EPYC chips yet to test on. All run the tests fine on Workstation 12 and natively on the host OS, all fail (with different results) on Workstation 14.

                         

                        Is VMware aware of this bug and is there a fix in the works?

                        • 9. Re: Disk write cache broken in Workstation 14
                          bonnie201110141 Expert
                          VMware Employees

                          We are so sorry to hear that you ran into a severe issue with Workstation 14.1.1. Can you please give a detailed description of your issue? What kind of computation are you running in the VM? How can we reproduce the issue locally? Thanks a lot!

                          • 10. Re: Disk write cache broken in Workstation 14
                            superciliousdude Novice

                            I'm not aware of how to reproduce the problem easily. We have a long test-suite that runs for about 10 hours every night in a VM on at least one of our machines. We use VMware Workstation as our hypervisor, with a mix of Server 2012 R2 and Server 2016 as the host OS, exclusively on Xeon CPUs (all generations from Sandy Bridge to Broadwell). Our shorter test-suites (<1 hour) pass on Workstation 14 just fine, and longer test-suites (~3 hours) only fail intermittently. However, the biggest test suite runs for just under 10 hours and fails 100% of the time under Workstation 14, and each time the output data has a different checksum that does not match the known-good result.
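The pass/fail criterion described here, comparing the checksum of the output data with a known-good value, can be sketched as below. This is only an illustration: the thread does not say which checksum the test suite uses, so SHA-256 and both function names are assumptions.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path: str, known_good_hex: str) -> bool:
    """True when the output file hashes to the recorded known-good digest."""
    return sha256_of_file(path) == known_good_hex.lower()
```

A deterministic computation should give the same digest on every run, so a digest that changes from run to run points at the environment rather than the code.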

                             

                            At first, I suspected bad hardware and replaced the CPUs, RAM and the RAID controller (HP P822), but the problem persisted, so I replaced the server, and then replaced it again with another vendor's hardware. When that still showed the same problem, I ran the test overnight on all our idle machines. The passing ones had only one thing in common: Workstation 12. I then tested the failing servers by running the test suite natively and they all passed.

                             

                            I am not in a position to provide a copy of our codebase as a test case to VMware, but the problem is 100% reproducible. If it helps, I can run the test on a spare machine at home and provide copies of logs. Please let me know what specific steps to take to generate the necessary logs.

                            1 person found this helpful
                            • 11. Re: Disk write cache broken in Workstation 14
                              superciliousdude Novice

                              Hi bonnie201110141,

                               

                              Thanks to the 4 day weekend, I finally had some time to run some more extensive tests. The good news is that the aiomgr.simple="Generic" config setting works and the I/O performance is restored in 14.1.1. However, an additional effect is that the non-deterministic wrong results in our test suite also disappeared with this option.

                               

                              I am now 100% convinced there is a race-condition or similar bug in the default aiomgr implementation of Workstation 14.x.

                               

                              The bug occurs on VMDKs stored on NTFS volumes using Windows 10/Server 2016 Storage Spaces. The bug disappears when using aiomgr.simple="Generic". The bug is specific to Workstation 14.x; no version of 12.x has this behaviour. The bug is easily triggered with high I/O using a mix of random reads and writes, not sequential I/O. The bug is independent of which virtual disk adapter is used; both pvscsi and lsisas exhibit it. The bug is independent of which guest OS is used; Linux, Windows 7 and Windows Server 2016 guests can all trigger it.
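For anyone who wants to probe for this kind of problem, below is a minimal sketch of a mixed random read/write check. It is not the test suite mentioned in this thread and is not known to reproduce the bug. The block size, block count, operation count and function name are all arbitrary assumptions. It goes through the guest's own file cache, so a serious test would need a data set much larger than guest RAM (or unbuffered I/O) and far more operations.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, matching "random 4K" I/O


def run_mixed_io_check(path, num_blocks=256, operations=2000, seed=0):
    """Randomly write and read blocks; return how many reads returned wrong data."""
    rng = random.Random(seed)
    expected = {}  # block index -> bytes most recently written there
    mismatches = 0
    with open(path, "w+b") as handle:
        # Pre-fill with zeros so unwritten blocks have a known value.
        handle.write(bytes(num_blocks * BLOCK_SIZE))
        for _ in range(operations):
            index = rng.randrange(num_blocks)
            handle.seek(index * BLOCK_SIZE)
            if rng.random() < 0.5:
                data = rng.getrandbits(BLOCK_SIZE * 8).to_bytes(BLOCK_SIZE, "little")
                handle.write(data)
                expected[index] = data
            else:
                data = handle.read(BLOCK_SIZE)
                if data != expected.get(index, bytes(BLOCK_SIZE)):
                    mismatches += 1
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_mixed_io_check(os.path.join(tmp, "scratch.bin")))
```

On a healthy system the function returns 0; any non-zero count means a read came back with data that was never written to that block.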

                               

                              Hopefully VMware can track this down and fix it, but in the meantime, setting aiomgr.simple="Generic" works correctly.

                              2 people found this helpful
                              • 12. Re: Disk write cache broken in Workstation 14
                                bonnie201110141 Expert
                                VMware Employees

                                Thanks for your tests! We will try to reproduce locally and investigate.

                                • 13. Re: Disk write cache broken in Workstation 14
                                  richard612 Novice

                                  Just stumbled across this thread by superciliousdude whilst Google searching for ways to more aggressively disk cache and get VMware Workstation running a bit faster.  Very relevant to my interests.

                                   

                                  This is in a VM on Workstation 14.1.1 running Server 2016 prior to aiomgr.simple="Generic":

                                   

                                  -----------------------------------------------------------------------

                                  CrystalDiskMark 6.0.0 x64 (C) 2007-2017 hiyohiyo

                                  -----------------------------------------------------------------------

                                   

                                  Sequential Read (Q= 32,T= 1)  :   215.485 MB/s

                                  Sequential Write (Q= 32,T= 1) :   253.429 MB/s

                                  Random Read 4KiB (Q=  8,T= 8) :    27.683 MB/s [   6758.5 IOPS]

                                  Random Write 4KiB (Q=  8,T= 8):    33.578 MB/s [   8197.8 IOPS]

                                  Random Read 4KiB (Q= 32,T= 1) :    17.497 MB/s [   4271.7 IOPS]

                                  Random Write 4KiB (Q= 32,T= 1):    10.862 MB/s [   2651.9 IOPS]

                                  Random Read 4KiB (Q=  1,T= 1) :     4.866 MB/s [   1188.0 IOPS]

                                  Random Write 4KiB (Q=  1,T= 1):    11.483 MB/s [   2803.5 IOPS]

                                   

                                    Test : 500 MiB [C: 39.0% (15.4/39.5 GiB)] (x1)  [Interval=5 sec]

                                    Date : 2018/04/26 17:31:24

                                      OS : Windows Server 2016 Datacenter (Full installation) [10.0 Build 14393] (x64)

                                   

                                  This is after aiomgr.simple="Generic":

                                   

                                  -----------------------------------------------------------------------

                                  CrystalDiskMark 6.0.0 x64 (C) 2007-2017 hiyohiyo

                                  -----------------------------------------------------------------------

                                   

                                  Sequential Read (Q= 32,T= 1)  :  805.823 MB/s

                                  Sequential Write (Q= 32,T= 1) :  339.611 MB/s

                                  Random Read 4KiB (Q=  8,T= 8) :  188.217 MB/s [  45951.4 IOPS]

                                  Random Write 4KiB (Q=  8,T= 8):  169.168 MB/s [  41300.8 IOPS]

                                  Random Read 4KiB (Q= 32,T= 1) :   51.101 MB/s [  12475.8 IOPS]

                                  Random Write 4KiB (Q= 32,T= 1):   41.781 MB/s [  10200.4 IOPS]

                                  Random Read 4KiB (Q=  1,T= 1) :   26.277 MB/s [   6415.3 IOPS]

                                  Random Write 4KiB (Q=  1,T= 1):   15.358 MB/s [   3749.5 IOPS]

                                   

                                    Test : 500 MiB [C: 38.8% (15.3/39.5 GiB)] (x5)  [Interval=5 sec]

                                    Date : 2018/04/26 19:25:23

                                      OS : Windows Server 2016 Datacenter (Full installation) [10.0 Build 14393] (x64)
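A quick way to compare the two runs is to divide the IOPS figures row by row, as in the sketch below. The numbers are copied from the two CrystalDiskMark outputs above; the dictionary keys and the function name are just labels chosen here. Note that the first run used one pass (x1) and the second used five (x5), so the ratios are only a rough guide.

```python
# Random 4KiB IOPS copied from the two CrystalDiskMark runs above.
BEFORE = {
    "Read Q8T8": 6758.5,
    "Write Q8T8": 8197.8,
    "Read Q32T1": 4271.7,
    "Write Q32T1": 2651.9,
    "Read Q1T1": 1188.0,
    "Write Q1T1": 2803.5,
}
AFTER = {
    "Read Q8T8": 45951.4,
    "Write Q8T8": 41300.8,
    "Read Q32T1": 12475.8,
    "Write Q32T1": 10200.4,
    "Read Q1T1": 6415.3,
    "Write Q1T1": 3749.5,
}


def speedups(before, after):
    """Return after/before for every test name present in `before`."""
    return {name: after[name] / before[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(BEFORE, AFTER).items():
        print(f"{name:12s} {ratio:5.2f}x")
```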

                                   

                                  I think VMware has a problem on their hands.  Related question: can I put this setting in settings.ini or config.ini to make it global?

                                   

                                  Edit: Yes.  It goes in config.ini.  Just tested this.
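If you need to roll the setting out to several hosts, a small helper along these lines could add it to the text of config.ini. This is a sketch only: `set_option` is a made-up name, it assumes the file consists of simple `key = "value"` lines as in the examples in this thread, and it works on a string, so locating, backing up and writing the real file is left to you.

```python
def set_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = "value"` set exactly once.

    Hypothetical helper: replaces the first existing line for `key`,
    drops any later duplicates, and appends the line if it was missing.
    """
    new_line = f'{key} = "{value}"'
    result = []
    replaced = False
    for line in config_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            if not replaced:
                result.append(new_line)
                replaced = True
            continue  # skip duplicate definitions of the same key
        result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    original = 'hard-disk.hostBuffer = "enabled"\n'
    print(set_option(original, "aiomgr.simple", "Generic"), end="")
```

Running it twice on the same text gives the same result, so it is safe to apply repeatedly.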

                                  1 person found this helpful
                                  • 14. Re: Disk write cache broken in Workstation 14
                                    bonnie201110141 Expert
                                    VMware Employees

                                    Yes, we are aware of this issue and are working on a fix. For now, please add that option to config.ini. Thanks!

                                    2 people found this helpful