Enthusiast

ESXi 6.5 slow VMs, high "average response time"

I am running ESXi 6.5 with the latest patches and VMware Tools 10.1.5.

I am having very inconsistent performance issues with both of my hosts. Basically, the Windows 2016/Windows 10 guests are sluggish at times: nothing will load and the OS is basically unresponsive when interacting with the GUI. The issue seems to stem from disk performance, but I am not 100% certain that this is the cause; it may be a side effect.

What I have noticed is that some VMs show an average disk response time of about 2000 ms. Yet if I check the performance monitor at the host level, the disks and datastores all show sub-1 ms response times. I am not able to explain the inconsistency.

I have a local SSD datastore on each host as well as a rather fast NVMe iSCSI SAN connected via 100 Gb Mellanox ConnectX-4 cards. I see the issue with both hosts and both datastores. The issue seems to be worse now with the most recent patches and VMware Tools drivers. I am using VMXNET3 network cards and paravirtual SCSI controllers on all VMs.

I have run disk benchmarks on the VMs and the results vary. I have already seen a case where I run a disk benchmark on a guest and get horrible results, vMotion it to the other host where benchmarks against the SAN are fine, then vMotion the guest back to the original host and the results are fine the second time I run it.

Here is an example of a bad test; the reads are terrible:

-----------------------------------------------------------------------

CrystalDiskMark 5.2.0 x64 (C) 2007-2016 hiyohiyo

                           Crystal Dew World : http://crystalmark.info/

-----------------------------------------------------------------------

* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]

* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 2) :     0.655 MB/s

  Sequential Write (Q= 32,T= 2) :  5384.173 MB/s

  Random Read 4KiB (Q= 32,T= 2) :     0.026 MB/s [     6.3 IOPS]

Random Write 4KiB (Q= 32,T= 2) :   617.822 MB/s [150835.4 IOPS]

         Sequential Read (T= 1) :     2.306 MB/s

        Sequential Write (T= 1) :  1907.004 MB/s

   Random Read 4KiB (Q= 1,T= 1) :    53.942 MB/s [ 13169.4 IOPS]

  Random Write 4KiB (Q= 1,T= 1) :    52.104 MB/s [ 12720.7 IOPS]

  Test : 50 MiB [C: 5.2% (15.6/299.5 GiB)] (x1)  [Interval=5 sec]

  Date : 2017/03/25 20:29:18

    OS : Windows 10 Enterprise [10.0 Build 14393] (x64)

 

A few seconds later, on the same setup, I get perfectly fine results:

-----------------------------------------------------------------------

CrystalDiskMark 5.2.0 x64 (C) 2007-2016 hiyohiyo

                           Crystal Dew World : http://crystalmark.info/

-----------------------------------------------------------------------

* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]

* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 2) :  6655.386 MB/s

  Sequential Write (Q= 32,T= 2) :  5654.851 MB/s

  Random Read 4KiB (Q= 32,T= 2) :   695.193 MB/s [169724.9 IOPS]

Random Write 4KiB (Q= 32,T= 2) :   609.216 MB/s [148734.4 IOPS]

         Sequential Read (T= 1) :  1810.393 MB/s

        Sequential Write (T= 1) :  1626.112 MB/s

   Random Read 4KiB (Q= 1,T= 1) :    53.266 MB/s [ 13004.4 IOPS]

  Random Write 4KiB (Q= 1,T= 1) :    54.289 MB/s [ 13254.2 IOPS]

  Test : 50 MiB [C: 5.2% (15.7/299.5 GiB)] (x1)  [Interval=5 sec]

  Date : 2017/03/25 20:32:21

    OS : Windows 10 Enterprise [10.0 Build 14393] (x64)
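
To compare two runs like the ones above without eyeballing the numbers, the result lines can be parsed and the per-test ratios checked. This is only a rough sketch: the regular expression is an assumption based on the CrystalDiskMark 5.2.0 text shown here, the names `parse_results` and `degraded_tests` are made up for illustration, and the factor of 10 is an arbitrary threshold.

```python
import re

# Matches CrystalDiskMark 5 result lines, e.g.
# "Sequential Read (Q= 32,T= 2) :     0.655 MB/s"
RESULT_LINE = re.compile(r"^\s*(?P<label>.+?)\s*:\s*(?P<mbps>[\d.]+) MB/s")

# Excerpts copied from the two runs posted above.
BAD_RUN = """
   Sequential Read (Q= 32,T= 2) :     0.655 MB/s
  Sequential Write (Q= 32,T= 2) :  5384.173 MB/s
  Random Read 4KiB (Q= 32,T= 2) :     0.026 MB/s [     6.3 IOPS]
 Random Write 4KiB (Q= 32,T= 2) :   617.822 MB/s [150835.4 IOPS]
         Sequential Read (T= 1) :     2.306 MB/s
"""

GOOD_RUN = """
   Sequential Read (Q= 32,T= 2) :  6655.386 MB/s
  Sequential Write (Q= 32,T= 2) :  5654.851 MB/s
  Random Read 4KiB (Q= 32,T= 2) :   695.193 MB/s [169724.9 IOPS]
 Random Write 4KiB (Q= 32,T= 2) :   609.216 MB/s [148734.4 IOPS]
         Sequential Read (T= 1) :  1810.393 MB/s
"""


def parse_results(report: str) -> dict[str, float]:
    """Map each test label to its throughput in MB/s."""
    results = {}
    for line in report.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group("label")] = float(match.group("mbps"))
    return results


def degraded_tests(bad: dict[str, float], good: dict[str, float],
                   factor: float = 10.0) -> list[str]:
    """Return labels where the good run is more than `factor` times faster."""
    return [
        label
        for label, mbps in bad.items()
        if label in good and mbps > 0 and good[label] / mbps > factor
    ]


if __name__ == "__main__":
    for label in degraded_tests(parse_results(BAD_RUN), parse_results(GOOD_RUN)):
        print("degraded:", label)
```

On the two excerpts above, only the read tests are flagged; the write tests are within a few percent of each other.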

115 Replies
Contributor

It would be really interesting to know where we can get the hotfix.

Enthusiast

I had some slightly different timeframes from support today:

- Fix: Official release staged to ESXi 6.5 U1. Due for release during last week of July (Tentative - can be early or delayed).

- Hot Patch: A hot patch can be requested; however, building one to match a specific environment would take about 3 weeks, even at high priority. Given that duration, it may be better to wait for the fix mentioned above.

Contributor

Hmmm. I have an HP DL360 Gen 7 server with a P410i RAID controller and 128 GB RAM. I upgraded from 6.0 to 6.5.0d (direct VMware sources) and now my VMs (3x Windows 10 Pro Creators Update) are extremely slow and hang, and you can't really open programs or work with them.

Does anyone think I have the same problem as everyone here, or do I have a different problem with my machine?

Can someone help me with this issue?

Enthusiast

Hi Dominic

Do you have a RAID cache module on the P410i, and is it working? Without RAID cache, VMs on that controller are very slow, almost unusable. I have first-hand experience with that configuration.

As for the issue described in this thread, if you are able to see a remarkable difference in speed after changing the registry setting on a previous post then yes, you are affected by the same thing.  If not... it may be something else.

Contributor

Yes, I have. I had a 512 MB cache module and upgraded to a 512 MB flash cache module with Super-Cap two days ago. The machines were just as slow with my old cache. Nothing changed.

I upgraded from 64 to 128 GB RAM and nothing changed. No more ideas ...

"As for the issue described in this thread, if you are able to see a remarkable difference in speed after changing the registry setting on a previous post then yes, you are affected by the same thing.  If not... it may be something else."

What do you mean by "after changing the registry setting"?

Do you mean the Windows registry? Yes, I think I did. The last Creators Update will have changed the registry for sure, or am I wrong?

Thanks for your help !

Enthusiast

Hi Dominic

Due to the age of that server and controller, even if you have a 512 MB module installed, your cache will not work if the battery has failed or is failing. The batteries unfortunately don't have a long shelf life, and a replacement may be difficult to locate. You'll have to search online for one if the battery is dead, no longer charging, or missing. You can also check that the battery is plugged in. HP has some guides on their website, or you can call their support for help.

If you had this problem before moving to 6.5 then this thread is most likely not the issue you are having.

The 6.5 issue described in this thread includes symptoms of very high disk latency and 100% disk access time in performance monitor. The current workaround for the 6.5 issue is a registry change inside the guest.

It applies to Windows 8, 10, Server 2012, 2012 R2 and 2016.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\DisableDeleteNotification

Change this value from 0 to 1 and then reboot.
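
For anyone who needs to check or apply this on several guests, the same registry value can be read and written with Python's standard `winreg` module. This is a minimal sketch, not an official procedure: the helper names are invented for illustration, it must be run as administrator inside the Windows guest, and the reboot mentioned above is still needed afterwards.

```python
import sys

# Key and value name taken from the workaround above.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "DisableDeleteNotification"


def describe(value: int) -> str:
    """Explain what a DisableDeleteNotification value means."""
    if value == 1:
        return "delete notifications (TRIM/UNMAP) disabled"
    return "delete notifications (TRIM/UNMAP) enabled"


def read_value() -> int:
    """Read the current value; raises FileNotFoundError if it is not set."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


def set_value(value: int) -> None:
    """Write the value as a DWORD (requires administrator rights)."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)


if __name__ == "__main__" and sys.platform == "win32":
    print("before:", describe(read_value()))
    set_value(1)  # apply the workaround; reboot the guest afterwards
    print("after: ", describe(read_value()))
```

Setting the value back to 0 with `set_value(0)` reverses the workaround once a fixed ESXi build is installed.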

Contributor

Thank you so much for your help!

I didn't have this extreme problem before I updated to 6.5. It was not the fastest, but with the 64 GB RAM config it was at the limit. Now it should have enough RAM.

Let me try the workaround. I will see if it helps and let you know.

What does it do?

Contributor

It seems to work much faster now. Thank you very much!

Will see how it works tomorrow.

Enthusiast

Hi there,

I just found this thread today. Some people are discussing the same problem in a different thread: Windows Server 2012 R2 Bad Performance

But I can confirm that running the following command in PowerShell:

fsutil behavior set DisableDeleteNotify 1

will do the trick: the machines are much faster and the sky-high disk latencies are gone. There is a bug in the UNMAP command.
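
To verify which guests already have the setting applied, the output of `fsutil behavior query DisableDeleteNotify` can be parsed. The sketch below is hedged: the exact output format differs between Windows versions (newer builds print one line per file system, older ones a single line), so the sample strings and the regular expression are assumptions, and `parse_disable_delete_notify` is a hypothetical helper name.

```python
import re

# Optional file system prefix (e.g. "NTFS"), then the setting and its value.
SETTING = re.compile(r"(?:(\w+)\s+)?DisableDeleteNotify\s*=\s*(\d+)")


def parse_disable_delete_notify(output: str) -> dict[str, int]:
    """Return {file system: value} from the fsutil query output.

    Lines without a file system prefix are stored under the key "all".
    """
    settings = {}
    for line in output.splitlines():
        match = SETTING.search(line)
        if match:
            settings[match.group(1) or "all"] = int(match.group(2))
    return settings


if __name__ == "__main__":
    # Assumed sample output; on a real guest, capture it with e.g.
    # subprocess.run(["fsutil", "behavior", "query", "DisableDeleteNotify"],
    #                capture_output=True, text=True).stdout
    sample = "NTFS DisableDeleteNotify = 1\nReFS DisableDeleteNotify = 0\n"
    for filesystem, value in parse_disable_delete_notify(sample).items():
        state = "workaround applied" if value == 1 else "UNMAP still enabled"
        print(f"{filesystem}: {state}")
```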

Thanks to mvduijn for keeping us up to date.

Regards

Peter

Enthusiast

We have published a KB article with a workaround while we work on providing a fix in an upcoming update release.

https://kb.vmware.com/kb/2150591

regards,

Shashank

Enthusiast

Link needs fixing (remove the 'i' before 'kb')

I'm now concerned that you say you are working on a fix for 'an upcoming update', when I have been told it will definitely be in the U1 release.

Will it be available in U1 or in a later release?!

Contributor

Hello,

The 6.5 U1 update is out, but I could not find any reference to this issue in the changelog. Did I miss it, or will it be delivered in a separate patch later on?

Enthusiast

Hello,

Search for "unmap" and you will find what you are looking for 😉 e.g.:

"Performance issues on Windows Virtual Machine (VM) might occur after upgrading to VMware ESXi 6.5.0 P01 or 6.5 EP2

Performance issues might occur when the not aligned unmap requests are received from the Guest OS under certain conditions. Depending on the size and number of the not aligned unmaps, this might occur when a large number of small files (less than 1 MB in size) are deleted from the Guest OS."
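
To make "not aligned" concrete: an unmap request is aligned when both its start offset and its length are multiples of the storage layer's unmap granularity. The sketch below only illustrates that arithmetic. The 1 MiB granularity is an assumption suggested by the "less than 1 MB" remark in the release note, and this is not a description of how ESXi actually processes such requests.

```python
MIB = 1024 * 1024


def is_aligned(offset: int, length: int, granularity: int = MIB) -> bool:
    """True if the request starts and ends on granularity boundaries."""
    return offset % granularity == 0 and length % granularity == 0


def aligned_portion(offset: int, length: int, granularity: int = MIB):
    """Return (start, length) of the largest aligned sub-range, or None."""
    start = -(-offset // granularity) * granularity        # round start up
    end = ((offset + length) // granularity) * granularity  # round end down
    if end <= start:
        return None  # the request never covers a whole granule
    return start, end - start


if __name__ == "__main__":
    # A deleted 64 KiB file at a 4 KiB offset: unaligned, nothing reclaimable.
    print(is_aligned(4096, 64 * 1024), aligned_portion(4096, 64 * 1024))
    # A 2 MiB range starting at 512 KiB: only the middle 1 MiB is aligned.
    print(aligned_portion(512 * 1024, 2 * MIB))
```

Under that assumption, deleting many small files produces many requests of the first kind, which matches the condition the release note describes.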

Because it could not get any worse, I was brave and upgraded all hosts on the first day. All VMs are now on hardware version 13 and VMware Tools 10279.

It works great in our environment. Performance is great again, and unmapping works great with W2K12 and later machines. Thin-provisioned disks are now also shown as thin-provisioned in the Windows optimization tool.

Happy Updating!

Contributor

Oh, my bad, I missed the line (don't ask me why, but I was searching for "2012" instead of "unmap") 🙂

I'll go ahead with the update then. Thanks for your feedback.

Enthusiast

No problem, I also had to read the article twice before I found it. 😉

Contributor

Did this resolve the high resource (CPU) use on Win 10 and Server 2016? I observed it whenever Windows Updates were in progress.

Enthusiast

Has anyone experienced this issue after upgrading to 6.5u1? Have you gone back to all your VMs and put back the registry workaround from this KB?  Performance issues on Windows virtual machine with hardware version 13 after upgrading to ESXi 6.5 (...

Enthusiast

I am running 6.5U1 on a Dell R730 with a PERC H330 Mini and eight 600 GB 15k drives set up as RAID 10. I have a Windows 2016 VM on hardware version 13 with the latest VMware Tools and a paravirtual SCSI controller. I get random alerts from VeeamOne about disk latency. From what I'm seeing, U1 should have the "fix", so disabling UNMAP shouldn't be necessary. Am I correct?

I moved the VM to an R520 with a PERC H710P mini and I only got 1 alert so far.  The load on the R730 is very minimal so I do not believe it is hardware related.  Thoughts?  Thanks!

Contributor

I'm having the same issues. We just had brand new 730xd's built with ESXi 6.5U1 and are seeing very high read/write latency. All the firmware and drivers have been updated on the server but I'm still having no luck. I have a ticket open with VMware and Dell now looking into it. If anyone has had any luck with resolving this issue, please share!!!

Enthusiast

Have you tried the workaround in this KB?  Performance issues on Windows virtual machine with hardware version 13 after upgrading to ESXi 6.5 (...

This impacts Server 2012, 2012 R2, 2016, Windows 8, 8.1 and 10.

Windows 7 and Server 2008 R2 and older are not impacted by the issue discussed in this thread.

After making the change inside the guest VM you'll need to reboot.  If the performance issues continue you may want to start a new thread as it would be unrelated to what is being discussed here.  You may also want to open a ticket with VMware.
