LumH
Enthusiast

ESXi 6.7 - Windows 10 Pro VM - poor disk performance

Setup

  • Closed lab (no internet access)
  • One physical switch
  • 4 working ESXi hosts (but only one powered on for this issue)
    2 with ESXi 6.5, 2 with ESXi 6.7
  • One Windows 10 Pro PC with VMware Workstation 14 installed, running the vCenter VM and a Content Library NAS (FreeNAS) VM
  • 2 other Windows 10 Pro PCs
  • All static IP addresses; no Windows Servers (no Active Directory), no DHCP, etc.
    BIND provides DNS service (needed by vCenter)
  • All VMware products are currently on evaluation licenses.

Note: newbie VMware user...

Description:

I wanted to perform a disk test comparing a physical PC and a VM, so I wrote an application that appends the contents of one file (75 GB) to the end of an existing one (5 GB).
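
For reference, here is a minimal Python sketch of that kind of append test (the file names and buffer size are placeholders, not the original application):

```python
import shutil
import time

SRC = "big_75GB_source.bin"    # hypothetical path to the 75 GB source file
DST = "existing_5GB_file.bin"  # hypothetical path to the existing 5 GB target file
CHUNK = 4 * 1024 * 1024        # 4 MiB copy buffer

start = time.monotonic()
with open(SRC, "rb") as src, open(DST, "ab") as dst:
    # Append the entire source file to the end of the target file.
    shutil.copyfileobj(src, dst, length=CHUNK)
elapsed = time.monotonic() - start
print(f"Append finished in {elapsed / 60:.1f} minutes")
```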

I ran the test app on an older and slower (compared to the ESXi host's hardware) Windows 10 Pro physical PC (SATA IDE) - it took 15 mins to complete.

The same test on a Windows 10 Pro VM running on an ESXi 6.7 host (SATA AHCI) took 37 minutes.

Basically, it's just this VM Host with the one VM running.

Why would there be such a big difference in time?

Appreciate any help... Thank you.

-------------------------------------------------------------------------------------------

Details of ESXi 6.7 Host:

  • SuperMicro X10SRA-F (Intel C612 SATA Controller)
  • 128GB memory
  • BIOS I/O configured for SATA AHCI
  • Datastore consists of 1 Seagate 1TB SATA HD (no RAID)
  • version of AHCI native driver:  1.2.0-6vmw.670.0.0.8169922

Details of VM:

  • 1 CPU
  • 8 GB memory
  • 512 GB disk
    SCSI Controller (1:0), LSI Logic SAS, Dependent
  • Windows 10 Pro

Disk performance stats (esxtop):

(esxtop screenshot attached: InkedDiskPerformance_LI.jpg)

11 Replies
vmrale
Expert

According to the esxtop screenshot, it does not look bad. You have to check whether you can expect better performance from the HDDs installed. Read the specs of the disks you are using.

VMkernel is not slowing the write process down (KAVG/cmd).

Interpreting esxtop 4.1 Statistics

"

DAVG

This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage.

DAVG is a good indicator of performance of the backend storage. If IO latencies are suspected to be causing performance problems, DAVG should be examined. Compare IO latencies with corresponding data from the storage array. If they are close, check the array for misconfiguration or faults. If not, compare DAVG with corresponding data from points in between the array and the ESX Server, e.g., FC switches. If this intermediate data also matches DAVG values, it is likely that the storage is under-configured for the application. Adding disk spindles or changing the RAID level may help in such cases.

"

Regards
Radek

If you think your question has been answered correctly, please consider marking it as a solution or rewarding me with kudos.
LumH
Enthusiast

Thanks vmrale,

The SATA HDs used in the VM host and the physical PC are the same Seagate model.

So I guess I should be looking at the BIOS settings of the VM host and/or the settings of the Windows 10 VM?

Are there any more tweaks or statistics I can look at in the ESXi toolset that would help pinpoint the issue, now that we've determined it's not the virtual disk driver?

Thanks!

vmrale
Expert

Focus on the hardware level. Find the latency, speed, and interface specification of the disks you have.

As an example, here you can read the average latency you can expect, and you can try to get closer to that number by tweaking the configuration parameters.

Seagate Barracuda ST1000DM010 Specs - CNET

Add more disks and a RAID controller to create a RAID group. If you have 2-3 hosts, consider adding SSDs and implementing a vSAN solution.

Treat the screenshot as a baseline; after making any change to the configuration, compare against the baseline to know whether you are improving things or not. Make one change at a time and compare with the previous results. esxtop is a sufficient tool to observe the results.

If you are stuck at the hardware level, you can proceed with tweaking the VM's virtual adapter. Use PVSCSI, as it performs best in most configurations.

Regards
Radek

If you think your question has been answered correctly, please consider marking it as a solution or rewarding me with kudos.
bluefirestorm
Champion

There was (and maybe still is) a problem with the SATA AHCI driver in ESXi 6.5. If that problem is not resolved in ESXi 6.7, it might be the cause of the poor disk performance.

Have a look at this thread post Re: Very slow speed on SSD

LumH
Enthusiast

Appreciate the input bluefirestorm.

If you are referring to the native AHCI driver issue that caused storage performance problems in ESXi 6.5 (written up by Anthony Spiteri), it was resolved in the ESXi 6.5 U1 release.

Thanks!

LumH
Enthusiast

vmrale,

I don't think I want to start introducing more disks and RAID into the equation as it really is not my goal at this point.

I just really want to understand why a physical PC with lower-performance hardware can outperform (although a slow file append operation may not qualify as a complete performance failure) a host with higher-performance hardware running one VM (configured as closely as I know how to match the physical PC):

  • Windows 10 Pro v1703 (physical) versus Windows 10 Pro v1709 (VM)
  • storage controller: SATA IDE (physical) versus SATA AHCI (SCSI controller in VM)
    no RAID
  • Same physical Seagate SATA drive - Barracuda 7200.12, ST31000528AS

If you say that ESXi is showing that it is not introducing much latency, then what is causing the performance difference?

Did I miss something in the BIOS settings of the VM host? Or did I incorrectly install/configure the host or the VM, or have I not customized Windows 10 on the VM for better performance?

Will look into VM virtual adapter tweaking as you suggested - do you have a specific link in mind?

Will look into using Paravirtual SCSI adapters - the VMware Knowledge Base nicely documents how to configure an existing Windows boot disk to use a PVSCSI adapter.

Regards

bluefirestorm
Champion

Windows 10 Pro v1703 (physical) versus Windows 10 Pro v1709 (VM)

Do both the physical and the virtual machine have the Meltdown patch?

The Meltdown patch can have a bad performance side effect for both physical and virtual machines running I/O-intensive operations (which I think a file append would be). The mitigation against this performance side effect requires that the CPU is Haswell generation or newer. You don't indicate what CPUs you have for the physical Windows 10 PC and the ESXi host. ESXi requires virtual machine hardware version 11 or higher so that the relevant Haswell instructions are exposed to the VM, though ESXi 6.7 would be using version 14 by default.

Is the VM virtual disk thick provisioned?

Also try bumping the number of vCPUs in the VM from 1 to 2.

vmrale
Expert

Server hardware is often designed to last as long as possible, to guarantee stable operation, and to give tweaking possibilities. The expensive models give higher performance too.

To further investigate and optimize vDisk performance, check whether the AHCI controller firmware matches the driver version in ESXi 6.7. Maybe the hardware's firmware needs to be updated, or the host's driver.

To optimize write operations in the VM, create eager-zeroed thick vDisks.
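
As a sketch of one way to do that with vmkfstools from the ESXi shell (the datastore path and disk names below are placeholders, and the VM must be powered off; the vSphere client can also convert disks when cloning or migrating them). Python is used here only to keep the examples consistent; the underlying commands are plain vmkfstools calls:

```python
import subprocess

# Hypothetical path to the VM's disk descriptor on the datastore.
VMDK = "/vmfs/volumes/datastore1/Win10VM/Win10VM.vmdk"

# Inflate an existing thin disk to eager-zeroed thick ("vmkfstools -k").
subprocess.run(["vmkfstools", "-k", VMDK], check=True)

# Or create a brand-new 512 GB eager-zeroed thick disk instead:
# subprocess.run(["vmkfstools", "-c", "512G", "-d", "eagerzeroedthick",
#                 "/vmfs/volumes/datastore1/Win10VM/Win10VM_1.vmdk"], check=True)
```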

Regards
Radek

If you think your question has been answered correctly, please consider marking it as a solution or rewarding me with kudos.
Wolken
Enthusiast

Thick-provisioned, eager-zeroed disks do the trick. Check the discussion here: thin vs thick provisioning and performance impact

LumH
Enthusiast

vmrale,

I did a bit more reading on the use of PVSCSI, and VMware KB article 1010398 states that "PVSCSI adapters are not suitable for DAS environments". So I'm not going to follow up on this for now.

LumH
Enthusiast

bluefirestorm, Wolken, vmrale:

You guys hit the nail on the head!


The VM was using a thin-provisioned disk! In hindsight, it makes sense - as the data is being written, the overhead must come from formatting/allocating the disk on the fly.

I ran the same file append app (75GB transfer) on one VM with eager thick, and one VM with lazy thick:

  1. eager thick - 13 mins
    As you guys said, this is expected...
  2. lazy thick - 18 mins

That's more in line with my expectations of disk performance (versus 15 mins on the slower physical PC, and 37 mins for the thin-provisioned VM).
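
As a back-of-the-envelope check of what those times mean in throughput terms (a sketch; it assumes the 75 GB transfer is 75 GiB and uses the timings above):

```python
GIB = 1024 ** 3
size_bytes = 75 * GIB  # assuming the 75 GB source file is 75 GiB

timings_min = {
    "physical PC (SATA IDE)": 15,
    "VM, thin provisioned":   37,
    "VM, lazy-zeroed thick":  18,
    "VM, eager-zeroed thick": 13,
}

for label, minutes in timings_min.items():
    mb_per_s = size_bytes / (minutes * 60) / 1e6
    print(f"{label:24s} ~{mb_per_s:3.0f} MB/s sustained")
```

Roughly 100 MB/s for the eager-zeroed case is in the ballpark of what a single 7200 rpm SATA drive can sustain sequentially, so the thick-provisioned VM appears to be close to the hardware limit.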

I also needed to perform the above tests after a snapshot was taken:

  1. eager thick - 25 mins
  2. lazy thick - 23 mins

Again, I can imagine the performance being worse, because the snapshot causes the changing/changed data to be written to the snapshot delta files, which means space has to be allocated on the fly (as in the thin-provisioned case). Still, the values are better than the 37 mins.

I will see if there are any snapshot-related advanced configuration parameters that can be used to improve that latency. Barring that, I have the fallback of using SSD(s) instead of spindle drives. I don't think I have any applications that will hit the disk that hard, but it was a good discovery to learn about these limitations.

Appreciate all your help!  Thanks!
