Setup
Note: newbie VMware user...
Description:
I wanted to compare disk performance between a physical PC and a VM, so I wrote an application that appends the contents of one file (75 GB) to the end of an existing one (5 GB).
I ran the test app on an older, slower (compared to the ESXi host) Windows 10 Pro physical PC (SATA IDE) - it took 15 minutes to complete.
The same test on a Windows 10 Pro VM running on an ESXi 6.7 host (SATA AHCI) took 37 minutes.
Basically, it's just this VM Host with the one VM running.
Why would there be such a big difference in time?
Appreciate any help... Thank you.
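For reference, the original test app isn't shown in the thread; a minimal Python sketch of such a file-append benchmark (the function name and chunk size here are illustrative, not from the thread) might look like:

```python
import shutil
import time

def append_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Append the contents of src_path to the end of dst_path,
    streaming in fixed-size chunks, and return elapsed seconds."""
    start = time.monotonic()
    # "ab" opens the destination for appending, so existing data is kept
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return time.monotonic() - start
```

Any equivalent streaming copy would exercise the disk the same way; the point is that it produces sustained sequential writes, which is what the timings below measure.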
-------------------------------------------------------------------------------------------
Details of ESXi 6.7 Host:
Details of VM:
Disk performance stats (esxtop):
According to the esxtop screenshot, it's not looking bad. You have to check whether you can expect better performance from the HDDs installed. Read the specs of the disks you are using.
The VMkernel is not slowing the write process down (see KAVG/cmd).
Interpreting esxtop 4.1 Statistics
"
DAVG
This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage.
DAVG is a good indicator of performance of the backend storage. If IO latencies are suspected to be causing performance problems, DAVG should be examined. Compare IO latencies with corresponding data from the storage array. If they are close, check the array for misconfiguration or faults. If not, compare DAVG with corresponding data from points in between the array and the ESX Server, e.g., FC switches. If this intermediate data also matches DAVG values, it is likely that the storage is under-configured for the application. Adding disk spindles or changing the RAID level may help in such cases.
"
Regards
Thanks vmrale,
The SATA HDs used for the VM host and the physical PC are the same Seagate SATA model.
So I guess I should be looking at the BIOS settings of the VM host and/or the settings of the Windows 10 VM?
Are there any more tweaks or statistics I can look at in the ESXi toolset that would help pinpoint the issue, now that we've determined it's not the virtual drive driver?
Thanks!
Focus on the hardware level. Find the latency, speed, and interface specification of the disks you have.
As an example, here you can read the average latency you can expect, and try to get closer to that number by tweaking configuration parameters.
Seagate Barracuda ST1000DM010 Specs - CNET
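A large part of the per-I/O latency quoted on a spindle drive's spec sheet is rotational. As a quick sanity check, assuming a 7200 RPM drive (the Barracuda linked above is 7200 RPM), average rotational latency is half a revolution:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency of a spindle drive, in ms.

    On average the desired sector is half a revolution away,
    so the latency is half the time of one full rotation.
    """
    seconds_per_rev = 60.0 / rpm
    return seconds_per_rev / 2 * 1000

print(round(avg_rotational_latency_ms(7200), 2))  # ~4.17 ms
```

Seek time and controller overhead come on top of this, which is why spec-sheet figures are the floor, not the expectation, for random I/O.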
Add more disks and a RAID controller to create a RAID group. If you have 2-3 hosts, consider adding SSDs and implementing a vSAN solution.
Treat the screenshot as a baseline: after making any change to the configuration, compare against the baseline to know whether you are improving or not. Make one change at a time and compare with the previous results. esxtop is a sufficient tool for observing the results.
If you are stuck at the hardware level, you can proceed with tweaking the VM's virtual adapter. PVSCSI is the best choice in most configurations.
Regards
There was (and maybe still is) a problem with the SATA AHCI driver in ESXi 6.5. If that problem is not resolved in ESXi 6.7, it might be the cause of the poor disk performance.
Have a look at this thread post Re: Very slow speed on SSD
Appreciate the input bluefirestorm.
If you are referring to the native AHCI driver issue that caused storage performance problems in ESXi 6.5 (reported by Anthony Spiteri), it was resolved in the ESXi 6.5 U1 release.
Thanks!
vmrale,
I don't think I want to start introducing more disks and RAID into the equation, as that really is not my goal at this point.
I just really want to understand why a physical PC with lower-performance hardware can outperform (although a slow file-append operation may not qualify as a complete performance test) a VM host with higher-performance hardware running a single VM (configured as closely to the physical PC as I know how):
If you say that ESXi is showing that it is not introducing much latency, then what is causing the performance difference?
Did I miss something in the BIOS settings of the VM host? Did I incorrectly install/configure the VM host or the VM, or have I not customized Windows 10 on the VM for better performance?
Will look into VM virtual adapter tweaking as you suggested - do you have a specific link in mind?
Will look into using Paravirtual SCSI adapters - the VMware Knowledge Base nicely documents how to configure an existing Windows boot disk to use a PVSCSI adapter.
Regards
Windows 10 Pro v1703 (physical) versus Windows 10 Pro v1709 (VM)
Does both physical and virtual have the Meltdown patch?
The Meltdown patch can have a bad performance side effect on both physical and virtual machines running I/O-intensive operations (which I think a file append would be). Mitigating this performance side effect requires a Haswell-generation or newer CPU. You don't indicate what CPUs you have for the physical Windows 10 machine and the ESXi host. ESXi requires virtual machine hardware version 11 or higher so that the Haswell instructions are exposed to the VM; ESXi 6.7 would be using version 14 by default, though.
Is the VM virtual disk thick provisioned?
Also try bumping the number of vCPUs in the VM from 1 to 2.
Server hardware is often designed to last as long as possible, to guarantee stable operation, and to offer tuning possibilities. The expensive models deliver higher performance too.
To further investigate the performance issue and optimize vDisk performance, check whether the AHCI controller's firmware matches the driver version in ESXi 6.7. Maybe the hardware's flash needs to be updated, or the host's driver.
To optimize write operations in the VM, create eager-zeroed thick vDisks.
Regards
Thick-provisioned eager-zeroed disks do the trick. Check the discussion here: thin vs thick provisioning and performance impact
vmrale,
I did a bit more reading on the use of PVSCSI, and VMware KB article 1010398 states that "PVSCSI adapters are not suitable for DAS environments". So I'm not going to follow up on this for now.
bluefirestorm, Wolken, vmrale:
You guys hit it on the head!
The VM was using a thin-provisioned disk! In hindsight, it makes sense: as the data is being written, the overhead must come from formatting the disk on the fly.
I ran the same file-append app (75 GB transfer) on one VM with an eager-zeroed thick disk and one VM with a lazy-zeroed thick disk:
That's more in line with my expectations of disk performance (versus 15 minutes on the slower physical PC and 37 minutes for the thin-provisioned VM).
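As a back-of-the-envelope comparison, the two timings translate into effective write throughput roughly as follows (taking 1 GB as 2**30 bytes, an assumption; the exact file size and any read-side traffic are not accounted for):

```python
def throughput_mib_s(gib_written, minutes):
    """Effective sequential write throughput in MiB/s for a
    given amount of data (GiB) written in a given time (minutes)."""
    return gib_written * 1024 / (minutes * 60)

print(round(throughput_mib_s(75, 15), 1))  # physical PC, 15 minutes
print(round(throughput_mib_s(75, 37), 1))  # thin-provisioned VM, 37 minutes
```

Roughly 85 MiB/s versus 35 MiB/s: the physical PC's figure is in the ballpark of a single SATA spindle's sustained write rate, while the thin-provisioned result is well below it, consistent with allocation overhead rather than a disk limit.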
I also needed to perform the above tests after snapshots were taken:
Again, I can imagine the performance being worse, because a snapshot causes the changing/changed data to be written to the snapshot repository, which means it has to format on the fly again (as in the thin-provisioned case). Still, the values are better than the 37 minutes.
I will see if there are any snapshot advanced configuration parameters that can be used to improve that latency. Barring that, I have the fallback of using SSD(s) instead of spindle drives. I don't think I have any applications that will hit the disk that hard, but it was a good discovery to learn about these limitations.
Appreciate all your help! Thanks!