VMware Communities
emusic
Enthusiast

WS 15.0.2 causes much higher kernel latency values in Windows VMs than 12.5.8/12.5.9

I used WS Pro 12.5.x (up to 12.5.8) on a Win7 host for two years, running various Windows guests from Win2000 to Win10. For all guests, kernel latencies were acceptable. Of course, latencies are higher in Win10 than in Win7 or earlier, but 12.5.x runs all versions of Windows smoothly enough to have glitch-free audio.

I have now tried upgrading to 15.0.2 and noticed that kernel latencies became much higher in all Windows guests. Under 12.5.8 or older Workstation versions, I never heard glitches in Win7 audio and only rarely heard them in Win10 audio. Under 15.0.2, there are periodic glitches in Win7 audio and regular glitches in Win10 audio.

WS 12.5.8

Win7 guest, LatencyMon 6.70:

Highest ISR routine execution time (µs):          834,461283
ISR count (execution time <250 µs):               28866
ISR count (execution time 250-500 µs):            0
ISR count (execution time 500-999 µs):            16
ISR count (execution time 1000-1999 µs):          0
ISR count (execution time 2000-3999 µs):          0
ISR count (execution time >=4000 µs):             0

Highest DPC routine execution time (µs):          1007,980088
DPC count (execution time <250 µs):               63861
DPC count (execution time 250-500 µs):            0
DPC count (execution time 500-999 µs):            32
DPC count (execution time 1000-1999 µs):          1
DPC count (execution time 2000-3999 µs):          0
DPC count (execution time >=4000 µs):             0

LatencyMon window usually looks "all green" or "mostly green with a little red".

Typical DPCLat window in Win7 guest under 12.5.8:

ws12.5.8-dpclat-1.png

Win10 guest, LatencyMon:

Highest ISR routine execution time (µs):          221.604720
ISR count (execution time <250 µs):               5624
ISR count (execution time 250-500 µs):            0
ISR count (execution time 500-999 µs):            0
ISR count (execution time 1000-1999 µs):          0
ISR count (execution time 2000-3999 µs):          0
ISR count (execution time >=4000 µs):             0

Highest DPC routine execution time (µs):          1327.167404
DPC count (execution time <250 µs):               16879
DPC count (execution time 250-500 µs):            0
DPC count (execution time 500-999 µs):            3
DPC count (execution time 1000-1999 µs):          1
DPC count (execution time 2000-3999 µs):          0
DPC count (execution time >=4000 µs):             0

LatencyMon window usually looks "all green" or "mostly green with a little red".

Typical DPCLat window in Win10 guest under 12.5.8:

ws12.5.9-dpclat-2.png

WS 15.0.2

Win7 guest, LatencyMon:

Highest ISR routine execution time (µs):          12080,320428
ISR count (execution time <250 µs):               30429
ISR count (execution time 250-500 µs):            0
ISR count (execution time 500-999 µs):            18
ISR count (execution time 1000-1999 µs):          0
ISR count (execution time 2000-3999 µs):          0
ISR count (execution time >=4000 µs):             0

Highest DPC routine execution time (µs):          34408,513274
DPC count (execution time <250 µs):               60279
DPC count (execution time 250-500 µs):            0
DPC count (execution time 500-999 µs):            42
DPC count (execution time 1000-1999 µs):          12
DPC count (execution time 2000-3999 µs):          2
DPC count (execution time >=4000 µs):             0

LatencyMon window usually looks "distinctly red".

Typical DPCLat window in Win7 guest under 15.0.2:

ws15.0.2-dpclat-2.png

Win10 guest, LatencyMon:

Highest ISR routine execution time (µs):          698.259587
ISR count (execution time <250 µs):               4198
ISR count (execution time 250-500 µs):            0
ISR count (execution time 500-999 µs):            1
ISR count (execution time 1000-1999 µs):          0
ISR count (execution time 2000-3999 µs):          0
ISR count (execution time >=4000 µs):             0

Highest DPC routine execution time (µs):          177541.939528
DPC count (execution time <250 µs):               16109
DPC count (execution time 250-500 µs):            0
DPC count (execution time 500-999 µs):            2
DPC count (execution time 1000-1999 µs):          1
DPC count (execution time 2000-3999 µs):          0
DPC count (execution time >=4000 µs):             0

LatencyMon window usually looks "mostly red".

Typical DPCLat window in Win10 guest under 15.0.2:

ws15.0.2-dpclat-2.png
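
To put the pasted LatencyMon figures side by side, a small script can pull out the "Highest ... execution time" values and compute how much they grew. This is only a sketch: the function name `parse_highest` is made up for illustration, and it assumes the values never contain thousands separators, so a comma is always a decimal comma (as in the Win7 guest output) and a period is always a decimal point (as in the Win10 guest output).

```python
import re

# Matches lines such as
# "Highest DPC routine execution time (µs):          34408,513274"
_HIGHEST = re.compile(
    r"Highest (ISR|DPC) routine execution time \([^)]*\):\s*([\d.,]+)"
)

def parse_highest(report: str) -> dict:
    """Return {'ISR': µs, 'DPC': µs} from a LatencyMon text excerpt."""
    result = {}
    for kind, raw in _HIGHEST.findall(report):
        # Assumption: no thousands separators, so ',' is a decimal comma.
        result[kind] = float(raw.replace(",", "."))
    return result

# Win7 guest figures copied from the post.
ws12_win7 = parse_highest("""
Highest ISR routine execution time (µs):          834,461283
Highest DPC routine execution time (µs):          1007,980088
""")
ws15_win7 = parse_highest("""
Highest ISR routine execution time (µs):          12080,320428
Highest DPC routine execution time (µs):          34408,513274
""")

for kind in ("ISR", "DPC"):
    ratio = ws15_win7[kind] / ws12_win7[kind]
    print(f"Win7 {kind}: {ws12_win7[kind]:.0f} µs -> "
          f"{ws15_win7[kind]:.0f} µs ({ratio:.1f}x)")
```

The same function works on the Win10 excerpts, since it accepts both decimal styles.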

Disabling USB and network adapters in VMs makes the situation slightly better, but 15.0.2 with USB/network disabled still works much worse than 12.5.8 with USB/network enabled.

All guests are 64-bit, and paging is completely disabled.

The host system is an MSI GT72S laptop (Intel CM236 chipset, i7-6820HK CPU, 16GB RAM) running Win7, with paging disabled and all the well-known real-time performance optimizations applied. LatencyMon shows very low latencies on the host, and all audio processing on the host is quite stable.

All tests were performed on an "empty" host system: freshly booted, with no applications active (not even minimized to the tray), just the usual background services.

Unfortunately, WS 15 appears to be completely unusable for real-time audio processes (I use it for debugging my audio applications and drivers). I had to uninstall it and revert to 12.5.8.

Why does WS 15 introduce such high latencies in the guests? Are there any measures I could take to lower them?

bluefirestorm
Champion

You have indicated you used 12.5.8 as a basis for comparison against 15.0.2.

Version 12.5.8 does not expose the Spectre microcode to the VM, while 15.0.2 does. The Spectre patches are exposed to the VM starting with 12.5.9 (which does not include the patch for SSBD).

You could stop exposing the Spectre microcode to the VM by adding the following lines to the .vmx configuration file of the 15.0.2 VM, to make it as close as possible to 12.5.8.

featMask.vm.cpuid.stibp = "Val:0"

featMask.vm.cpuid.ibrs = "Val:0"

featMask.vm.cpuid.ibpb = "Val:0"

featMask.vm.cpuid.ssbd = "Val:0"
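
If the same four lines have to go into several VMs, a short script can add them without creating duplicate entries. This is a rough sketch, not a VMware tool: `apply_masks` and `patch_vmx` are hypothetical helper names, the keys are compared case-insensitively as an assumption, and the file is read with Python's default text encoding, which may not match the encoding declared inside a particular .vmx file. Edit the file only while the VM is powered off, and keep a backup copy.

```python
from pathlib import Path

# The four masks quoted above, copied verbatim.
SPECTRE_MASKS = {
    "featMask.vm.cpuid.stibp": "Val:0",
    "featMask.vm.cpuid.ibrs": "Val:0",
    "featMask.vm.cpuid.ibpb": "Val:0",
    "featMask.vm.cpuid.ssbd": "Val:0",
}

def apply_masks(vmx_text: str, masks: dict = SPECTRE_MASKS) -> str:
    """Return vmx_text with every mask key present exactly once."""
    # Assumption: key names are matched case-insensitively.
    wanted = {key.lower() for key in masks}
    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key not in wanted:
            kept.append(line)  # unrelated setting, keep as-is
    # Append the masks at the end, replacing any earlier values.
    kept.extend(f'{key} = "{value}"' for key, value in masks.items())
    return "\n".join(kept) + "\n"

def patch_vmx(path: str) -> None:
    """Rewrite a .vmx file in place (VM powered off, backup made)."""
    vmx = Path(path)
    vmx.write_text(apply_masks(vmx.read_text()))
```

Running `apply_masks` twice on the same text gives the same result, so the script is safe to re-run.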

The Meltdown patch depends only on the OS patch and has no dependency on VMware.

On Windows 7, the Meltdown patch can hit I/O-intensive operations, as Windows 7 does not make use of the INVPCID instruction. You can disable the Meltdown patch at the Windows OS level.

emusic
Enthusiast

Thank you for the settings to disable the Spectre patch. I had tried to find them earlier, but with no success.

I know that the Spectre patch has been applied since 12.5.9. I tested 12.5.9 a bit (without disabling the microcode patch) and found it just slightly worse than 12.5.8. For example, a typical DPCLat graph with 12.5.9 and a Win7 guest looks like this:

ws12.5.9-dpclat-1.png

Results are almost the same as in 12.5.8.

I added the settings that disable the microcode patch to the VMX, and 15.0.2/Win7 still produces the following:

ws15.0.2-dpclat-7 wo patch.png

With Win10 guest, results are the same: almost no difference between 12.5.8 and 12.5.9, even with the patch enabled, but big differences between 12.5.x and 15.0.2, even with the patch disabled.

The main problem is the sudden huge delay spikes, not the slightly longer delays.

The most noticeable improvement comes from disabling both network and USB adapters in the VM. Since 12.5.x works well even with these adapters enabled, and 15.0.2 works badly with both its native and the 12.5.8 versions of VMware Tools, I suppose that the problem is in the USB/network hardware emulation of 15.0.2.

emusic
Enthusiast

I found a possible cause of the problem. In 15.0.x, some disk-related processes (for example, chkdsk) in Windows guests perform much faster than in 12.5.x.

In 12.5.x, chkdsk on a 15GB virtual SCSI disk (mapped to a VMDK file created on the host's mechanical SATA HDD) always completes in 11 s, on the first run and on every subsequent run. DPC latency tests (for example, DpcLat) run at the same time show no noticeable DPC latency increase. Average DPC latency is less than 1 ms in the Windows 7 guest.

In 15.0.x, chkdsk takes about 11 s on the first run, but each subsequent run takes no more than 3 s. During those runs, DPC latencies rise to dozens and even hundreds of milliseconds.

This obviously means that 15.0.x uses a kind of virtual disk buffering not used in 12.5.x, and buffered data are processed synchronously, affecting guest ISR/DPC latencies.

Unfortunately, virtual disk buffering is almost undocumented. I found some suggestions to use "hard-disk.useUnbuffered", "aiomgr.buffered", "aiomgr.unbuf" or "aiomgr.simple", but I'm not sure where they belong (config.ini or the .vmx). I tried adding "aiomgr.simple = generic" and "aiomgr.unbuf = true" to the .vmx file, but the chkdsk acceleration is still present.

Does anybody know how to disable this aggressive virtual disk buffering in 15.0.x and force it to behave like 12.5.x?
