VMware Communities
silveredge1942
Contributor

Win10 guest 'System Idle' eating CPU, vmware-vmx on host at 100% CPU load

First, I have read all the existing topics about high CPU load, and none of them
helped me.
This bug appeared after I upgraded my Ubuntu host from 20.x to 21.x.
Inside the Win10 guest, 'System Idle' appears to be eating ~40% of CPU time.
On the host, htop shows all cores at ~100% load.
The guest lags for ~10 seconds, then there are ~20 seconds of normal
performance, and then another 10 seconds of lag.
Only the Win10 guest lags. The host (as well as another VM running
an Ubuntu guest) stays responsive,
even though htop shows 100% CPU load.
The bug sometimes appears a few hours after a reboot.
Sometimes it takes a day to appear.
A Windows Server 2019 guest lags too. Ubuntu does not.

I tried:
1) Replacing the SSD with a brand new one. No luck.
2) Disabling/enabling video acceleration in the VM's config.
3) Switching Preferences->Memory to "Fit all guest mem" (disable
swapping).
4) Switching off the paging file in the Win10 guest and swap on the Ubuntu host.
5) Splitting the VMs into separate windows instead of opening
them in tabs.
6) Capturing a perf profile while the bug is occurring
(sudo perf record -p <vmware-vmx PID> -g -o sample.perf -- sleep 30):
Samples: 326K of event 'cycles', Event count (approx.): 277697753483
Overhead Command Shared Object Symbol
+ 15.12% vmx-vcpu-2 [kernel.kallsyms] [k] Task_Switch
+ 13.69% vmx-vcpu-1 [kernel.kallsyms] [k] Task_Switch
+ 12.50% vmx-vcpu-0 [kernel.kallsyms] [k] Task_Switch
+ 11.55% vmx-vcpu-7 [kernel.kallsyms] [k] Task_Switch
+ 7.43% vmx-vcpu-11 [kernel.kallsyms] [k] Task_Switch
+ 6.88% vmx-vcpu-3 [kernel.kallsyms] [k] Task_Switch
+ 4.39% vmx-vcpu-8 [kernel.kallsyms] [k] Task_Switch
+ 4.35% vmx-vcpu-4 [kernel.kallsyms] [k] Task_Switch
+ 4.29% vmx-vcpu-9 [kernel.kallsyms] [k] Task_Switch
+ 3.50% vmx-vcpu-5 [kernel.kallsyms] [k] Task_Switch
+ 2.95% vmx-vcpu-10 [kernel.kallsyms] [k] Task_Switch
+ 2.06% vmx-vcpu-6 [kernel.kallsyms] [k] Task_Switch
0.36% vmx-vcpu-2 [kernel.kallsyms] [k] APIC_Read
0.33% vmx-vcpu-1 [kernel.kallsyms] [k] APIC_Read
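
To make the profile above easier to read, here is a minimal Python sketch that adds up the Overhead column per symbol. The sample lines are copied from the perf output in this post; the helper name `overhead_by_symbol` is made up for illustration, and the parsing assumes the column layout shown above (overhead, command, shared object, `[k]`, symbol).

```python
from collections import defaultdict

# Lines copied from the perf report above. The leading "+" is perf's
# marker for an entry that can be expanded; it carries no data.
PERF_LINES = """\
+ 15.12% vmx-vcpu-2 [kernel.kallsyms] [k] Task_Switch
+ 13.69% vmx-vcpu-1 [kernel.kallsyms] [k] Task_Switch
+ 12.50% vmx-vcpu-0 [kernel.kallsyms] [k] Task_Switch
+ 11.55% vmx-vcpu-7 [kernel.kallsyms] [k] Task_Switch
+ 7.43% vmx-vcpu-11 [kernel.kallsyms] [k] Task_Switch
+ 6.88% vmx-vcpu-3 [kernel.kallsyms] [k] Task_Switch
+ 4.39% vmx-vcpu-8 [kernel.kallsyms] [k] Task_Switch
+ 4.35% vmx-vcpu-4 [kernel.kallsyms] [k] Task_Switch
+ 4.29% vmx-vcpu-9 [kernel.kallsyms] [k] Task_Switch
+ 3.50% vmx-vcpu-5 [kernel.kallsyms] [k] Task_Switch
+ 2.95% vmx-vcpu-10 [kernel.kallsyms] [k] Task_Switch
+ 2.06% vmx-vcpu-6 [kernel.kallsyms] [k] Task_Switch
0.36% vmx-vcpu-2 [kernel.kallsyms] [k] APIC_Read
0.33% vmx-vcpu-1 [kernel.kallsyms] [k] APIC_Read
"""


def overhead_by_symbol(report):
    """Sum the Overhead percentages per symbol (hypothetical helper)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.lstrip("+ ").split()
        # Skip headers and blank lines: data rows start with "NN.NN%".
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        totals[fields[-1]] += float(fields[0].rstrip("%"))
    return {symbol: round(pct, 2) for symbol, pct in totals.items()}


for symbol, pct in overhead_by_symbol(PERF_LINES).items():
    print(f"{pct:6.2f}%  {symbol}")
```

For these 14 lines the totals are 88.71% in Task_Switch, spread over the twelve vmx-vcpu threads, and 0.69% in APIC_Read. The `[k]` marker means perf attributes those samples to kernel-mode code rather than to user-mode code in vmware-vmx.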


My computer configuration:
Host: Ubuntu 21.x. Guest that is lagging: Win10.
SSD: brand new one.
CPU: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (12 logical cores)
Video card: lspci|grep -i vga
00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1)
VMware Workstation: the latest version

17 Replies
pumapanzer
Contributor

I don't have any solutions or help to share yet. I have been experiencing very similar behavior for a few weeks. I don't know when it started, as I rarely use a Windows 10 guest VM for anything, but my Windows Server 2019 and Windows 10 guest VMs are both very slow. None of my Linux VMs have this issue.

This may be the kick in the pants I need to finally migrate the functionality from that Windows 10 VM to Linux/Unix/FreeBSD; however, unfortunately, I have to use Windows at work, so being able to play with Windows Server in my lab is kind of important...

At any rate, if I find a solution or workaround I will do my best to share. 🙂

 

Steps to reproduce:

  1. Open a Windows guest VM
  2. Use top on the VMware Workstation host to locate the vmware-vmx process associated with the Windows guest VM (top -p <vmware-vmx PID>)
  3. Try to open applications in the Windows guest VM while monitoring the vmware-vmx process
  4. Observe CPU utilization max out at 100% per vCPU (e.g. 4 vCPUs = 400% CPU utilization for the vmware-vmx process)
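
The monitoring in steps 2 to 4 can also be scripted. Below is a minimal Python sketch, assuming a Linux host, that samples `/proc/<pid>/stat` twice and reports CPU usage on the same scale as top's default mode (100% per fully busy core). The function names are hypothetical; the positions of the `utime` and `stime` fields follow the proc(5) man page.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split after the LAST ")" before counting fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval, where 100.0 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_s / interval_s


def sample(pid, interval_s=1.0):
    """Measure a live process; pid is the vmware-vmx PID found in step 2."""
    path = f"/proc/{pid}/stat"
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)


# Example (replace 12345 with the real vmware-vmx PID):
#   print(f"{sample(12345):.0f}%")
```

With 4 vCPUs all spinning, a one-second sample would report roughly 400%, matching step 4.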

 

Troubleshooting steps attempted without resolving issue:

  1. Remove and reinstall VMware Tools in Windows guest VM
  2. Repair install VMware Tools in Windows guest VM
  3. Restart Windows guest VM
  4. Restart host
  5. Monitor Windows guest VM logs for any error messages

 

Host system:

  • CPU: Ryzen 7 2700X
  • RAM: 32GB RAM
  • GPU: GeForce RTX 2060 4GB
  • Disks:
    • 1TB NVMe ADATA SX8200PNP (all guest VM vDisks run on a LVM on this NVMe)
    • 1TB SSD Samsung SSD 860
  • Hypervisor: VMware Workstation Pro 16 (16.1.2 build-17966106)
  • OS:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Linuxmint
Description: Linux Mint 20
Release: 20
Codename: ulyana

$ uname -a
Linux mint-ws 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

 

Guest Windows VM #1:

  • OS: Windows 10 Pro (21H1)(19043.1202)
  • vCPU: 1 CPU, 4 cores
  • vRAM: 6 GiB
  • vDisk: 100GB SCSI Hard Disk

Guest Windows VM #2:

  • OS: Windows Server 2019 Datacenter (1809)
  • vCPU: 1 CPU, 4 cores
  • vRAM: 4 GiB
  • vDisk: 60GB NVMe Hard Disk

 

2021-09-12 14:55 CDT - corrected and updated system specifications

ajgringo619
Hot Shot
Hot Shot

This is very interesting, as your hardware/software specs are almost identical to mine:

Host

LM 20.2 w/kernel 5.8.0.34 (just upgraded today)

GTX 1070, Ryzen 7 2700X w/32 GB RAM

Guest 

(8) vCPUs, 8 GB RAM, 150 GB NVMe (* this is the only major difference)

 

After logging on to the VM, I sat and watched its process stay steady between 45-50% CPU usage, so 7% per CPU, and 2 GB RAM. Other than upgrading the kernel, I'd consider switching the drive type.

pumapanzer
Contributor

@ajgringo619,

Thanks for taking a look and for the suggestion. 🙂

1. I am on a newer Linux kernel right now (5.11.0-27-generic) but an older release of Mint (20). I may give this some more thought, but I'm not sure if I am ready for an upgrade right now.

2. I switched from SCSI to NVMe using the following steps (the solution is for Fusion, but the steps are very similar): https://communities.vmware.com/t5/VMware-Fusion-Discussions/Fusion-11-Switch-from-SCSI-to-NVMe-for-W...

After I updated the vDisk type from SCSI to NVMe I reinstalled the VMware Tools and rebooted the Windows guest VM. So far so good. I'll keep an eye on the performance and let you know if this helped. 🙂

Take care!

 

PS - I forgot to mention, the reason I think switching to the NVMe disk type will help is that my Windows Server 2019 VM isn't as slow as my Windows 10 guest VM. After double checking, I noticed that my Windows Server 2019 guest VM has an NVMe disk type. Fingers crossed.

ajgringo619
Hot Shot

Hope it was that simple. I would also recommend upgrading your kernel; you're (1) revision behind (now on 5.11.0.34).
silveredge1942
Contributor

Thank you for sharing. Let me update my information.

The Linux VM has become buggy too. Interesting thing #1: it happens VERY rarely there, but it clearly happens.

I see 20 seconds of normal work and then 20 seconds of hard lagging where even the GUI barely responds. Then 20 seconds of normal again, and so on. This Linux VM (Ubuntu) lagged only ONE time this week. Only ONE time. However, the Windows VMs start lagging after 1-3 hours of work.

Interesting thing #2: the more VMs are open, the less time you can work before the lags begin. If I work with two VMs (Linux and Win10), I can work for approximately a day. If I open Linux plus two Windows Server 2019 VMs, it's about 1-3 hours. Then one of the Windows Server 2019 VMs starts lagging (the one I am working in).

Interesting thing #3: no process inside the VMs is eating CPU time. The CPU time counters inside the VMs look ABSOLUTELY fine, as if nothing were lagging. On Windows, I tried Resource Monitor (the built-in utility from M$), which shows user threads consuming about 100% of CPU time. However, I couldn't find the process/thread that is doing it. Very strange.

I will do any kind of profiling on any of my systems if the VMware developers ask me to.

I can try profiling the vmmon/vmnet drivers on the host OS (Ubuntu Linux) if I get some free time. It seems that is not as easy as profiling a user-mode process with the perf utility.

I'm pretty sure this is a bug in either VMware Workstation or Ubuntu 21. The thing is that both VMware AND Ubuntu patches arrived at the same time, so I couldn't tell which one causes the problem.

And finally, interesting thing #4: if I switch the memory option from "Always fit VMs in memory" to "Allow some VMs memory to be swapped", the bug appears much more often. And like I already said, the only thing that helps is a host REBOOT.

Do you guys know whether the VMware developers/admins read these topics?

I hope so, because it's VERY hard to work this way. I've been very desperate and sad these weeks.

 

 

Mann
Contributor

Well, I agree, it must be some bug in VMware Workstation Pro 16.1.2, as I started getting unpredictable vmware-vmx CPU ups and downs while my Windows 10 guest is running only after I upgraded VMware Workstation to 16.1.2! Before that I never experienced such performance issues (same host OS (Ubuntu, i9, 64 GB RAM, 2 TB NVMe...) and same guest, Win10 64-bit with 32 GB RAM).

I'm posting this to encourage others to come forward if they have the same situation, so we can grab some attention from VMware.

pumapanzer
Contributor

I am glad you are sharing your experience; however, I wonder if there is a better way to report bugs than here in this community thread? I kind of overlooked that I hijacked the OP's thread, but I think it's mostly forgivable since we're all seeing similar issues on Linux host OSs.

I am still seeing the same performance issues as others in this thread even after switching to an NVMe vhd type for my Windows 10 Pro guest.

==================================================

As an aside, I am curious how much market share there is between Windows, Linux, and macOS with Workstation Pro and Fusion, respectively? Could Linux be an "unpopular" platform with regard to VMware Workstation Pro, and explain why we are experiencing this issue with guest VM performance?

I am using a VMUG Advantage subscription because at one time I was interested in VMware certifications. In the last year or two I have seen what I consider a loss in quality that shouldn't be with a paid for desktop virtualization platform such as Workstation Pro and Fusion. I can imagine it's difficult for VMware developers with Windows 11 feeling like yet another Windows Me/Vista, and seeing all the woes with Big Sur regarding DHCP for virtual networks. That being said, whether it's difficult, or not, paying customers are paying customers, until they are not.

Perhaps this is yet another kick in the pants I need to dig deeper into Qemu/KVM/VirtualBox? Being a Windows user primarily until about a year ago (switched to Linux on the desktop and FreeBSD for my home server) I was spoiled, and it made my progress in learning about computers suffer. Perhaps I need to make the switch to a new desktop hypervisor, too? For example, GNS3 is amazing for being free and open source, especially running on Linux where you don't need the GNS3 VM and can run VirtualBox/Qemu VMs for network devices and virtual PCs. And I digress...

ijbgreen
Contributor

I have the same issue.

I have recreated the Windows 10 Pro VM several times on Ubuntu 20.04 with kernel 5.11.0-37-generic.

I create the VM with NVMe by default, and the issue still appears randomly.

What do we need in order to report this issue?

 

pumapanzer
Contributor

I found the following support resources. I am going to submit a ticket and see what happens.

 

https://www.vmware.com/support/services.html

> scroll down then click Others tab under Services

> click link under 'Complimentary Support' which opens a PDF

> click "My VMware" link in the PDF, login, and reach https://customerconnect.vmware.com/dashboard

> click Support tab then 'Get Support' from the menu

> scroll down then click 'Request Support' under 'Technical Support'

Amazing, the multiple redirects, and resource formats, to get support in 2021, LoL.

 

I'll update here if I get a response. Best of luck!

 

pumapanzer
Contributor

There was a step that I missed before opening a support case (see: my previous reply where a PDF is mentioned in the steps to get support):

 

To receive access to technical support, you must first register your product.

 

It's either register a product (more on that in a sec) and then *possibly* be able to submit a ticket for support, or I call for phone support. No, thanks. With all the jumping through hoops (multiple incoherent redirects, including a PDF), this company's support resources reek of horrible quality development teams, and I don't want to waste more time on likely an equally horrible IVR.

 

As I mentioned previously, I am a VMUG Advantage subscriber. When I attempted to register a product using the license key provided by VMUG, I advance to the next page after entering the license key, and all I see is a "Continue" button. There's no way to view my registered products. There's no success message. There's no error message. Again, reeking of horrible quality development teams for VMware support services.

 

I truly hope all of you find the support and solutions you need. I've already spent too much time on this endeavor and will be moving on to another virtualization platform. Take care and stay safe!

 

silveredge1942
Contributor

Oh yeah. Exactly. I could have used KVM in the first place! Moving to KVM...

CraigMinihan
Contributor

I see the same problem on Debian 11 with Workstation 16.2.1 build-18811642.

If I'm lucky I can use a Windows 10 VM for a day or two without issue. Other times the CPU usage goes to 100% on all VM cores for a few minutes. It repeats every few minutes and usually ends with the VM unusable.

The host reports the high load process as vmware-vmx but the VM itself usually sees no CPU usage.

The Linux kernel is 5.10.0-10-amd64 on an i7 12700 Alder Lake.

boarduser44
Contributor

I can confirm the same behaviour on a Debian host (kernel 5.10.0-11-amd64) and a Windows 10 guest. I use VMware Workstation Pro, version 16.2.1, build-18811642.

The four virtual CPUs spike randomly to 100%, making the guest VM unusable. The VM itself reports no unusual CPU load.
After about ten to thirty seconds the VM returns to its normal state. I cannot reproduce this behaviour on demand. It makes no difference whether the VM has some work to do or is idle. Some days it does not occur at all; on other days the VM becomes unusable up to ten times within an hour.

I have also two Linux VMs on the same host, which never made any problems.

Paul_Kierstead
Contributor

Same situation here, started very suddenly, no changes that I’m aware of with guests that worked fine yesterday morning. Win10 guests, of various configurations (things like 3d display both on and off), on different drives, all peg out one core at 100% even when idle (and task manager in guest shows negligible cpu).  All are extremely jerky and unresponsive, with small bursts sometimes of responsiveness. All were working well previously. Linux guests do not exhibit this. Machine is very well resourced and running miles below its limits.

Host: 3970X (is that part of the pattern, AMD?), Kernel 5.13.0-27-generic. VMware 16.2.1 build-18811642

Guest is various flavours of win10 in varying configs, there seems to be no difference. 

I had observed this intermittently for short periods before, but it is solid now.  Not sure what I’m going to do, I can’t work as this has stopped work dead, the VMs are unusable.

One note: It pegs a cpu for a bit, say 10s up to a couple of minutes, then switches CPU’s, tending to indicate it might have stopped and made a new thread, like it is retrying something…. But I’m just grasping at straws.

 

CraigMinihan
Contributor

I'm running Debian 11 with the BPO kernel version 5.15.0-0.bpo.2 and things seem more stable than under 5.10.

Today for the first time in a week my Win 10 guest became highly unstable with the vmware-vmx process running 100% again. A shutdown and restart of the VM resolved the problem for a minute or so but it went 100% again pulsing for ~10s dropping for maybe 1-2s and then back to 100%. However this time I got a log out of it which I've attached.

The real action happens at 16:23:04Z with the signal 11 and a stack trace which starts at libvmwarebase.so. This sequence happened while I was actively using the Windows guest. I'm still not clear what triggers this condition.

ijbgreen
Contributor

I was able to resolve this issue!!🙌 I found this thread: https://communities.vmware.com/t5/VMware-Workstation-Pro/VMWare-workstation-in-a-fistfight-with-Linu...

I traced the issue to high CPU consumption together with heavy activity from a kcompactd process, so I used the recommendation below:

# Disable kcompactd0 to work around a conflict with VMWare Workstation.
vm.compaction_proactiveness=0

I apply this only for the current session (it does not survive a reboot) with:

 $ sudo sysctl vm.compaction_proactiveness=0

 

And it works. I have tested it on an Ubuntu 20.04.4 host with kernel 5.13.0-37-generic, with Windows 10 and Windows 11 guests, both Professional editions.

A warning: this setting disables the kernel's proactive memory compaction. I only set it when I am going to use a Windows VM; otherwise I restore its default value, which is 20, because I don't know the side effects on overall system memory. I have used it for around five or six hours continuously without side effects, but nobody knows for sure. Use wisely.
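
The set-then-restore habit described above can be wrapped in a small helper. The following Python sketch assumes it runs as root on a kernel that exposes `/proc/sys/vm/compaction_proactiveness` (the file behind the `vm.compaction_proactiveness` sysctl). It writes the new value on entry and puts the previous value back on exit. The context manager name and its `path` parameter are hypothetical; `path` exists only so the logic can be tried on an ordinary file.

```python
from contextlib import contextmanager
from pathlib import Path

# File behind the vm.compaction_proactiveness sysctl.
SYSCTL_PATH = Path("/proc/sys/vm/compaction_proactiveness")


@contextmanager
def compaction_proactiveness(value, path=SYSCTL_PATH):
    """Temporarily set the sysctl, restoring the old value afterwards."""
    previous = path.read_text().strip()  # "20" if the default is unchanged
    path.write_text(f"{value}\n")
    try:
        yield previous
    finally:
        # Restore the saved value even if the body raised an exception.
        path.write_text(f"{previous}\n")


# Example (needs root; how the VM is started inside the block is up to you):
#   with compaction_proactiveness(0):
#       input("Proactive compaction is off. Press Enter when done with the VM...")
```

Saving and restoring the value that was actually in the file, rather than hard-coding 20, keeps the helper correct on hosts where the setting was already changed.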

splashd
Contributor

You're a lifesaver. This has been bugging me for months. I am running my corporate Win 10 build as a VM and I assumed it was some big-brother-ware  clobbering CPU cycles. I was doing keyboard double-dutch--trying to do quick typing between lock-ups.

 

Thanks for this find.
