VMware Communities
ki81
Contributor

VMware Workstation 16 Pro + Ubuntu 22.04.1 - VM unresponsive with high CPU

Hi All,

I've recently installed VMware Workstation and Ubuntu on a new Windows 10 laptop, and I'm finding that after a period of time the VM consistently becomes unresponsive and CPU usage spikes to 100%. If I suspend and resume the VM, it works for a while before the problem recurs.

How do I go about solving this?

112 Replies
craigfinnan
Contributor

@anders_o 

Adding that cpuid string to the vmx file does seem to work. How did you come up with that? As far as I can work out, setting that bit (bit 24) to 0 in ECX of CPUID leaf 1 tells the guest OS that the TSC-deadline timer mode of the local APIC is not supported. While I am not exactly sure how the kernel uses it, the fact that a kernel-facing timer feature is involved is consistent with the VM freeze issue only affecting Linux guests. And since in this mode VMware no longer has direct access to the processor hardware (only the Windows 10 hypervisor layer does), it is suggestive that a change to Windows 10 (the September 2022 update) could well have introduced a bug that changes the way that bit is handled when set to 1. It certainly seems that there is a clue here, if only there were any incentive on Microsoft's or VMware's part to fix this. Unfortunately, that incentive seems to be lacking. I haven't seen any evidence that either party has even acknowledged the problem.

cff30
anders_o
Enthusiast

I got it from VMware support after trying 2-3 other workarounds that they suggested, so at least they now know that this problem exists and how to solve it. 

mboekhold
Contributor

Adding another confirmation that setting that cpuid string in the VMX file solves the issue. My VM has been up for hours now. It even survived a <suspend VM> / reboot host / <resume VM> cycle.

craigfinnan
Contributor

Yes, this looks like the real deal. I've been running with this fix all week with Workstation in ULM mode with no issues whatsoever. Finally able to run WSL2 again. I don't know what the ultimate supported fix may end up being, but they apparently now have a handle on the root cause. In the meantime, this workaround is simple to implement and reliable. Thanks to @anders_o for bringing it to the attention of the community.

cff30
SimonMcGonagall
Contributor

Adding the cpuid line solved the problem for me, too. 

cb831
Enthusiast

Works for me too, running VMware 16.

@anders_o: It also fits the observation I reported in one of the first posts of this thread:
When I inspect the syslog, I find lots of soft lockups for each CPU, with the reported stuck time in seconds increasing. On one VM all 4 CPUs are locked up, and on the other only CPUs 1, 2 and 3 are, but the result is the same: no response to keyboard and mouse.
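
To make that kind of syslog check repeatable, here is a minimal Python sketch that tallies the longest reported stuck time per CPU. It assumes the usual kernel watchdog wording ("soft lockup - CPU#N stuck for Ns!"); the function name and the sample lines are made up for illustration and are not taken from a real log.

```python
import re
from collections import defaultdict

# Matches the kernel watchdog message, for example:
#   watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def summarize_lockups(lines):
    """Return {cpu: longest stuck time in seconds} over all soft lockup lines."""
    worst = defaultdict(int)
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            cpu, seconds = int(match.group(1)), int(match.group(2))
            worst[cpu] = max(worst[cpu], seconds)
    return dict(worst)


if __name__ == "__main__":
    # Illustrative sample lines (hypothetical, not from a real system).
    sample = [
        "kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]",
        "kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 48s! [swapper/1:0]",
        "kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:1:57]",
    ]
    for cpu, seconds in sorted(summarize_lockups(sample).items()):
        print(f"CPU#{cpu}: stuck for up to {seconds}s")
```

On Ubuntu the same function can be fed the real log, for example `summarize_lockups(open("/var/log/syslog", errors="replace"))`.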

 

I gave up in October and migrated my Ubuntu VM to native Hyper-V. Now that it seems stable, is there an easy way to migrate a Hyper-V VM to a VMware one?

aarons_gogoair
Contributor

virtualHW.version = "10"

This seemed to work for me. Thank you again! This has been an ongoing issue for months, and VMware's support is so out of touch that they wanted me to do very invasive and time-consuming troubleshooting. Nothing they suggested worked!

3ler
Contributor

My setting was at 19 and I changed it to 10. The VM stopped with the message "The device "nvme" is not supported by current hwversion".

I incremented the number until the message no longer appeared and ended up at 13. Now I will check whether this improves things on Windows 10.

3ler
Contributor

Changing virtualHW.version was not an improvement, so I changed it back to 19.

cb831
Enthusiast

You should go with the hack mentioned by anders_o about 10 posts ago:

cpuid.1.ecx="----:---0:----:----:----:----:----:----"

That seems like a good mitigation for the issue at hand.
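
For anyone wondering how to read that string, here is a minimal Python sketch. It assumes the 32 characters are written from bit 31 on the left down to bit 0 on the right, which matches the bit 24 reading given earlier in the thread. The function names are made up for illustration, and only "-", "0" and "1" are handled; any other mask character is rejected rather than interpreted.

```python
MASK = "----:---0:----:----:----:----:----:----"


def parse_cpuid_mask(mask):
    """Return {bit_index: forced_value} for every position that is not '-'.

    Assumption: the colon-separated groups run from bit 31 (leftmost)
    down to bit 0 (rightmost).
    """
    bits = mask.replace(":", "")
    if len(bits) != 32:
        raise ValueError(f"expected 32 bit positions, got {len(bits)}")
    overrides = {}
    for index, char in enumerate(bits):
        if char == "-":
            continue  # '-' leaves the bit as reported by the host
        if char not in "01":
            raise ValueError(f"mask character {char!r} is not handled by this sketch")
        overrides[31 - index] = int(char)
    return overrides


def apply_cpuid_mask(value, mask):
    """Apply the forced bits from the mask to a 32-bit register value."""
    for bit, forced in parse_cpuid_mask(mask).items():
        if forced:
            value |= 1 << bit
        else:
            value &= ~(1 << bit)
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(parse_cpuid_mask(MASK))
    print(hex(apply_cpuid_mask(0xFFFFFFFF, MASK)))
```

So the only position touched by this line is bit 24, which the guest will see as 0 whatever the host reports.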

mikahe
Contributor

Hi all!

I'm struggling with nested virtualization and "watchdog: BUG: soft lockup" on level 2 guests. The level 0 host is Windows 10.0.19045 (VMware Workstation Pro 17.0.2). The level 1 guest is either EL7.9 or EL8.8, as are the level 2 guests. CPU lockups seem much more frequent on EL8.8/.9, though:

./vmware.log:2024-03-11T19:07:26.148Z In(05) vmx Monitor Mode: CPL0
./vmware.log:2024-03-11T19:09:12.204Z In(05) vmx GuestRpcSendTimedOut: message to toolbox-dnd timed out.

Does anyone have any more good suggestions besides those already mentioned in this excellent thread?

craigfinnan
Contributor

Adding the following to the vmx file worked for me. This was posted by @anders_o :

cpuid.1.ecx="----:---0:----:----:----:----:----:----"
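
If you have several VMs to patch, the edit can be scripted. This is only a sketch under a few assumptions: the VM is powered off, you keep a backup of the .vmx file, and option names are treated as case-insensitive. The helper name `set_vmx_option` is hypothetical, and it works on the file's text so that reading and writing (and the file's encoding) stay under your control.

```python
CPUID_KEY = "cpuid.1.ecx"
CPUID_MASK = "----:---0:----:----:----:----:----:----"


def set_vmx_option(text, key, value):
    """Return .vmx text with `key = "value"` set exactly once.

    An existing entry for the key is replaced in place (duplicates are
    dropped); otherwise the new entry is appended at the end.
    Assumption: option names are compared case-insensitively.
    """
    new_line = f'{key} = "{value}"'
    result = []
    replaced = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip().lower()
        if "=" in line and name == key.lower():
            if not replaced:
                result.append(new_line)
                replaced = True
            # any further duplicate of the same key is skipped
        else:
            result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Hypothetical two-line .vmx excerpt, just to show the effect.
    sample = 'virtualHW.version = "19"\nmemsize = "4096"\n'
    print(set_vmx_option(sample, CPUID_KEY, CPUID_MASK), end="")
```

Running it twice on the same text leaves the file unchanged the second time, so it is safe to re-apply.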
cff30
mikahe
Contributor

That looked like a promising tip, but it does not seem to help in the nested case. I also used the powercfg and virtualHW.version tweaks.

./vmware.log:2024-03-11T19:07:26.140Z In(05) vmx DICT cpuid.1.ecx = "----:---0:----:----:----:----:----:----"
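
For others who want to confirm that their own vmx edits were picked up, here is a small Python sketch that collects the DICT entries from vmware.log. The pattern is based only on the single line quoted above, so other DICT line layouts are an assumption, and the function name is made up for illustration.

```python
import re

# Based on the log line quoted above, for example:
#   ... In(05) vmx DICT cpuid.1.ecx = "----:---0:----:----:----:----:----:----"
DICT_RE = re.compile(r'\bDICT\s+(\S+)\s*=\s*"([^"]*)"')


def dict_settings(log_lines):
    """Return {option_name_lowercased: value} for every DICT line found."""
    settings = {}
    for line in log_lines:
        match = DICT_RE.search(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


if __name__ == "__main__":
    quoted = [
        './vmware.log:2024-03-11T19:07:26.140Z In(05) vmx DICT '
        'cpuid.1.ecx = "----:---0:----:----:----:----:----:----"',
    ]
    print(dict_settings(quoted).get("cpuid.1.ecx"))
```

If the key is missing from the result, the line was probably not read from the vmx file that the running VM actually uses.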
