linutic
Contributor

VMware Workstation in a fistfight with Linux memory compactor kcompactd

I run VMware Workstation Pro 16.2.9 on an AMD 3900X with 64 GB of memory and an SSD, running Ubuntu Linux with kernel 5.11.0-38-generic.

Starting about a year ago, VMware has been fighting with the Linux memory compactor, kcompactd. When hostilities begin (Windows 10 guest configured with 4 cores and 16 GB), VMware freezes with its 4 cores at 100% while kcompactd runs at 100% CPU. After a few seconds, the combatants restrain themselves for about 30 seconds, then begin again. This doesn't happen every day, but when it does, the two combatants never seem to tire of this annoying behavior. A Linux reboot clears up the problem, for a time.

This behavior has been mentioned on a number of forums, but I have not seen a discussion on the VMware forums, which is why I am here. Initially the problem could be resolved by turning off Transparent Hugepages. Then, after a time, this was no longer necessary, and the two behaved themselves. About 3 months ago hostilities resumed, but now turning off Transparent Hugepages has little or no effect.

I'm guessing that VMware has its own algorithms to collect 4K pages into hugepages, which is otherwise the job of kcompactd. Possibly the two are squabbling over a futex or other exclusive resource, or maybe a pair of them.

Any chance VMWare guys could look at this problem and break up the fight?

12 Replies
simmonsjw
Contributor

I've got the same problem on a Fedora 34 system. All kernels starting with 5.12 show the problem; going back to 5.11.21-300 makes the issue pretty much disappear. When it happens, kcompactd0 goes to 100% CPU use and the virtual is unresponsive (you can move the cursor but not get any other response). At the same time, the vmware process for the virtual goes to basically 100% times the number of CPU cores (i.e. if you have 2 CPU cores in the virtual it will sit at 200% or so). When kcompactd0 goes back below 100%, which can sometimes take minutes, the virtual comes back, but it often happens again fairly quickly. Strangely, at times with the newer kernels you can run for quite a while without the issue.

Updating to 16.1.2, then 16.2.0, and 16.2.1 didn't help, along with using patches like https://github.com/mkubecek/vmware-host-modules/archive/workstation-16.2.1.tar.gz.

I'd like to update to Fedora 35 but suspect I won't be able to use the older kernel.  I'm considering converting the virtual to kvm, but I'd prefer to stay with VMware Workstation.

As mentioned, turning off hugepages did not help, either.

Anyone have any ideas?

Thanks, Jim

wila
Immortal

Hi,

Please see the following for a workaround:
https://gist.github.com/2E0PGS/2560d054819843d1e6da76ae57378989

This was found by @akornilov82, who posted this thread:
https://communities.vmware.com/t5/VMware-Workstation-Player/Over-CPU-usage-in-idle/m-p/2878569

Hope this helps,
--
Wil

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
simmonsjw
Contributor

Thanks, but I've tried it and it doesn't help with the latest kernels in Fedora 34 (I've tried it with all of them up to the current 5.15.4-101.fc34.x86_64).  The 5.15.4-101.fc34.x86_64 is the last kernel vmware has worked reliably with on this system.  I don't believe it is an actual memory shortage: the system has 6G of memory with 2G (and later 3G) allocated to the Windows 10 virtual.  I've also tried changing the VMware preference for reserved memory to 4096MB, but that didn't help.

I'm going to try converting the virtual to a kvm virtual and see if that fixes it.  This is an old system with an "Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz" CPU, which is currently one of the oldest supported by VMware.  It is possible there is something about it that is causing the issue.

Jim

linutic
Contributor

I received a solution on another thread, and it's working for me.  All credit to neves-0 on GitHub. See:

https://gist.github.com/2E0PGS/2560d054819843d1e6da76ae57378989

As root, execute the following command:

echo 0 > /proc/sys/vm/compaction_proactiveness

So far, this fix seems to completely disable kcompactd0.  There may be side effects, but so far I haven't noticed any.  After 12 hours, on Ubuntu 20.04 with VMware Workstation 16.2.1 build-18811642 running a Windows 10 vm, kcompactd0 has accumulated zero cpu time:

# ps -ef | grep kcompactd0
root         165       2  0 Dec04 ?        00:00:00 [kcompactd0]
gene       24364   20919  0 10:05 pts/0    00:00:00 grep kcompactd0
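For anyone who wants to repeat this check, here is a minimal sketch that prints the current setting and the accumulated CPU time of the kcompactd threads. The helper name show_proactiveness and the PROACTIVENESS_FILE override are made up for illustration; only the /proc path and the ps/awk calls are standard.

```shell
#!/bin/sh
# Sketch: verify the workaround is active and see how much CPU kcompactd has used.
# PROACTIVENESS_FILE is a hypothetical override so the helper can be pointed at
# another file; by default it reads the real sysctl file.
PROACTIVENESS_FILE="${PROACTIVENESS_FILE:-/proc/sys/vm/compaction_proactiveness}"

show_proactiveness() {
    if [ -r "$PROACTIVENESS_FILE" ]; then
        printf 'compaction_proactiveness=%s\n' "$(cat "$PROACTIVENESS_FILE")"
    else
        # Kernels without proactive compaction do not expose this file.
        printf 'compaction_proactiveness=unavailable\n'
    fi
}

show_proactiveness

# Header line plus one line per kcompactd kernel thread (kcompactd0, kcompactd1, ...).
ps -eo pid,comm,time | awk 'NR == 1 || $2 ~ /^kcompactd/'
```

A TIME column that stays at 00:00:00 while a VM is running matches the ps output above.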

If you would like to apply the fix every time you boot, add the following line to /etc/sysctl.conf:

# Disable kcompactd0 to work around a conflict with VMWare Workstation.
vm.compaction_proactiveness=0
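
An alternative to editing /etc/sysctl.conf directly is a drop-in file under /etc/sysctl.d, which Ubuntu and Fedora both read at boot. The sketch below is only an illustration: the function name write_dropin and the file name 99-vmware-kcompactd.conf are my own choices, not anything VMware or the gist prescribes.

```shell
#!/bin/sh
# Sketch: write the workaround to a sysctl drop-in file.
# write_dropin and the file name used below are hypothetical examples.
write_dropin() {
    # $1: path of the drop-in file to create or overwrite
    printf '%s\n' \
        '# Disable kcompactd0 to work around a conflict with VMWare Workstation.' \
        'vm.compaction_proactiveness=0' > "$1"
}

# Run as root, then reload all sysctl configuration files:
#   write_dropin /etc/sysctl.d/99-vmware-kcompactd.conf
#   sysctl --system
```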

 

linutic
Contributor

I've tried disabling transparent hugepages.  It used to work, but no longer does.  See the link below for an alternative fix:

https://gist.github.com/2E0PGS/2560d054819843d1e6da76ae57378989

I have been struggling with this problem for *years*.   No more.

toesterdahl
Contributor

Hello,

I have this problem too, and it seems to hit systems running Ubuntu as the host somewhat reliably. I think it would be in VMware's interest to find a permanent solution with Ubuntu, or alternatively upstream with the kernel developers. For every user who finds the correct tweak there will be plenty who just give up.

kmahyyg
Contributor

# Disable kcompactd0 to work around a conflict with VMWare Workstation.
vm.compaction_proactiveness=0

 

This solution offered by another guy works fine here. Transparent hugepages do not need to be modified for this problem.

This problem became much more obvious after upgrading the Linux kernel to 5.16+ (I'm running Arch Linux).

VMWare official, plz, fix ASAP.

ijbgreen
Contributor

This works for me! Ubuntu 20.04, kernel 5.13.0-37-generic, so far without side effects. Tested with Windows 10 and Windows 11 guests. I only apply:

# Disable kcompactd0 to work around a conflict with VMWare Workstation.
vm.compaction_proactiveness=0

Using:

sudo sysctl vm.compaction_proactiveness=0

I run it only when I'm about to start a Windows VM, because for me the problem only happens with Windows guests.
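
If you start VMs from a terminal, the two steps can be combined in a small wrapper. This is a rough sketch under assumptions: start_vm is a made-up name, it assumes VMware's vmrun command-line tool is on your PATH, and the .vmx path is whatever your guest uses.

```shell
#!/bin/sh
# Sketch: disable proactive compaction, then start a Workstation guest.
# start_vm is a hypothetical helper; vmrun is assumed to be on PATH.
start_vm() {
    vmx="${1:-}"
    if [ -z "$vmx" ]; then
        echo "usage: start_vm /path/to/guest.vmx" >&2
        return 2
    fi
    # Apply the workaround until the next reboot.
    sudo sysctl -w vm.compaction_proactiveness=0 || return 1
    # -T ws selects VMware Workstation as the host type.
    vmrun -T ws start "$vmx"
}
```

Call it as start_vm followed by the path to the guest's .vmx file. The setting stays at 0 until the next reboot or until you change it back.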

reddavid
Contributor

Excellent info. I've been dealing with this for a couple of years on multiple kernels/distros. My workaround was a cron job that fired every 10 minutes to drop caches, which worked "fine" until this morning, when I was compacting a large VM on a USB-attached disk. During that long process, multiple kcompactd runs turned my running VM into a useless mess.

jiaxinslee
Contributor

Confirming that the Win10 guest still freezes and kcompactd0 still kicks in, even with compaction_proactiveness=0.

Tested with:
Ubuntu 22.04 kernel 5.15.35 / 5.16.20
VMWare 16.2.3 / 16.2.0 / 16.1.2
Win10 Guest, 4GB / 6GB memory allocated

jvasa
Contributor

I'm on Fedora 36, and it's good on the 5.17 kernel, but on 5.18 the kcompactd0 constantly freezes the Win10 VM.

My settings:
echo never | tee /sys/kernel/mm/transparent_hugepage/defrag
echo 0 | tee /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo 0 | tee /proc/sys/vm/compaction_proactiveness
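
To confirm that these three values actually took effect (they are reset on reboot), something like the following sketch can print them. The helper name show_setting is made up; the paths are the ones from the commands above.

```shell
#!/bin/sh
# Sketch: print the current value of each tunable touched above.
# show_setting is a hypothetical helper name.
show_setting() {
    # $1: path to a sysfs/procfs tunable
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: not available\n' "$1"
    fi
}

for f in \
    /sys/kernel/mm/transparent_hugepage/defrag \
    /sys/kernel/mm/transparent_hugepage/khugepaged/defrag \
    /proc/sys/vm/compaction_proactiveness
do
    show_setting "$f"
done
```

Note that the transparent_hugepage/defrag file lists all possible modes and marks the active one in square brackets, so expect a line containing [never] rather than just never.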

Specs:
Fedora 36
VMware Workstation 16.2.3 with mkubecek vmware-host-modules
Windows 10 VM: 16GB / 32GB allocated
CPU : Intel i7-10700

Working:
Kernel: 5.17.14 (fedora rpm)

Not working:
Kernel: 5.18.5 (fedora rpm)

I have reverted to the 5.17 kernel, since 5.18 gives me issues, at least on my system.

nayr1
Contributor

OS: Pop!_OS 22.04 LTS x86_64
Host: 20QVS0FP00 ThinkPad X1 Extreme 2nd
Kernel: 5.17.15-76051715-generic
CPU: Intel i9-9880H (16) @ 4.800GHz
GPU: NVIDIA GeForce GTX 1650 Mobile / Max-Q
Memory: 23582MiB / 31783MiB

Disabling kcompactd0 seems to have solved this problem for me also!
