Five of our RHEL 5.4 32-bit, 1-vCPU VMs (out of 30 or so) are constantly consuming 90-100% of the underlying physical host's CPU. Inside the guest itself, the VM is idle.
We have the latest VMware Tools installed. We have tried stopping the tools, moving the VMs to another host, restarting the VMs, and checking that / is not full on the guests.
ESXTOP confirms the high CPU use seen in vCenter.
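For anyone wanting to capture this instead of eyeballing the interactive screen: esxtop has a batch mode (-b) that dumps its counters as CSV, which you can then mine for a VM's "% Used". A rough sketch below — the capture line is what you'd run on the ESX host, and the parsing is demonstrated against a small fabricated sample (the hostname, VM names, and group IDs are made up for illustration):

```shell
# On the ESX host (uncomment to capture 12 samples at 5s intervals):
# esxtop -b -d 5 -n 12 > /tmp/esxtop.csv

# Fabricated two-sample extract standing in for real esxtop batch output:
cat > /tmp/esxtop-sample.csv <<'EOF'
"Time","\\esx01\Group Cpu(1234:rhel5-vm1)\% Used","\\esx01\Group Cpu(5678:rhel5-vm2)\% Used"
"10:00:05","97.31","2.14"
"10:00:10","98.02","1.87"
EOF

# Print the "% Used" samples for the suspect VM, picking the column whose
# header mentions the VM's name:
awk -F'","' -v vm="rhel5-vm1" '
NR==1 { for (i = 1; i <= NF; i++) if ($i ~ vm) col = i; next }
      { gsub(/"/, "", $col); print $col }
' /tmp/esxtop-sample.csv
```

Against the sample this prints 97.31 and 98.02 — a host-side view of the same 90-100% the vCenter charts show, even while the guest reports itself idle.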
Before I open a ticket, is there anything I can try?
So after a bit of googling I found this. It seems to have helped on the one VM I have applied it to so far.
http://www.g-loaded.eu/2009/12/18/high-cpu-usage-centos-guest-virtualbox-vmware/
The timekeeping best practices KB (http://kb.vmware.com/kb/1006427) linked to in the blog article is a great help and tends to fix both high CPU usage and weird application behavior and errors on Linux VMs, so your solution is probably dead on. The recommended settings differ between Linux kernels, though, so read it carefully.
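A few quick checks inside the guest before changing anything, loosely following that KB (these are standard Linux paths, but the exact recommended parameters vary by kernel version, so treat this as a checklist rather than gospel — older 2.6.9-era kernels may not expose all of these files):

```shell
echo "Kernel boot parameters:"
cat /proc/cmdline

echo "Current clocksource:"
# Present on kernels with the generic clocksource framework.
if [ -r /sys/devices/system/clocksource/clocksource0/current_clocksource ]; then
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
else
    echo "(clocksource sysfs entry not available on this kernel)"
fi

echo "Timer tick frequency (CONFIG_HZ), if the kernel config is exposed:"
grep '^CONFIG_HZ=' "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "(kernel config not available)"
```

If /proc/cmdline shows no divider= or clocksource= parameters on an affected RHEL 5 guest, that is the first thing the KB would have you look at.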
If I assume you are running ESX 4 and the ESX hosts are 64-bit multi-core, why are you only assigning 1 vCPU to each VM? I was seeing the same issue until I allocated no fewer than 2 vCPUs per VM. Look at your threads and processes. Assigning multiple CPUs to Linux will solve the problem more effectively than any kernel hack.
All my production VMs, Windows Server or Linux, get no fewer than 4 vCPUs by default.
Brian Nelson
Hang 2 LEDs in the datacenter. The students are coming! The students are coming!
Assigning multiple vCPUs as a default would not fall under best practices. While processor scheduling is much more relaxed than in previous versions, you will lose VM density by overcommitting vCPUs, and you do risk reduced performance across the host.
Quite a few assumptions and blatant generalizations here which I think need to be addressed:
Assigning multiple vCPUs as a default would not fall under best practices
Best practice? Whose? Many best practices are set by the OEM in order to reduce support costs and complexity.
you will loose VM density by over committing vCPUs
Depends on the VMs, depends on how the resource pool limits are set, depends on the workload characteristics of the VMs, depends....
risk reduced performance across the host
Again, it depends on the VMs, how your workloads use the resources, and how you allocate those resources to the VMs.
Brian Nelson
Going way off-topic, but I can't really help myself on this one.
Best practice? Whose?
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf - page 19
Use as few virtual CPUs (vCPUs) as possible. For example, do not use virtual SMP if your application is single-threaded and will not benefit from the additional vCPUs.
Even if some vCPUs are not used, configuring virtual machines with them still imposes some small resource requirements on ESX:
Unused vCPUs still consume timer interrupts.
Maintaining a consistent memory view among multiple vCPUs consumes resources. Some older guest operating systems execute idle loops on unused vCPUs, thereby consuming resources that might otherwise be available for other uses (other virtual machines, the VMkernel, the console, etc.).
The guest scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.
Adding to that, HA slot sizes will also be negatively affected, causing admission-control problems and/or resource waste.
It's pretty clear to me that overcommitting vCPUs IS wasting resources no matter how you arrange them or how your workload is shaped; overcommitting is overcommitting any way you spin it.
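If you want an actual number for that overcommit debate rather than gut feel, one crude way is to total up the "numvcpus" entries in the registered VMs' .vmx files and compare against the host's physical cores. The sketch below uses fabricated .vmx fragments in /tmp so the arithmetic is verifiable; on a real ESX host the .vmx files live under /vmfs/volumes/<datastore>/ and the core count comes from the host's hardware info:

```shell
# Fabricated .vmx fragments standing in for three registered VMs:
mkdir -p /tmp/vmx-demo
printf 'numvcpus = "4"\n' > /tmp/vmx-demo/vm1.vmx
printf 'numvcpus = "2"\n' > /tmp/vmx-demo/vm2.vmx
printf 'displayName = "vm3"\n' > /tmp/vmx-demo/vm3.vmx  # no numvcpus entry

total=0
for vmx in /tmp/vmx-demo/*.vmx; do
    n=$(sed -n 's/^numvcpus *= *"\([0-9]*\)".*/\1/p' "$vmx")
    total=$((total + ${n:-1}))   # a missing numvcpus line means 1 vCPU
done
echo "total vCPUs: $total"

pcores=8   # stand-in value; read this off the host's hardware summary
echo "overcommit ratio: $total vCPU : $pcores cores"
```

Against the sample this reports 7 vCPUs (4 + 2 + 1) across three VMs. A ratio well above the core count is where the density and scheduling arguments above start to bite.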
Just out of curiosity: if you power off the VM and power it back on, does that resolve the problem?
You can also try divider=10 as a kernel parameter in the RHEL guest.
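For anyone unsure where that goes: divider=10 drops the RHEL 5 timer tick from 1000 Hz to 100 Hz per vCPU, which is usually why it calms an idle guest's host-side CPU use. A hedged sketch of the edit, run against a fabricated copy of grub.conf so the sed is verifiable — on the real guest you would edit /boot/grub/grub.conf (with a backup) and reboot:

```shell
# Fabricated grub.conf standing in for the real /boot/grub/grub.conf:
cat > /tmp/grub.conf <<'EOF'
default=0
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-164.el5.img
EOF

cp /tmp/grub.conf /tmp/grub.conf.bak                   # always keep a backup
sed -i '/^[[:space:]]*kernel /s/$/ divider=10/' /tmp/grub.conf  # append to kernel line(s)

grep 'kernel ' /tmp/grub.conf
# After rebooting the guest, confirm it took effect with: cat /proc/cmdline
```

The grep shows divider=10 appended to the kernel line. Check the timekeeping KB first, though — the recommended parameter set differs between kernel versions and 32-bit vs 64-bit guests.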