VMware Cloud Community
fernandomm2
Enthusiast

Intermittent 100% sys CPU usage in VM

I have a server with 512GB of RAM and two Intel Xeon E5-2697 CPUs, each with 14 cores. With hyperthreading enabled, that gives me 56 logical CPUs available as vCPUs in ESXi 6.

I'm running 2 VMs with CentOS 7: one with 500GB of RAM and 56 vCPUs (main server) and another with 4GB of RAM and 2 vCPUs (secondary server).

The main server's load average is usually 4-5, but it has spikes that go above 100 (they usually start after about 7 days of uptime and then keep recurring randomly). I installed munin to monitor the server and noticed that when the issue happens, all processes get stuck at 100% sys CPU usage, meaning that the kernel is doing or waiting on something on all cores.
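For anyone who wants to confirm the sys CPU share without munin, it can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The following is a minimal sketch, not what munin does internally; the two sample lines are made-up values for illustration, and the function names are hypothetical.

```python
# Field order of the aggregate "cpu" line in /proc/stat (first eight counters).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return the share of each CPU state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical samples taken a few seconds apart (not real measurements).
sample_a = "cpu  1000 0 500 8000 100 0 0 0 0 0"
sample_b = "cpu  1100 0 1300 8050 100 0 0 50 0 0"

pct = cpu_percentages(parse_cpu_line(sample_a), parse_cpu_line(sample_b))
print("sys: %.1f%%  user: %.1f%%  idle: %.1f%%"
      % (pct["system"], pct["user"], pct["idle"]))
```

On a live system the two lines would be read from `/proc/stat` a few seconds apart. A `system` share close to 100% during a spike would match what the munin graphs show.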

Could this be caused by the large amount of resources assigned to this VM? Or is ESXi 6 supposed to support this?

I'm asking this because I have an identical server (really, the entire hardware/software is the same) but it's running CentOS 7 on bare metal, and this issue doesn't happen with it.

Note that when this issue happens, the secondary server keeps running without issues.

2 Replies
HawkieMan
Enthusiast

Don't forget that when you commit the maximum capacity you are in fact creating a problem, because your 56-vCPU VM will not get access to the CPU while the 2-vCPU machine is active. The reason is that VMware will look for a 56-vCPU slot but will only find a 54-vCPU opening to run on, so the VM sits in a wait state until the other VM is using resources other than CPU. You should also remember that hyperthreading doesn't give you more real CPUs; it just improves timing when there is enough to do, so in this setup your VM would probably run better with 14 vCPUs. Don't forget wait state monitoring.
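As a starting point for the wait state monitoring mentioned above, the CPU ready summation value from the vSphere performance charts (reported in milliseconds) can be converted to a percentage. This is a rough sketch using the commonly cited conversion; the 20-second interval assumes the real-time chart, and the sample value of 2000 ms is invented for illustration.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation value (ms) to a percentage of the interval.

    interval_s is the chart sampling interval in seconds; 20 is assumed here
    because that is the real-time chart interval.
    """
    return 100.0 * ready_ms / (interval_s * 1000)


def ready_percent_per_vcpu(ready_ms, vcpus, interval_s=20):
    """Average ready percentage per vCPU when the value is summed over the VM."""
    return ready_percent(ready_ms, interval_s) / vcpus


# Hypothetical reading: 2000 ms of ready time summed across a 56-vCPU VM.
vm_ready = ready_percent(2000)
per_vcpu = ready_percent_per_vcpu(2000, vcpus=56)
print("VM total: %.1f%%, per vCPU: %.2f%%" % (vm_ready, per_vcpu))
```

If the per-vCPU figure stays low during a load spike, scheduling contention on the host is less likely to be the cause, and the guest kernel itself becomes the more interesting place to look.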

fernandomm2
Enthusiast

Thanks for the reply.

I also tried setting 54 vCPUs for the main server and 2 vCPUs for the secondary server some time ago, but it didn't make any difference. The error still happened after a few days of uptime.

I'm now trying with 24 vCPUs ( less than the number of real cores, ignoring hyperthreading ). Let's see if this helps somehow.
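To keep the numbers straight, the three layouts tried so far can be compared against the host's core counts. This is only an arithmetic sketch based on the figures in this thread; the `summarize` helper is a hypothetical name, and it says nothing about how the ESXi scheduler actually places vCPUs.

```python
SOCKETS = 2
CORES_PER_SOCKET = 14
THREADS_PER_CORE = 2  # hyperthreading enabled

physical_cores = SOCKETS * CORES_PER_SOCKET        # 28 real cores
logical_cpus = physical_cores * THREADS_PER_CORE   # 56 hardware threads


def summarize(main_vcpus, secondary_vcpus=2):
    """Compare the total vCPUs of both VMs against the host's CPU counts."""
    total = main_vcpus + secondary_vcpus
    return {
        "total_vcpus": total,
        "exceeds_logical": total > logical_cpus,
        "exceeds_physical": total > physical_cores,
    }


# Main-server sizes mentioned in this thread: 56, 54 and 24 vCPUs.
layouts = {n: summarize(n) for n in (56, 54, 24)}
for n, info in layouts.items():
    print(n, info)
```

Only the 24-vCPU layout keeps the combined vCPU count (26) below the 28 physical cores; the 56-vCPU layout exceeds even the 56 logical CPUs once the secondary VM is counted.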

Just a note: the bare metal CentOS server shows and uses 56 CPUs and works fine with them.
