andvm
Hot Shot

ESXi CPU Usage

Hi,

Do the attached charts (real-time and last day) show a worryingly overloaded server in terms of CPU usage?

Some cores are at 100% whilst others are low.

I'm trying to get a good estimate of current usage (via vCenter) so that when I place new workloads I know whether the server can handle more load or whether it's better to leave it as is.

Let me know if there are better ways to get these stats (such as via the command line, but I think those would show current values rather than historical statistics).

Thanks

[Attachment: andvm_0-1616427888732.png]

[Attachment: andvm_1-1616427919882.png]

vbondzio
VMware Employee

The best way is to look at Host Usage; if that is at 100%, look at Host Utilization. There is no need to look at individual PCPUs unless you are troubleshooting something. Sometimes one NUMA node is busier than the other(s) because, e.g., more VMs are making use of an IO context / world that is associated with a device attached to that node; that is usually not a reason for concern. You can also look at Host-level "CPU Latency" (which includes core / SMT "contention" and a bunch more) and Host-level Readiness if you want to figure out whether there is contention despite low usage.
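
If you want to pull the host-level numbers programmatically instead of reading them off the vCenter charts, here is a minimal pyVmomi sketch that reads each host's quick stats and prints overall CPU usage as a percentage of capacity. The vCenter address, credentials and certificate handling are placeholders you would adapt to your environment.

```python
# Minimal sketch (assumes pyVmomi is installed; host/credentials are placeholders).
# Prints each host's overall CPU usage against its total CPU capacity.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-only: skip certificate validation
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        hw = host.summary.hardware
        qs = host.summary.quickStats
        capacity_mhz = hw.cpuMhz * hw.numCpuCores   # total CPU capacity in MHz
        used_mhz = qs.overallCpuUsage               # current host CPU usage in MHz
        print(f"{host.name}: {used_mhz}/{capacity_mhz} MHz "
              f"({100.0 * used_mhz / capacity_mhz:.1f}% used)")
    view.DestroyView()
finally:
    Disconnect(si)
```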

A lot of what you are asking is being done by DRS, because load is dynamic and placing something "well" might not hold up for very long.

andvm
Hot Shot

Yes, in fact average CPU usage is around 22% across the Real-Time, Day and Week views, so it is a bit less worrying.

DRS is not available to us because it requires a specific license.

I take it it would still be fine if the average was around 40% and the maximum around 80%, right?

Does a maximum of less than 100% prove that VMs have never experienced CPU contention? (Or is it not that simple?)

Thanks

vbondzio
VMware Employee

You should be OK; ESXi takes care to minimize scheduling and core (SMT/HT) contention based on many factors. There are some instances where it might accept some minor contention to benefit locality, but I wouldn't be concerned.

In theory, you should have very little "CPU Latency" (the approximate reduction in throughput compared to a single core at nominal frequency) below 100% Usage, and very little "Ready" (scheduling contention) below 100% Utilization. "Should", because that would indeed be very simplified 🙂 ... it basically assumes a uniform "scheduling domain", that no more threads need to wake up at the same time than there are available PCPUs, and that there is no such thing as interrupts.
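
As a side note on reading those counters: vCenter reports "Readiness" directly as a percentage, but the classic "Ready" counter is a summation in milliseconds per sample interval, so it has to be normalized before you compare it to anything. Below is a small sketch of that conversion, assuming the 20-second real-time sample interval; the figures in the example are made up for illustration.

```python
# Sketch: convert a "CPU Ready" summation value (ms) to a percentage.
# Assumes the 20,000 ms real-time sample interval; historical rollups use
# longer intervals (e.g. 300 s for the past-day chart), so adjust accordingly.

def ready_percent(ready_ms: float, interval_ms: int = 20_000, vcpus: int = 1) -> float:
    """Average ready % per vCPU over one sample interval."""
    return (ready_ms / (interval_ms * vcpus)) * 100.0

# Example: a 4-vCPU VM reporting 1,600 ms of ready time in a 20 s sample
# averages 2% ready per vCPU, well under the common ~5% rule-of-thumb warning level.
print(ready_percent(1_600, vcpus=4))  # -> 2.0
```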

To make more sense of the very chaotic per-PCPU usage, the best way to visualize it is to export it to CSV and use e.g. perfmon to display it as a stacked area chart with the range set to the number of PCPUs * 100.
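
If you prefer scripting over perfmon, a rough Python equivalent of that visualization is sketched below. The CSV layout (a timestamp column followed by one usage column per PCPU, with "usage" in the column names) is an assumption about what the export looks like, so adjust the column selection to match your file.

```python
# Sketch: plot exported per-PCPU usage as a stacked area chart.
# Assumes a CSV with a timestamp column followed by one usage column per PCPU.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("pcpu_usage.csv", parse_dates=[0], index_col=0)
pcpu_cols = [c for c in df.columns if "usage" in c.lower()]  # assumed column naming

ax = df[pcpu_cols].plot.area(stacked=True, linewidth=0, legend=False)
ax.set_ylim(0, len(pcpu_cols) * 100)   # range = number of PCPUs * 100, as above
ax.set_ylabel("Summed PCPU usage (%)")
ax.set_xlabel("Time")
plt.tight_layout()
plt.show()
```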

You might want to watch this (from the 24th minute if you are short on time); it's about ready time but goes into some other relevant concepts too: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1589484575728001Zb6J
