Introduction
CPU load is generated by the guest and its applications, as well as by ESX Server as it provides a virtual interface to the hardware. While the work performed by the host does add some load, the great majority of processing is due to the applications in the VM. A solid understanding of the workload profile, independent of the virtual environment, can assist CPU analysis.
Check Utilization
Invoke esxtop. By default it displays CPU utilization, but pressing 'c' ensures this screen is shown. The following figure shows example data produced on a test system.
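In addition to the interactive screen, esxtop can record the same counters in batch mode (esxtop -b) as CSV for offline analysis. The sketch below parses a small, hypothetical excerpt of such a capture; the perfmon-style column names and the sample values are assumptions and will vary by ESX Server version and host name.

```python
import csv
import io

# Hypothetical excerpt of esxtop batch-mode (-b) CSV output. Real captures use
# perfmon-style column names; the exact names and values here are illustrative.
sample = io.StringIO(
    '"Time","\\\\host\\Physical Cpu(_Total)\\% Processor Time",'
    '"\\\\host\\Group Cpu(30:myvm)\\% Used"\n'
    '"10:00:05","85.2","160.4"\n'
    '"10:00:10","91.7","171.9"\n'
)

rows = list(csv.DictReader(sample))

# Find the total physical-CPU utilization column and average it over the run.
pcpu_col = next(c for c in rows[0] if "Physical Cpu" in c)
avg_pcpu = sum(float(r[pcpu_col]) for r in rows) / len(rows)
print(f"average total PCPU utilization: {avg_pcpu:.1f}%")
```

Batch captures like this are useful when a problem must be reproduced under load and analyzed later, rather than watched live.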
Observe the following:
- The PCPU(%) line in the header shows utilization for the processor(s) by core and in total. The comma-delimited values show utilization for each core, followed by "used total", the average utilization across all cores.
- The LCPU(%) line shows the percentage of CPU utilization per logical CPU. The percentages for the logical CPUs belonging to a package add up to 100 percent. This line appears only if hyperthreading is present and enabled.
- The CCPU(%) line shows the percentage of total CPU time as reported by the ESX Server service console. Use of any third-party software, such as management agents and backup agents, inside the service console may result in a high CCPU(%) value.
- There is an idle world running whose %USED entry displays the amount of CPU cycles that remain unused. If the idle world is reported at less than 100% utilization, only a fraction of one physical core remains for additional work. Because this number can reach many hundreds of percent (100% for each core), small values here indicate a heavily loaded system.
- Check the utilization (%USED) of the interesting VMs. The VMs are reported here under the names specified at creation time. Like the idle row, utilization for each VM can exceed 100%. A VM provided with two vCPUs, for example, can max out at 200% CPU utilization.
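Because %USED scales with vCPU count, it helps to normalize it against the VM's allocation when judging how close the VM is to its own limit. A minimal sketch, with hypothetical figures:

```python
# Normalize a VM's %USED (which can exceed 100%) against its vCPU allocation.
def vm_saturation(used_pct: float, vcpus: int) -> float:
    """Fraction of the VM's provisioned CPU capacity in use (0.0 to 1.0)."""
    return used_pct / (vcpus * 100.0)

# A 2-vCPU VM reported at 185% %USED is using 92.5% of its allocation,
# i.e. it is close to saturating the CPU resources provided to it.
print(f"{vm_saturation(185.0, 2):.1%}")
```

The same 185% reading would mean something quite different for a 4-vCPU VM (about 46% of its allocation), which is why raw %USED alone can mislead.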
- Expand the group data for the VM of most interest by pressing 'e' and entering the VM's group ID number (GID). The figure below shows the CPU screen expanded for GID "30" from the previous figure. Once expanded, esxtop provides counter data for every world in the group, including:
- vmmX: For each vCPU provided to the VM, a virtual machine monitor (VMM) world is displayed. This world performs the majority of the work required to execute and virtualize the guest code (OS and applications).
- vcpu-X: A vcpu-X world is created to assist the VMM world for each vCPU. Primarily this work involves virtualizing the VM's I/O devices.
- mks: Mouse, keyboard, and screen interrupt servicing.
- vmware-vmx: The VMX worlds assist in maintenance and communications with other worlds and should not represent a material portion of the group utilization.
Evaluate the Data and Correct the System
The general flow for evaluation starts by considering the system's load. Is the system overloaded with too many VMs? Is a guest using all of its vCPUs and simply in need of more or faster processors? Are all guests waiting for IO? For example:
- Check the PCPU(%) line to see if utilization of all cores is near 100%. In this case the system is saturated. If multiple VMs are competing for the CPUs, try to reduce the number of VMs on the system or find other means of decreasing its load. See "CPU Saturation of Host" below.
- See if the PCPU(%) line shows an unequal load across processor cores, with some saturated and some near idle. This indicates a VM that is fully utilizing all of the vCPUs provided to it. Increase that VM's vCPU count, if possible, and verify that the guest makes use of the additional cores. If the application scales horizontally, you may run multiple VMs to use the additional cores. See "CPU Saturation of VM" below.
- If all CPUs remain underutilized, either the application in the VM is misconfigured or the VM is waiting for IO operations to complete. See "Low CPU Utilization" below.
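The triage flow above can be sketched as a simple classifier. The utilization thresholds below are illustrative assumptions for the sketch, not values prescribed by VMware:

```python
# Rough triage of esxtop CPU counters; thresholds are illustrative assumptions.
def triage(pcpu_per_core, vm_used_pct, vm_vcpus):
    """Classify host/VM load from per-core PCPU(%) and one VM's %USED."""
    if all(u >= 95.0 for u in pcpu_per_core):
        return "host saturated"        # see "CPU Saturation of Host"
    if vm_used_pct >= 0.95 * vm_vcpus * 100.0:
        return "vm saturated"          # see "CPU Saturation of VM"
    if all(u < 40.0 for u in pcpu_per_core):
        return "low utilization"       # see "Low CPU Utilization"
    return "mixed load"

# All four cores near 100%: the host itself is the bottleneck.
print(triage([98.0, 97.5, 99.1, 96.2], 120.0, 4))  # prints "host saturated"
```

In practice ready time (discussed below) should also feed into this decision, since near-saturation and true overcommitment look similar in utilization counters alone.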
CPU Saturation of Host
As stated above, both the PCPU(%) and %USED counters can be used to identify hosts that are using all physical CPUs. It is possible, however, for the VMs on the system to be utilizing nearly all of the processor cycles without actually requesting more than is available. This near-saturation case is the sign of a heavily loaded system.
A better sign of over-utilization on a host is ready time (%RDY). When any world's ready time starts to climb, that world is spending the reported percentage of its time waiting for a CPU to become available for work. Ready time above 10% is worth investigating and may be a sign of an over-utilized host. For a more detailed discussion of ready time, see Ready Time.
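esxtop reports %RDY directly as a percentage, but VirtualCenter's summation statistics report ready time in milliseconds per sample interval, so the two must be converted to compare. A minimal sketch, assuming a 20-second real-time sample interval (substitute your chart's actual interval):

```python
# Convert ready-time milliseconds (as VirtualCenter summation stats report it)
# into the percentage form that esxtop's %RDY column uses.
# The 20-second interval is an assumption for real-time charts.
def ready_pct(ready_ms: float, interval_s: float = 20.0) -> float:
    return ready_ms / (interval_s * 1000.0) * 100.0

# 2,000 ms of ready time in a 20 s sample is 10% ready time --
# right at the threshold suggested above for further investigation.
print(f"{ready_pct(2000):.0f}%")
```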
Host saturation is a clear sign that too much work has been loaded onto a single server. This is usually due to overly aggressive consolidation ratios. Overcommitting CPU resources in this case will only worsen performance. Consider the following remedies:
- Verify that VMware Tools has been installed on every VM on the system. In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.
- Verify that all systems in the DRS cluster are carrying load when the server of interest is overloaded. If they are not, increase the aggressiveness of the DRS algorithm and check VM reservations against other hosts in the cluster to ensure migrations can occur. Lastly, increase the number of servers in the DRS cluster so VMs from this server can be migrated to servers with available resources.
- Increase the CPU resources available to the VMs by adding or upgrading CPUs or cores on some of the systems in the DRS cluster.
- Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.
- Ensure the newest version of ESX Server is being used. The newer versions of ESX Server provide better efficiency and CPU-saving features such as TCP segmentation offload (TSO), large memory pages, and jumbo frames.
- Reduce the CPU resource footprint of running VMs. As examples:
- Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM. This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.
- Reduce CPU load by replacing software-based I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).
- Reduce vCPU count for guests to only the number required to execute the workload. For instance, a single-threaded application in a 4-way guest will only benefit from a single vCPU. But the hypervisor's maintenance of the three idle vCPUs takes CPU cycles that could be used for other work.
- For VMs created using P2V conversion, analyze the VM resources as well as the applications running inside the VM. Stop any unnecessary services that may be running inside the P2V'ed VM. Also reduce the vCPU count and memory allocation to only what is required by the workload.
Given correctly configured VMs, the most general remedy for CPU bottlenecks is to address processing power at the cluster level. If VirtualCenter reports fully utilized CPUs on all hosts in the cluster, there is little alternative to increasing cluster resources or decreasing VM count.
One last nuance of virtual system tuning, mentioned in the vCPU-count remedy above, is the correct balancing of virtual CPU count. Few applications fully utilize two or more vCPUs, and many VMs are committed to a special purpose with a single application. The guest OS and the hypervisor must expend CPU cycles managing multiple vCPUs. If the applications are not using them, the efficiency of the system as a whole will improve by reducing the vCPU count of the VMs.
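One quick sanity check when weighing these remedies is the host's vCPU-to-core overcommit ratio. The inventory below is hypothetical; the point is simply that the more vCPUs are stacked onto each physical core, the more likely host saturation and high ready time become:

```python
# Estimate the vCPU-to-core overcommit ratio for one host.
# The VM inventory and core count below are hypothetical.
vms = {"web01": 2, "web02": 2, "db01": 4, "batch01": 4}  # name -> vCPU count
physical_cores = 8

total_vcpus = sum(vms.values())
ratio = total_vcpus / physical_cores
print(f"{total_vcpus} vCPUs on {physical_cores} cores: "
      f"{ratio:.1f}:1 overcommit")  # prints "12 vCPUs on 8 cores: 1.5:1 overcommit"
```

There is no single safe ratio; it depends on how busy the guests are, which is exactly what the %USED and %RDY counters above measure.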
CPU Saturation of VM
Like host CPU saturation, VM CPU saturation can be seen when the %USED for a VM is high. Unlike host CPU saturation, the idle world may report a large amount of free computational resources and the VM's ready time (%RDY) may remain low. This behavior can be seen when a single VM utilizes all of the processors allocated to it but additional CPUs remain unused on the host. The VM's utilization of all of its vCPUs can be confirmed by expanding the VM's group on the CPU screen. Once this has been confirmed, the following options are available:
- Verify that VMware Tools has been installed on every VM on the system. In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.
- If possible, increase the number of vCPUs provided to the VM. As the application in the guest is successfully using all of its vCPUs, it may continue to scale as the vCPU count is increased. Pay attention to the vmmX world for each vCPU after increasing the vCPU count to verify that the VM is making use of its newly provided resources. As noted in the "CPU Saturation of Host" section, the addition of vCPUs imposes an overhead on the host whether they are used or not, so carefully assess the guest's needs to avoid unneeded vCPU count increases.
- If possible, power on multiple VMs running the same application. This depends on how well the application supports a horizontally scalable configuration. An application may perform better running as multiple single-vCPU VMs than as a single SMP VM.
- Utilize faster processors. As processor performance continually increases, upgrading processors or migrating the VM to a system with newer processors can provide more total throughput to the VM.
- Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.
- Decrease the work generated by running the VM. As examples:
- Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM. This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.
- Reduce CPU load by replacing software-based I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).
Low CPU Utilization
Assuming performance problems have been confirmed, low CPU utilization is usually a sign of inefficiently designed datacenter architecture. The design could be flawed in an individual VM or in the connectivity between various components. The Performance Monitoring and Analysis section walks through investigation of VM-level components such as memory, and then system-wide components such as network and storage.
References
esxtop Performance Counters
Understanding VirtualCenter Performance Statistics