CPU Performance Analysis and Monitoring

Version 7

    Introduction

    CPU load is generated by the guest and its applications, as well as by ESX Server as it provides a virtual interface to the hardware. While the work performed by the host does add some load, the great majority of processing is due to the applications in the VM. A solid understanding of the workload profile, independent of the virtual environment, can assist CPU analysis.

    Check Utilization

    Invoke esxtop. By default it displays CPU utilization, but pressing 'c' ensures this data is being displayed. The following figure shows example data produced on a test system.

    http://communities-prod-app-2.vmware.com:8080/docs/DOC-5420/esxtop-cpu-main.JPG
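Interactive inspection is the usual workflow, but esxtop also has a batch mode (`esxtop -b`) that writes counters as CSV for offline processing. The sketch below parses such a capture and computes per-core and overall averages; the column names and sample values are illustrative assumptions, not the exact batch-output format, which varies by ESX Server version.

```python
import csv
import io

# Illustrative sample of esxtop batch-mode output (esxtop -b). A real
# capture has many more columns; treat these headers as assumptions.
sample = """\
"Time","Physical Cpu(0)\\% Util","Physical Cpu(1)\\% Util"
"10:00:05",96.1,94.7
"10:00:10",97.3,95.2
"10:00:15",95.8,96.0
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
cpu_cols = [c for c in rows[0] if c != "Time"]

# Average utilization per core across the capture interval.
avg = {c: sum(float(r[c]) for r in rows) / len(rows) for c in cpu_cols}
# "used total" in the PCPU(%) line is the average across all cores.
overall = sum(avg.values()) / len(avg)

for core, pct in avg.items():
    print(f"{core}: {pct:.1f}%")
print(f"used total: {overall:.1f}%")
```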

    Observe the following:

     

    • The PCPU(%) line in the header shows processor utilization by core and in total. The comma-delimited values list per-core utilization first, followed by "used total", the average utilization across all cores.
    • The LCPU(%) line shows the percentage of CPU utilization per logical  CPU. The percentages for the logical CPUs belonging to a package add up  to 100 percent. This line appears only if hyperthreading is present and  enabled.
    • The CCPU(%) line shows the percentages of total CPU time as reported by the ESX Server service console. Use of any third-party software, such as management agents and backup agents, inside the service console may result in a high CCPU(%) number.
    • There is an idle world running whose %USED entry displays the amount of CPU cycles that remain unused. If the idle world reports less than 100% utilization, only a fraction of one physical core remains for additional work. Because this number can reach many hundreds of percent (100% for each core), small values here represent heavily loaded systems.
    • Check the utilization (%USED) of the interesting VMs.  The VMs are  reported here with the names specified at their time of creation.  Like  the idle row, utilization for each VM can exceed 100%.  A VM that was  provided two vCPUs, as an example, can max out at 200% CPU utilization.
    • Expand the group data for the VM that is most interesting. This is done by pressing 'e' and then entering the group ID number (GID) for the VM. The figure below contains a CPU-expanded version of GID "30" from the previous figure. Once expanded, esxtop provides counter data for every world in the group. This includes:
      • vmmX: For each vCPU provided to the VM, a virtual machine monitor (VMM) world is displayed. This world performs the majority of the work required to execute and virtualize the guest code (OS and applications).
      • vcpu-X: A vcpu-X world is created to assist the VMM world for each  vCPU.  Primarily this work revolves around the virtualization of the IO  devices.
      • mks: Mouse, keyboard, and screen interrupt servicing.
      • vmware-vmx:  The VMX worlds assist in maintenance and communications  with other worlds and should not represent a material portion of the  group utilization.


    http://communities-prod-app-2.vmware.com:8080/docs/DOC-5420/esxtop-cpu-main-expanded.JPG
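The group's %USED is the sum of its worlds' contributions, so a multi-vCPU VM can legitimately exceed 100%. A minimal sketch of that arithmetic, using hypothetical per-world values for a 2-vCPU VM (the world names follow the esxtop convention described above; the numbers are invented for illustration):

```python
# Hypothetical %USED samples for the expanded worlds of a 2-vCPU VM.
worlds = {
    "vmm0": 88.5,    # VMM world for vCPU 0
    "vmm1": 76.2,    # VMM world for vCPU 1
    "vcpu-0": 3.1,   # helper world for vCPU 0 (mostly I/O virtualization)
    "vcpu-1": 2.8,   # helper world for vCPU 1
    "mks": 0.1,      # mouse/keyboard/screen servicing
    "vmware-vmx": 0.4,  # maintenance and communication with other worlds
}

group_used = sum(worlds.values())
vcpus = 2

# A 2-vCPU VM can reach at most 200% group utilization.
print(f"group %USED: {group_used:.1f} (ceiling {vcpus * 100}%)")
# Normalizing per vCPU makes groups with different vCPU counts comparable.
print(f"normalized: {group_used / vcpus:.1f}% per vCPU")
```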

    Evaluate the Data and Correct the System

    The general flow for evaluation starts by considering the system's load. Is the system overloaded with too many VMs? Is the guest using all of its vCPUs, simply requiring more or faster processors? Are all guests waiting for IO? For example:

    1. Check the PCPU(%) line to see if all cores' utilization is near  100%.  In this case the system is saturated.  If multiple VMs are  competing for the CPUs, try to reduce the VMs on the system or find  other means of decreasing the load on the system.  See "CPU Saturation  of Host" below.
    2. See if the PCPU(%) line shows an unequal load across processor cores, with some at saturation and some remaining near idle. This indicates a VM whose applications are utilizing all of the cores provided to it. Increase that VM's vCPU count, if possible, and verify that the guest makes use of the additional cores. If the application supports horizontal scalability, you may run multiple VMs to use the additional cores. See "CPU Saturation of VM" below.
    3. If all CPUs remain underutilized, either the application in the VM  is misconfigured or the VM is waiting for IO operations to complete.   See "Low CPU Utilization" below.
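The three-step triage above can be sketched as a simple classifier over the per-core PCPU(%) readings. The thresholds here (95% for "near saturation", 20% for "near idle") are illustrative assumptions, not values from VMware documentation:

```python
def triage(pcpu_util):
    """Classify a host from its per-core PCPU(%) readings."""
    if all(u >= 95 for u in pcpu_util):
        return "host saturated: reduce VM count or system load"
    if any(u >= 95 for u in pcpu_util) and any(u <= 20 for u in pcpu_util):
        return "unequal load: a VM may be saturating its vCPUs"
    if all(u <= 20 for u in pcpu_util):
        return "low utilization: check guest config and I/O waits"
    return "no obvious CPU bottleneck"

print(triage([97, 98, 96, 99]))   # step 1: all cores near 100%
print(triage([98, 97, 5, 8]))     # step 2: some saturated, some idle
print(triage([10, 12, 7, 9]))     # step 3: everything underutilized
```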

     

    CPU Saturation of Host

    As stated above, both the PCPU(%) and %USED counters can be used to identify hosts that are using all physical CPUs. It is possible, however, for the VMs on the system to be using nearly all of the processor cycles without actually requesting more than is available. This near-saturation case is the sign of a heavily loaded system.

    A better sign of over-utilization on a host is ready time (%RDY).  When  any world's ready time starts to climb, that world is spending the  reported percentage of its time waiting for some CPU to become available  for work.  Ready time above 10% is worth investigation and may be a  sign of an over-utilized host.  For a more detailed discussion on ready  time, see Ready Time.
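The ready-time check described above amounts to flagging any world whose %RDY exceeds the 10% investigation threshold. A minimal sketch, using hypothetical world names and values:

```python
# Flag worlds whose %RDY exceeds the 10% investigation threshold
# mentioned above. World names and values here are hypothetical.
RDY_THRESHOLD = 10.0

rdy_by_world = {
    "web-frontend": 2.3,
    "db-server": 14.8,    # spends ~15% of its time waiting for a CPU
    "build-agent": 11.2,
    "idle": 0.0,
}

suspects = {w: r for w, r in rdy_by_world.items() if r > RDY_THRESHOLD}
for world, rdy in sorted(suspects.items(), key=lambda kv: -kv[1]):
    print(f"{world}: %RDY {rdy:.1f} -- possible CPU contention")
```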

    Host saturation is a clear sign that too much work has been loaded onto a single server. This is usually due to overly aggressive consolidation ratios. Overcommitting CPU resources in this case will only worsen the performance. Consider the following remedies:

    1. Verify that VMware Tools has been installed on every VM on the  system.  In addition to many other benefits, VMware Tools provides a  network driver (vmxnet) without which guest networking will be  unnecessarily inefficient.
    2. Verify that all systems in the DRS cluster are carrying load when the server of interest is overloaded. If they aren't, increase the aggressiveness of the DRS algorithm and check VM reservations against other hosts in the cluster to ensure migrations will happen. Lastly, increase the number of servers in the DRS cluster so VMs from this server can be migrated to servers with available resources.
    3. Increase the CPU resources available to the VMs by increasing or  improving CPUs or cores on some of the systems in the DRS cluster.
    4. Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.
    5. Ensure the newest version of ESX Server is being used.  The newer  versions of ESX Server provide better efficiency and CPU-saving features  such as TCP segmentation offload (TSO), large memory pages, and jumbo  frames.
    6. Reduce the CPU resource footprint of running VMs.  As examples:
      1. Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM. This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.
      2. Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).
      3. Reduce vCPU count for guests to only the number required to execute  the workload.  For instance, a single-threaded application in a 4-way  guest will only benefit from a single vCPU.  But the hypervisor's  maintenance of the three idle vCPUs takes CPU cycles that could be used  for other work.
      4. For VMs created using P2V conversion, analyze the VM resources as well as the applications running inside the VM. Stop unnecessary services that may be running inside the P2V'ed VM. Also reduce the vCPU count and memory size to only what is required to execute the workload.


    Given correctly configured VMs, the simplest general guidance for addressing CPU bottlenecks is to address processing power at the cluster level. If VirtualCenter reports fully utilized CPUs for all hosts in the cluster, there is little alternative to increasing cluster resources or decreasing VM count.

    One last nuance of virtual system tuning, mentioned in item 6c above, is the correct balancing of virtual CPU count. Few applications fully utilize two or more vCPUs, and many VMs are committed to a special purpose with a single application. The guest OS and the hypervisor must expend CPU cycles managing multiple vCPUs; if the applications are not using them, reducing the vCPU count of such VMs will improve the efficiency of the system as a whole.

    CPU Saturation of VM

    Like host CPU saturation, VM CPU saturation can be seen when the %USED  for a VM is high.  Unlike host CPU saturation, the idle world may report  a large amount of free computational resources and the VM's ready time  (%RDY) may remain low.  This behavior can be seen when a single VM  utilizes all of the processors allocated to it but additional CPUs  remain unused on the host.  The VM's utilization of all of its vCPUs can  be confirmed by expanding the VM's world on the CPU screen.  Once this  has been confirmed, the following options are available:

    1. Verify that VMware Tools has been installed on every VM on the  system.  In addition to many other benefits, VMware Tools provides a  network driver (vmxnet) without which guest networking will be  unnecessarily inefficient.
    2. If possible, increase the number of vCPUs provided to the VM. As the application in the guest is successfully using all of its vCPUs, it may continue to scale as the vCPU count is increased. Pay attention to the vmmX world for each vCPU after increasing vCPU count to verify that the VM is making use of its newly provided resources. As detailed in item 6c in the "CPU Saturation of Host" section, the addition of vCPUs imposes an overhead on the host whether they are being used or not, so carefully assess the guest's needs to avoid unneeded vCPU count increases.
    3. If possible, you can power on multiple VMs running the same application. This depends on how well the application supports a horizontally scalable configuration. An application may perform better running as multiple single-vCPU VMs rather than as a single SMP VM.
    4. Utilize faster processors. As processor performance is continually increasing, upgrading processors or migrating the VM to systems with newer processors can provide more total throughput to the VM.
    5. Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.
    6. Decrease the work as a result of running the VM.  As examples:
      1. Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM. This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.
      2. Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).
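The distinction drawn at the start of this section can be sketched as a simple check: a VM is likely saturated in itself when its %USED approaches 100% per vCPU while its %RDY stays low; high %RDY instead points at host contention. The thresholds below are illustrative assumptions:

```python
def vm_saturated(used_pct, vcpus, rdy_pct,
                 used_floor=90.0, rdy_ceiling=5.0):
    """True when the VM itself is CPU-bound: near-full use of each
    vCPU (%USED per vCPU >= used_floor) with low ready time."""
    per_vcpu = used_pct / vcpus
    return per_vcpu >= used_floor and rdy_pct <= rdy_ceiling

# 2-vCPU VM using 185% CPU with almost no ready time: a saturated VM,
# not a saturated host.
print(vm_saturated(185.0, 2, 1.2))   # True
# Same utilization but high ready time points at host contention instead.
print(vm_saturated(185.0, 2, 18.0))  # False
```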

    Low CPU Utilization

    Assuming performance problems have been confirmed, low CPU utilization is usually a sign of an inefficiently designed datacenter architecture. The flaw could lie in an individual VM or in the connectivity between various components. The Performance Monitoring and Analysis document walks through investigation of system-level components such as memory, and then system-wide components such as network and storage.

    References

    esxtop Performance Counters

    Understanding VirtualCenter Performance Statistics