Ready Time

Version 5

    Introduction

    Ready time is as important as it is confusing. I'm going to collect a few thoughts on ready time here in the hope that some of the confusion around this important part of virtual system performance can be eliminated.

    Details

    Stated simply, ready time is the amount of time a VM wants to run but has not been provided CPU resources on which to execute. Somewhat confusingly, ready time is reported in two different units by esxtop and VirtualCenter. In esxtop it is reported in an easily-consumed percentage format. A number of 5% means the VM spent 5% of its last sample period waiting for available CPU resources. In VirtualCenter, ready time is reported as a time measurement. In VC's real-time data, which produces sample values every 20,000 ms, a number of 1,000 ms is reported for a 5% ready time.
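
    The conversion from VirtualCenter's millisecond values to esxtop-style percentages can be sketched as below. This is a minimal illustration, not VMware code: the function name `ready_ms_to_percent` is hypothetical, and the 20,000 ms default applies only to the real-time data described above. Other statistics levels use other intervals, so pass the correct one.

```python
def ready_ms_to_percent(ready_ms, interval_ms=20_000):
    """Convert a ready-time value in milliseconds to a percentage of the sample interval.

    interval_ms defaults to the 20,000 ms real-time sample period;
    pass a different value for other sampling intervals.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * ready_ms / interval_ms


# The example from the text: 1,000 ms of ready time in a 20,000 ms sample.
print(ready_ms_to_percent(1_000))  # 5.0
```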

    There is so much more to know about ready time that I'm not going to reproduce here.  Read the whitepaper on the subject for more details.  There have been no changes in the details on ready time since ESX 3.0 that make that paper out-of-date.

    Interpreting Ready Time Values

    The most common question we get on ready time is, "what ready time  numbers constitute a problem?"  While there is no easy answer to this,  we can offer some guidance on the acceptable values.  But before I lay  that out, let me say that ready time should not be the ultimate  measurement of system performance.  As always, user experience and  latency should be.  There are some situations where user experience is  horrible on a system with no load and virtually zero ready time.  This  could happen with a mis-configured array, as an example.  And  occasionally we see aggressively-consolidated hosts showing very high  ready times that are meeting user needs.  There are no absolutes with  ready time.

    But there are a few general regions into which ready time values can be binned. Note that these ready time values are per vCPU. esxtop reports ready time for a VM once it has been summed across all vCPUs. That means that 5% ready on each of four vCPUs will be reported as 20% ready at the VM level. At 5% per vCPU, this is the high end of a very light amount of ready time.
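
    The arithmetic for getting from esxtop's VM-level number back to a per-vCPU figure is a simple division. The sketch below uses a hypothetical helper name, and it yields only the average across vCPUs. Individual vCPUs may sit above or below that average.

```python
def per_vcpu_ready_percent(vm_ready_percent, num_vcpus):
    """Average per-vCPU ready time from a VM-level value summed across vCPUs."""
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    return vm_ready_percent / num_vcpus


# The example from the text: 20% reported for a 4-way VM.
print(per_vcpu_ready_percent(20.0, 4))  # 5.0
```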

     

    Value, per vCPU    Description
    r == 0%            This doesn't happen. The very presence of a hypervisor between the operating system and the hardware means that there is a non-zero ready time on all operations. But on healthy systems this number is so small that end-users don't know their workload has been virtualized. See the next section.
    0% < r <= 5%       This is the "normal" region for ready time. Very small single digit numbers result in a minimal impact to user experience. If performance problems exist on the system and ready time falls into this region, your problems lie elsewhere.
    5% < r <= 10%      In this region ready time is starting to be worth watching. Most systems function healthily with ready time in this region but highly sensitive measurements may be suffering.
    10% < r            While some systems continue to meet expectations, double-digit ready time percentages often mean some action is required to address performance issues. See the last section for guidance.
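
    For anyone scripting against these regions, the table can be expressed as a small lookup. This is only a sketch: the function name and the short labels are shorthand invented for this example, and the input must already be a per-vCPU percentage. As noted above, these bins are guidance, not absolutes.

```python
def classify_ready(per_vcpu_ready_percent):
    """Map a per-vCPU ready-time percentage to the region it falls in."""
    r = per_vcpu_ready_percent
    if r < 0:
        raise ValueError("ready time cannot be negative")
    if r == 0:
        return "not expected in practice"
    if r <= 5:
        return "normal"
    if r <= 10:
        return "worth watching"
    return "action often required"


for value in (2.0, 7.5, 12.0):
    print(value, "->", classify_ready(value))
```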



    Again, remember that VirtualCenter performance numbers must be converted to percentages to find the category in the above table. But since VC reports ready time per vCPU, no special arithmetic is needed to account for the number of vCPUs in the VM (as is needed with esxtop).

    Causes and Correction

    There are two general areas that can cause unnecessarily high ready times:

    1. Overloaded hosts.
    2. Excessive use of SMP.

    Host Overloading

    The most common cause of high ready time is trying to get too much work  out of too little hardware.  Consider the following simple case: on a  hypothetical system with only one physical CPU, if two 1-way VMs are  fully loaded by their users then each wants to have an entire CPU.   Because only one is available, ESX will time share that resource and  give each of them only 50% of the CPU.  As a result, each VM will spend  50% of its time waiting for the processor.  This would be reported as  50% ready time.
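
    The simple case above generalizes to a back-of-the-envelope model. The sketch below assumes perfectly fair time sharing, fully loaded 1-way VMs, and no scheduling overhead, none of which hold exactly on a real host. The function name is hypothetical.

```python
def idealized_ready_percent(num_busy_vms, num_pcpus):
    """Ready time per VM when fully loaded 1-way VMs fairly share physical CPUs.

    Idealized: ignores scheduler overhead, shares, limits, and reservations.
    """
    if num_busy_vms < 1 or num_pcpus < 1:
        raise ValueError("need at least one VM and one physical CPU")
    # Each VM wants 100% of a CPU but receives at most an equal share.
    share = min(1.0, num_pcpus / num_busy_vms)
    return 100.0 * (1.0 - share)


# The example from the text: two fully loaded 1-way VMs on one physical CPU.
print(idealized_ready_percent(2, 1))  # 50.0
```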

    Often this condition is observable when ready time is high and total  host CPU utilization is also very high.  The only fix for this is to  back off the load on the system.  VMs should be migrated off or  processor resources should be increased.

    Excessive SMP

    In ESX Server 2.5, SMP guests had to be co-scheduled to start at the exact same moment. If a 2-way VM was ready to run but only one physical core was available, the VM would not be scheduled until a second core was freed up. This would increase its ready time. In ESX Server 3.0 and later versions, relaxed co-scheduling was introduced, which meant that a subset of a VM's vCPUs could be scheduled ahead of others. However, guest operating systems still require some degree of co-scheduling, which means that the relaxation isn't absolute. In short, increasing vCPUs still puts some burden on the scheduler to try to co-schedule them, which can increase ready time. This is one reason why VMware advises only allocating vCPUs to VMs that are using them. Read Co-scheduling SMP VMs in VMware ESX Server for more information on co-scheduling.

    This condition is manifested by hosts that have low aggregate CPU utilization and lots of SMP VMs. A host may have a dozen 4-way VMs with each showing high ready time but only be at an aggregate 40% CPU utilization. This is a clear sign that the scheduler is spending a great deal of time managing unneeded vCPUs.
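
    The two causes in this section can be told apart with a rough first-pass check: high ready time with high host utilization points toward overloading, while high ready time with modest utilization points toward excessive SMP. The sketch below is only a heuristic. The function name is hypothetical, the 10% ready threshold is borrowed from the table above, and the 80% utilization cutoff is an arbitrary assumption chosen for illustration, not a VMware recommendation.

```python
def suspect_cause(per_vcpu_ready_pct, host_cpu_util_pct,
                  ready_threshold_pct=10.0, high_util_pct=80.0):
    """Rough first guess at the cause of high ready time.

    ready_threshold_pct follows the double-digit region of the table;
    high_util_pct is an assumed cutoff and should be tuned per environment.
    """
    if per_vcpu_ready_pct <= ready_threshold_pct:
        return "ready time not high"
    if host_cpu_util_pct >= high_util_pct:
        return "host overloading"
    return "possible excessive SMP"


print(suspect_cause(15.0, 95.0))  # host overloading
print(suspect_cause(15.0, 40.0))  # possible excessive SMP
```

    Treat the result as a prompt for further investigation rather than a diagnosis. As stated earlier, user experience and latency remain the real measure of a problem.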