Ready time is as important as it is confusing. I'm going to collect a few thoughts on ready time here in the hope of clearing up some of the confusion around this important part of virtual system performance.
Stated simply, ready time is the amount of time a VM wants to run but has not been provided CPU resources on which to execute. Somewhat confusingly, esxtop and VirtualCenter report ready time in two different formats. esxtop reports it as an easily consumed percentage: a value of 5% means the VM spent 5% of its last sample period waiting for available CPU resources. VirtualCenter reports ready time as a time measurement. In VC's real-time data, which produces a sample every 20,000 ms, a value of 1,000 ms corresponds to 5% ready time.
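The conversion between the two formats is simple division by the sample interval. The sketch below assumes the 20,000 ms real-time interval mentioned above; the function name is hypothetical, and if you read ready time from a different statistics interval, pass that interval's length instead.

```python
# Convert a VirtualCenter ready-time sample (milliseconds) into the
# percentage form that esxtop shows.

REALTIME_INTERVAL_MS = 20_000  # VC real-time samples cover 20,000 ms


def vc_ready_ms_to_percent(ready_ms: float,
                           interval_ms: float = REALTIME_INTERVAL_MS) -> float:
    """Return ready time as a percentage of the sample interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    # Multiply before dividing so round values such as 1,000 / 20,000 stay exact.
    return ready_ms * 100.0 / interval_ms


print(vc_ready_ms_to_percent(1_000))  # 5.0
```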
There is much more to know about ready time than I will reproduce here. Read the whitepaper on the subject for more details; nothing about ready time has changed since ESX 3.0 that would make that paper out of date.
Interpreting Ready Time Values
The most common question we get on ready time is, "what ready time numbers constitute a problem?" While there is no easy answer to this, we can offer some guidance on acceptable values. But before I lay that out, let me say that ready time should not be the ultimate measure of system performance. As always, user experience and latency should be. Users can have a horrible experience on a system with no load and virtually zero ready time, for example because of a misconfigured storage array. And occasionally we see aggressively consolidated hosts with very high ready times that still meet user needs. There are no absolutes with ready time.
But there are a few general regions into which ready time values can be binned. Note that these values are per vCPU. esxtop reports a VM's ready time summed across all of its vCPUs, so 5% ready on each of four vCPUs is reported as 20% ready at the VM level. Even so, 5% per vCPU is only the high end of a very light amount of ready time.
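To compare an esxtop VM-level value with the per-vCPU ranges below, divide by the VM's vCPU count. This is a minimal sketch with a hypothetical function name; note that it yields an average, so a VM whose ready time is concentrated on one vCPU will look better than that vCPU actually is.

```python
# Turn esxtop's VM-level ready percentage (summed over all vCPUs)
# into an average per-vCPU value.

def per_vcpu_ready(vm_ready_percent: float, num_vcpus: int) -> float:
    """Average ready percentage per vCPU for one VM."""
    if num_vcpus < 1:
        raise ValueError("a VM has at least one vCPU")
    return vm_ready_percent / num_vcpus


# A 4-vCPU VM showing 20% ready in esxtop averages 5% per vCPU.
print(per_vcpu_ready(20.0, 4))  # 5.0
```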
Ready time (r), per vCPU
r == 0%
This doesn't happen. The very presence of a hypervisor between the operating system and the hardware means that ready time is never truly zero. But on healthy systems it is so small that end users can't tell their workload has been virtualized. See the next range.
0% < r <= 5%
This is the "normal" region for ready time. Small single-digit values have minimal impact on user experience. If a system has performance problems and its ready time falls in this region, the problems lie elsewhere.
5% < r <= 10%
In this region, ready time starts to be worth watching. Most systems function well here, but the most performance-sensitive workloads may begin to suffer.
10% < r
While some systems continue to meet expectations, double-digit ready time percentages often mean that action is required to address performance issues. See Causes and Correction below for guidance.
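The table's boundaries can be written as a small lookup. In this sketch the boundaries come straight from the table, while the function name and the short labels are hypothetical shorthand; as stated above, these regions are guidance, not absolutes.

```python
# Map a per-vCPU ready percentage onto the rough regions in the table.

def ready_region(r: float) -> str:
    """Label a per-vCPU ready-time percentage with its rough region."""
    if r < 0:
        raise ValueError("ready time cannot be negative")
    if r == 0:
        return "zero (not seen in practice)"
    if r <= 5:
        return "normal"
    if r <= 10:
        return "worth watching"
    return "action often required"


for r in (0.4, 5.0, 7.5, 12.0):
    print(f"{r}% -> {ready_region(r)}")
```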
Again, remember that VirtualCenter performance numbers must be converted to percentages to find the matching region in the table above. But since VC reports ready time per vCPU, no extra arithmetic is needed to account for the number of vCPUs in the VM (as it is with esxtop).
Causes and Correction
There are two general areas that can cause unnecessarily high ready times: over-commitment of the host's physical CPUs and excessive use of SMP.
The most common cause of high ready time is trying to get too much work out of too little hardware. Consider the following simple case: on a hypothetical system with only one physical CPU, if two 1-way VMs are fully loaded by their users then each wants to have an entire CPU. Because only one is available, ESX will time share that resource and give each of them only 50% of the CPU. As a result, each VM will spend 50% of its time waiting for the processor. This would be reported as 50% ready time.
This condition usually shows up as high ready time combined with very high total host CPU utilization. The only fix is to reduce the load on the system: migrate VMs off the host or add processor resources.
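The one-CPU example generalizes to an idealized equal time-sharing model. This is a sketch under loud assumptions: every VM is a fully loaded 1-way VM, CPUs are split evenly, and hypervisor overhead, shares, reservations, and limits are all ignored, so the real ESX scheduler will not match it exactly. The function name is hypothetical.

```python
# Idealized model of CPU over-commitment: N fully loaded 1-vCPU VMs
# share P physical CPUs equally, with no scheduling overhead.

def overcommit_ready_percent(busy_vms: int, physical_cpus: int) -> float:
    """Ready percentage each busy 1-vCPU VM would see under equal time sharing."""
    if busy_vms < 1 or physical_cpus < 1:
        raise ValueError("need at least one VM and one physical CPU")
    # Each VM can use at most one whole CPU; beyond that, CPUs are split evenly.
    share = min(1.0, physical_cpus / busy_vms)
    # Fraction of time the VM wants a CPU but does not get one.
    return (1.0 - share) * 100.0


print(overcommit_ready_percent(2, 1))  # 50.0
print(overcommit_ready_percent(2, 2))  # 0.0
```

The second call mirrors the fix described above: adding processor resources (or migrating a VM away) removes the contention in this model.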
The second cause is excessive use of SMP. In ESX Server 2.5, SMP guests had to be co-scheduled to start at exactly the same moment. If a 2-way VM was ready to run but only one physical core was available, the VM would not be scheduled until a second core was freed, which increased its ready time. ESX Server 3.0 introduced relaxed co-scheduling, which allows a subset of a VM's vCPUs to be scheduled ahead of the others. However, guest operating systems still require some degree of co-scheduling, so the relaxation isn't absolute. In short, adding vCPUs still burdens the scheduler with trying to co-schedule them, and that can increase ready time. This is one reason why VMware advises allocating only as many vCPUs as a VM actually uses. Read "Co-scheduling SMP VMs in VMware ESX Server" for more information on co-scheduling.
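The strict co-scheduling rule described for ESX Server 2.5 can be illustrated with a toy model. It assumes a VM that always wants to run and a hypothetical list of how many physical cores are free in each scheduling "tick"; the VM may start only when enough cores are free for all of its vCPUs. Relaxed co-scheduling is deliberately not modeled, and the function and input names are hypothetical.

```python
# Toy model of strict co-scheduling (ESX Server 2.5 style): an SMP VM
# that always wants to run can start only in a tick where at least as
# many physical cores are free as it has vCPUs.
# Relaxed co-scheduling (ESX Server 3.0 and later) is NOT modeled.

from typing import Sequence


def strict_cosched_ready_percent(free_cores: Sequence[int], vcpus: int) -> float:
    """Percent of ticks in which the VM was ready but could not be scheduled."""
    if not free_cores:
        raise ValueError("need at least one tick")
    if vcpus < 1:
        raise ValueError("a VM has at least one vCPU")
    waiting = sum(1 for free in free_cores if free < vcpus)
    return waiting * 100.0 / len(free_cores)


# The same pattern of free cores, as seen by a 1-way and a 2-way VM.
ticks = [1, 2, 1, 2]
print(strict_cosched_ready_percent(ticks, 1))  # 0.0
print(strict_cosched_ready_percent(ticks, 2))  # 50.0
```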
Excessive SMP typically shows up as hosts with sub-optimal CPU utilization and lots of SMP VMs. A host may have a dozen 4-way VMs, each showing high ready time, while running at only 40% aggregate CPU utilization. This is a clear sign that the scheduler is spending a great deal of time managing unneeded vCPUs.
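The two symptoms can be combined into a rough triage rule. In this sketch the 10% "high ready" threshold comes from the table above, but the 80% "busy host" cut-off, the function name, and the example's 15% ready value are hypothetical assumptions, not VMware guidance.

```python
# Rough triage of high ready time, combining the two symptoms described above.
# Thresholds are illustrative assumptions, not VMware guidance.

HIGH_READY_PCT = 10.0  # per-vCPU ready above this is "high" (from the table)
BUSY_HOST_PCT = 80.0   # hypothetical cut-off for "very high" host CPU use


def likely_ready_cause(per_vcpu_ready_pct: float,
                       host_cpu_util_pct: float,
                       smp_vm_count: int) -> str:
    """Suggest which of the two causes is more likely."""
    if per_vcpu_ready_pct <= HIGH_READY_PCT:
        return "ready time is not in the high range"
    if host_cpu_util_pct >= BUSY_HOST_PCT:
        return "CPU over-commitment: reduce load or add processors"
    if smp_vm_count > 0:
        return "excessive SMP: remove vCPUs the VMs are not using"
    return "unclear: investigate further"


# A dozen 4-way VMs with high ready time on a host at 40% utilization.
print(likely_ready_cause(15.0, 40.0, 12))
```

Over-commitment is checked first because a saturated host produces high ready time regardless of how many vCPUs its VMs have.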