I am troubleshooting some VMs that appear to be having problems. They are 4 vCPU terminal servers, and when I watch esxtop, CPU ready reaches 40%+ quite often, but for fairly short periods of time. As these servers are time-critical, I presume that the user complaints of slow Office apps and freezing may be down to this. I've only monitored it visually with esxtop; the vCenter rollups are not granular enough to show the problem.
I have inherited a vSphere farm that has a lot of memory, but is very much overcommitted for CPU with many high vCPU guests.
My question is what effect does high CPU ready have on the performance logs of the host and guest?
What effect on host CPU utilisation as viewed by vCenter?
What effect on guest CPU utilisation as viewed by vCenter?
What effect on guest CPU utilisation as shown by the perf logs in Windows?
If the CPU is unable to do work because the VM cannot be scheduled on it, does it show low utilisation even though, in effect, some cores are locked and unavailable for work? Or, to put it another way, Windows is asking the CPUs to do work, but the CPUs are not able to respond, which is similar to them running at 100%. But how does Windows see this?
I'm trying to clarify how CPU ready affects the stats, as I will be asked by management to explain this. They are used to CPU utilisation graphs, so this will take a bit more explanation. They would have no problem with change requests to try to reduce load if they saw high CPU utilisation. Explaining CPU ready and providing stats for it is a bit more vague.
The stats are available in vCenter, and you can increase the statistics level if you need more granular logging over time. Overcommitting vCPUs has a negative impact on the performance of both the individual VM and others on the same host (especially with VMs that have a higher number of cores). I see this in my own infrastructure and have done extensive testing to provide evidence of it to business and application owners. Interestingly, I see a higher ready time on VMs that are under a low CPU load than on busier ones; I assume this means that a lower priority is given to CPU cycles that are not in great demand.
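One thing that helps when presenting this to management: vCenter charts report CPU ready as a summation in milliseconds, not as a percentage like esxtop does. A minimal sketch of the usual conversion is below, assuming the standard chart update intervals (20 seconds for real-time, 300 for past day, and so on). The function name and the optional division by vCPU count are my own illustration; the division assumes you are reading the VM-level counter, which sums ready time across all vCPUs.

```python
# Chart update intervals in seconds for vCenter performance charts.
INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}


def ready_percent(summation_ms, interval="realtime", vcpus=1):
    """Convert a CPU ready summation (ms) into a percentage.

    With vcpus=1 the result is the raw percentage of the interval.
    Passing the VM's vCPU count gives an average per-vCPU figure
    (an assumption: the value read is the VM-level aggregate).
    """
    interval_ms = INTERVAL_SECONDS[interval] * 1000
    return summation_ms * 100 / interval_ms / vcpus


if __name__ == "__main__":
    # 8000 ms of ready time in one 20 s real-time sample.
    print(ready_percent(8000))           # 40.0
    # The same value averaged over a 4 vCPU VM.
    print(ready_percent(8000, vcpus=4))  # 10.0
```

Note how quickly the longer rollup intervals dilute a short spike: a burst that looks dramatic in a 20 second sample is averaged over 300 seconds or more in the historical charts.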
You can easily show the impact by monitoring the ready time of a VM on an overcommitted host (use the real-time view in the overview performance tab); you should see the CPU usage and ready time on the same chart. Then simply migrate SMP VMs onto other hosts to see the effect on co-scheduling and ready time. Alternatively, run a similar test on a VM with, say, 4 vCPUs: reduce it to 2 vCPUs on the same overcommitted host and watch the ready time drop off.
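If you would rather have numbers than a chart for the before/after comparison, esxtop can log in batch mode (for example `esxtop -b -d 2 -n 150 > capture.csv`) and the CSV can be summarised afterwards. The sketch below is only an illustration: it assumes the perfmon-style column headers look like `\\host\Group Cpu(<id>:<vmname>)\% Ready`, so check the header row of your own capture before relying on it. The function name `summarize_ready` is hypothetical.

```python
import csv
import re
from statistics import mean

# Assumed header shape: \\host\Group Cpu(<id>:<vmname>)\% Ready
READY_COLUMN = re.compile(r"\\Group Cpu\((\d+):(.+)\)\\% Ready$")


def summarize_ready(csv_file):
    """Return {vm_name: {"mean": ..., "peak": ...}} for every % Ready column."""
    reader = csv.reader(csv_file)
    header = next(reader)

    # Map column index -> VM name for the columns we care about.
    columns = {}
    for index, name in enumerate(header):
        match = READY_COLUMN.search(name)
        if match:
            columns[index] = match.group(2)

    samples = {vm: [] for vm in columns.values()}
    for row in reader:
        for index, vm in columns.items():
            try:
                samples[vm].append(float(row[index]))
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells

    return {
        vm: {"mean": mean(values), "peak": max(values)}
        for vm, values in samples.items()
        if values
    }


if __name__ == "__main__":
    with open("capture.csv", newline="") as handle:
        for vm, stats in sorted(summarize_ready(handle).items()):
            print(f"{vm}: mean {stats['mean']:.1f}%  peak {stats['peak']:.1f}%")
```

Run one capture before the change and one after, and compare the mean and peak figures for the same VM. Bear in mind that the group-level value is summed across the VM's vCPUs, so a 4 vCPU VM can in principle show up to 400%.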
Remember that when an SMP VM needs to schedule cycles, the scheduler has to co-schedule its vCPUs, regardless of the workload. I've put together a few scripts to help with this and can send them over to you, but let me know more about your environment so that I can send the relevant one.
Yes, I have read that it is possible for there to be low utilization with a high ready value because . But it is important to remember that %rdy deals with contention for CPU time, not the amount of work that needs to be done. Also, the CPU scheduler will find it easier to schedule VMs with a lower number of vCPUs. You can also use shares and reservations to help prioritize workloads. In terms of vCPUs, it is best practice to start out with 1 vCPU for most configurations.
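To make the "contention, not work" point concrete, here is a small worked sketch. It assumes the per-world accounting that esxtop uses, where %RUN + %RDY + %CSTP + %WAIT adds up to 100% of the sample interval for a single vCPU. The function name is hypothetical, and treating run plus ready as "demand" is a simplification for illustration only.

```python
def vcpu_time_breakdown(run_pct, ready_pct, costop_pct=0.0):
    """Split one sample interval for a single vCPU into its states.

    Assumes run + ready + co-stop + wait/idle = 100% for the vCPU.
    """
    wait_pct = 100.0 - run_pct - ready_pct - costop_pct
    if wait_pct < 0:
        raise ValueError("run + ready + co-stop cannot exceed 100%")

    # Simplification: time the vCPU wanted a core, whether or not it got one.
    demand_pct = run_pct + ready_pct
    ready_share = ready_pct / demand_pct * 100 if demand_pct else 0.0

    return {
        "run": run_pct,
        "ready": ready_pct,
        "costop": costop_pct,
        "wait_or_idle": wait_pct,
        "demand": demand_pct,
        "ready_share_of_demand": ready_share,
    }


if __name__ == "__main__":
    # A vCPU that ran for 15% of the interval and queued for 40% of it.
    for key, value in vcpu_time_breakdown(15.0, 40.0).items():
        print(f"{key}: {value:.1f}%")
```

In this made-up example the utilisation figure is only 15%, yet the vCPU spent far longer queueing for a physical core than running on one, which is the situation a plain utilisation graph will not show.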
Thanks for the replies.
I understand about %rdy and best practice to reduce it.
I am trying to clarify what happens to %used/CPU utilisation on both the host and the guest when %rdy is high. Does a high CPU ready reduce CPU utilisation on the host, giving the false impression that there are plenty of resources? The same question applies to the guest.
I am very interested to know how Windows interprets CPU utilisation when CPU ready is high. Does Windows calculate CPU utilisation based on the processor's maximum MHz and then work out load based on the MHz that it is actually given? If a server is working flat out but with high %rdy, instead of showing 100% CPU does it only show 60% CPU utilisation when %rdy is about 40%? In other words, does the %rdy mask the high CPU utilisation, giving the impression that it has spare capacity?