VMware Cloud Community
scale21
Enthusiast

CPU % ready question

I am trying to right-size a VM.

I have one VM running 8 vCPUs on my host.

My host has other workloads and has dual 12-core CPUs.

For this VM, esxtop shows a %RDY value of 5-8, which is above what you want to see.

esxtop refreshes every 5 seconds.

The vCenter performance summation counter refreshes every 20 seconds and reads about 1500. This is about the same as an esxtop reading of 3.

There is no co-stop happening at all.

Great.

That all makes sense.

If the %RDY number were higher, say above 5 (which is a common threshold KPI), but there was no co-stop happening, would it still affect performance? I am trying to sort out when I have too many vCPUs allocated to a VM. Clearly you want that number below 5 and definitely below 10, but if co-stop is still at 0, is there a problem? The vCPUs are still having to wait to be scheduled, hence the high number. I know co-stop is a bad thing for sure, especially once that number gets above 3, and I know %RDY is bad once it gets above 5, with 10 being a horrible number to see.

In my test, then, my %RDY is showing 5-8% with an average summation value of 1500. I don't see any performance issues. Should I?

Also....

I have been trying to use vmcalc.com to calculate some real-world numbers, but I think it is calculating incorrectly.

For example, if you put in my numbers:

realtime

1500

8 vCPUs

it calculates the %RDY as 0.94??

That is a long way from the 8 I am seeing live in esxtop.

That is confusing. Perhaps this calculator isn't accurate or is no longer working properly.

If I change the vCPU value to 1, then it shows my %RDY as 7.5, which is correct. I don't have 1 vCPU though, I have 8. Either I am not reading this calculator right or it is wrong. At any rate, it is an external calculator on a website and not the actual numbers shown in my environment, so I will assume it is just wrong unless I am reading it incorrectly.

I have since dialed the vCPUs back to 6 in my VM, which brought the %RDY down to about 3 (or a 600 summation). This seems to be the better choice as the "right size" for this VM.

6 Replies
scale21
Enthusiast

Ah... perhaps the rule is that the %RDY number in esxtop is per vCPU. If a VM has 4 vCPUs, then the esxtop number to watch for would be 20 %RDY, or, if the VM is expanded, make sure each of the four vCPUs is below 5 at all times to be healthy?

Finikiez
Champion

esxtop shows %RDY for a whole VM as the sum of all its vCPUs. To check %RDY for a particular vCPU, you need to expand the VM by pressing 'e' and typing the GID of the VM.

So if you have a VM with 8 vCPUs and esxtop shows 5% for the VM, you divide 5% by 8 to get the average for a single vCPU. And that is a very low value.

So VM ready time is ok in your case.
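
If it helps, here is a rough sketch of that arithmetic, assuming the usual conversion of the vCenter summation counter (milliseconds of ready time over the 20-second real-time interval); this is just my understanding of the standard formula, not anything official:

```python
# Rough sketch: converting a vCenter CPU Ready summation value to %RDY.
# Assumption: the summation is in milliseconds and the real-time interval is 20 seconds.

def summation_to_rdy_pct(summation_ms, interval_s=20):
    """Whole-VM %RDY from a vCenter CPU Ready summation value."""
    return summation_ms / (interval_s * 1000) * 100

def per_vcpu_rdy(vm_rdy_pct, num_vcpus):
    """Average %RDY per vCPU, given the whole-VM value esxtop reports."""
    return vm_rdy_pct / num_vcpus

# The numbers from this thread:
vm_rdy = summation_to_rdy_pct(1500)   # 1500 ms over 20 s -> 7.5% for the whole VM
print(vm_rdy)                         # 7.5
print(per_vcpu_rdy(vm_rdy, 8))        # ~0.94 per vCPU
```

That ~0.94 per-vCPU figure is presumably what vmcalc.com is showing when you enter 8 vCPUs, which would explain the difference from the whole-VM number in esxtop.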

scale21
Enthusiast

That makes sense. In your example, if a VM has 8 vCPUs and I saw 40 as the esxtop %RDY metric, I'd have a problem, or should at least start to worry, since each vCPU (when expanded) would show ~5, and 5 or anything above it could be a problem.

Finikiez
Champion

Well, it depends. Even if a VM has high ready time, it doesn't necessarily mean that the VM has a problem. It depends on the workload type and other factors.

However, generally speaking, 10% is considered the threshold at which you should start checking what's going on.

raidzero
Enthusiast

Agree with the other poster about summation values. You can see this easily by looking at the real-time advanced chart for CPU, using the chart options to select only CPU Ready, then exporting to CSV. You'll see that the total VM CPU Ready is the sum across all of the VM's vCPUs. This also explains why you saw reduced CPU ready after removing processors.
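
As a rough illustration of that check, something like the sketch below would do it; the column names are only placeholders for whatever headers your exported CSV actually contains:

```python
import csv

# Placeholder column names; substitute the real headers from your CSV export.
TIME_COL = "Time"
VM_TOTAL_COL = "Ready for myvm"                         # whole-VM CPU Ready summation (ms)
VCPU_COLS = [f"Ready for myvm:{i}" for i in range(8)]   # one column per vCPU

with open("cpu_ready_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        vm_total = float(row[VM_TOTAL_COL])
        vcpu_sum = sum(float(row[c]) for c in VCPU_COLS)
        # The whole-VM value should track the sum of the per-vCPU values.
        print(row[TIME_COL], vm_total, vcpu_sum)
```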

Further info in case you are interested, from my experience.

High %Ready means the VM wanted to execute on a slot, but no slots were free. Generally this means the host is too busy or there are CPU limits in place.

High %Costop means the VM wants to execute, slots are free and one or more vCPUs are in line to use them, but there are not enough free slots to satisfy all the processors in the VM. While a host that is too busy can cause this as well, I see it more in environments with too many multiprocessor VMs running on the same host.

So if you have a ton of 1-vCPU VMs bursting at the same time, you can have high CPU ready but 0% co-stop. That still indicates a performance problem (the host is overloaded).

Usually hosts have a little of both kinds of VMs, small and large in vCPU count. Your host has 24 cores, or 48 slots with HT. I would be surprised to see co-stop from an 8-vCPU VM unless you have a lot of other high-vCPU VMs on this box. It is hard to know without taking a larger look at the environment.

All that said, I think you might be overthinking this a little. You don't really right-size an individual VM's CPU using CPU ready; you right-size it from the processes running in the guest. You say you removed 2 processors, but what is the utilization in the guest? Generally you start small and grow if the guest hits 100%.

CPU ready tends to be more of a metric for host loading or a measure of the VMs as a whole, not individual guest sizing. CPU for a VM is "right-sized" when it is just large enough to support the workload (or sized to support workload scale). A VM with 8 processors that is using 1% of them is not right-sized, and a VM with 2 processors that is using 100% of them all the time is also not right-sized. So if you look at your VM with 6 processors and it is using 20% CPU, it probably needs to come down more, irrespective of the CPU ready metric.
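
As a very rough sketch of that rule of thumb (the utilization thresholds here are illustrative assumptions only, not official guidance; pick values that leave headroom for your workload's peaks):

```python
import math

def suggest_vcpus(current_vcpus, avg_guest_util_pct,
                  low_water=30, high_water=80, target=60):
    """Rough right-sizing heuristic based on average in-guest CPU utilization.

    The 30/80/60 thresholds are illustrative assumptions only.
    """
    if low_water <= avg_guest_util_pct <= high_water:
        return current_vcpus                      # close enough, leave it alone
    # Scale the vCPU count so the same total demand lands near the target utilization.
    demand = current_vcpus * avg_guest_util_pct
    return max(1, math.ceil(demand / target))

print(suggest_vcpus(8, 1))    # 1  -> 8 vCPUs at 1% is massively oversized
print(suggest_vcpus(2, 100))  # 4  -> 2 vCPUs pegged at 100% needs to grow
print(suggest_vcpus(6, 20))   # 2  -> 6 vCPUs at 20% can probably come down
```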

Hope that helps.

scale21
Enthusiast

It helps very much. Thank you.

We are seeing just that. The VM CPU usage is low per the performance charts in vCenter, and in the guest OS it also appears quite low. I agree here. We could probably go from 6 down to 4 and add more users per server. It seems like we have some resource waste.

We are set to at least quadruple our capacity, from 100 users across 4 servers to 400 users across X servers. We are trying to solve for X without destroying performance.

The goal is to manage the fewest VMs possible. Each server now handles ~25 users just fine all day long. We could just go to 16 VMs with the same number of users per server, but we don't want that management overhead. We are going to try increasing resources to lower our overall number of VMs, for management and licensing costs. I know there are arguments on both sides, such as not wanting all your eggs in one basket in the event of a VM failure, etc. I am just trying to gauge our options: do we go with 16 servers at 25 users each (wide and low), or do we ramp up resources and shoot for 10 servers at 40 users each? Looking at these numbers and testing is really helping me figure this out.
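
Just to sanity-check the arithmetic we are working from (the per-server user counts are our own planning numbers, nothing prescriptive):

```python
import math

def vms_needed(total_users, users_per_vm):
    """How many identically sized VMs are needed for the target user count."""
    return math.ceil(total_users / users_per_vm)

print(vms_needed(400, 25))  # 16 VMs if we keep 25 users per server (wide and low)
print(vms_needed(400, 40))  # 10 VMs if we ramp resources to 40 users per server
```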

Thank you for the replies. I have been overthinking myself in circles, so the feedback really does help.