Hi All,
I'm interested in the CPU %RDY figures I'm getting from a VM. It's sitting between 15% and 25%, mostly towards the lower end, and I'm finding it hard to work out whether that is good or bad. It's an 8-vCPU machine, so am I right in saying that's roughly 3% or less per vCPU? I've read that around 2.5% is nothing to worry about, but to contradict that I have a chart saying it should be under 10%!
People are asking me to throw more cores at it, but I really want to find a better way of increasing performance. It's running video encoding, so it's CPU intensive. Windows is reporting 75% CPU, and vSphere performance charts show around 50% CPU usage. I think we're likely to make the problem worse by giving it more cores.
The host it's on is only just over a 1:2 pCPU-to-vCPU ratio, but we have a fairly high number of large VMs in terms of CPU and memory.
Options are:
Increase CPU cores
Reduce CPU cores
Use resource shares?!?
Distribute and group larger and smaller VMs on separate hosts
Thanks
As mentioned before, ready time is the sum across all vCPUs.
So if you have 40% ready time on a 10-vCPU VM, each vCPU had to wait around 4% of the interval to be scheduled on a processor. The 5% and 10% rules apply PER vCPU, not to the total amount, so it's important not to misinterpret the values you are looking at.
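The arithmetic above can be sketched as a quick helper. This is just an illustration of the division described here; the function name and the 2.5% comfort threshold in the comment are my own framing of the rules of thumb mentioned in this thread, not anything official:

```python
def per_vcpu_ready(total_rdy_pct: float, vcpu_count: int) -> float:
    """Convert the summed %RDY that esxtop reports for a VM group
    into an approximate per-vCPU ready percentage."""
    return total_rdy_pct / vcpu_count

# The example from above: 40% summed ready on a 10-vCPU VM
print(per_vcpu_ready(40.0, 10))  # 4.0 -> ~4% per vCPU

# The original poster's case: ~24% summed ready on an 8-vCPU VM
print(per_vcpu_ready(24.0, 8))   # 3.0 -> ~3% per vCPU
```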
The sampling interval is also important.
An excellent post I use as a reference a lot of the time can be found here:
http://vmtoday.com/2013/01/cpu-ready-revisted-quick-reference-charts/
What kind of hardware do you have? Intel-based with Hyper-Threading, or AMD?
I did some testing with VMs using 100% CPU on HT Intel machines. What I noticed: as soon as I passed the physical 16-core boundary (the ESX host had 16 pCores / 32 logical cores with HT), the ready times went up and CPU became the bottleneck the more VMs I put on the host. This only applies if all the VMs want to use their CPU cores at the same time. In our normal environment we sometimes run a ratio up to 1:4 without high ready times, because most machines don't use that much CPU on a daily basis.
Try to match the number of vCPUs to the logical/physical core counts.
Do you use any affinity rules? If set wrongly, these can negatively (or positively) impact your ready times.
That's a lot of ready time; your vCPUs are sitting there waiting to be scheduled. I've always started with a low processor count unless I'm proven absolutely wrong. Sometimes less is more when it comes to procs.
Reduce your CPU count, give it high resource shares, and see if that helps. You are right in thinking that adding more cores will not help performance. I can understand how video encoding would be intensive, but that ready count raises red flags.
You are right, %RDY is cumulative across all assigned vCPUs, so that's about 3% per vCPU. While in esxtop, press e and type in the group ID of the VM; this will expand the VM's world IDs and you will be able to see exactly the %RDY time for each vCPU. Video encoding is always resource intensive, so your best option is to scale out. Before you make any further changes, I would suggest doing a proper resource analysis for this VM during a high-usage period: monitor both at the hypervisor level and inside the guest VM, and try to stay within a NUMA node.
Scaling out isn't really an option, as we need this one machine to deal with all the feeds. I definitely think we have an issue with oversized VMs. The problem is that throwing some cores at it usually fixes the blipping, but I think that's really just masking the performance issue. I'm sure we could achieve the same result with fewer cores. If I view the group ID of the VM, I can see that %RDY is not spread equally across all the cores.
I've posted two esxtop readings, one for the group of the problem VM and one for all of the VMs. This host has 4 x 8-to-10-vCPU VMs, so it's no surprise that 3 of them have the highest %RDY. The one showing 40% RDY usually sits around 25-30.
I'm going to migrate the problem VM to another host with lower CPU usage first and monitor the %RDY; then I'll investigate reducing the cores and maybe using shares?
Have a look at this article. Very well written. I use this setting in my Lync environment and it has worked out well for me. http://www.datacenterdan.com/blog/vsphere-55-bpperformance09-latency-sensitive-apps - I've never had to go above 4 cores per VM, and there are about 14 of them just for Lync, with 500+ users. You must be on 5.5, however.
Looking at your esxtop output, your ESXi host is over-committed; you are maxed out. The host is well over 100% CPU utilisation - 115% of host CPU based on esxtop for the last 15 minutes.
You don't have enough CPU resource available on this particular host to drive the workload; moving it to another host may help. I would also check CPU utilisation across the cluster. If your workload is CPU intensive, be very careful with over-commitment ratios: ideally 1:1 for best performance, or 2:1 for reasonable performance.
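A quick way to sanity-check the ratio being discussed is to add up the vCPUs on the host and divide by physical cores. This is a hedged sketch: the VM sizes below are illustrative numbers based on the "4 x 8-to-10 vCPU VMs" and "2 x 6 core" details mentioned in this thread, not exact figures from the poster's environment:

```python
def overcommit_ratio(total_vcpus: int, physical_cores: int) -> float:
    """vCPU:pCPU over-commitment ratio for a host.
    Hyper-threaded logical cores are deliberately NOT counted,
    per the advice above about keeping to physical cores."""
    return total_vcpus / physical_cores

# Illustrative host from this thread: four large VMs (8-10 vCPUs each)
# on a 2 x 6-core box (12 physical cores, 24 logical with HT)
vcpus = 8 + 10 + 10 + 8
print(overcommit_ratio(vcpus, 12))  # 3.0 -> 3:1, risky for CPU-heavy workloads
```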
I didn't even look at those logs... yeah, your hosts are pegged.
I'm new to esxtop - where are you seeing the 115% of host CPU? That's not what I'm seeing in vCenter performance; none of the hosts are that high.
We use Intel with HT: 2 x 6 cores, 24 logical.
What I've seen is that even 2.5% RDY is not good for machines that need low latency. Is it worth keeping larger machines together on a certain host and separating them from the smaller 1- and 2-core machines? Or is the pCPU-to-vCPU ratio all that matters?
It's about scheduling: you only have 2 x 6 physical cores that can actually execute CPU instructions. So if it really is low latency and needs the CPU performance, then, as asvfk mentioned, you would get the best performance keeping the ratio 1:1 (against physical cores).
A good explanation can also be found here:
Hyper-Threading Gotcha with Virtual Machine vCPU Sizing | Wahl Network
Could you shed some light on why the performance tab in vCenter for that host showed at most 75% average usage, but the esxtop figure was 1.15 (115%)?
I keep this pinned above my desk - an invaluable overview of esxtop: http://www.running-system.com/wp-content/uploads/2012/08/esxtop_english_v11.pdf
Regarding vCenter perf metrics vs. esxtop output, Frank Pedersen has some good stuff here: http://www.vfrank.org/2011/01/31/cpu-ready-1000-ms-equals-5/
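Part of the confusion between the two tools is units: esxtop shows ready time as a percentage, while vCenter's charts show a CPU Ready summation in milliseconds per sample interval. The conversion from the article above can be sketched like this (a hedged sketch; the 20-second default applies to vCenter's real-time charts, and other rollup levels use longer intervals):

```python
def ready_ms_to_pct(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert vCenter's CPU Ready summation value (milliseconds of
    ready time accumulated over one sample interval) into a percentage.
    Real-time charts sample every 20 s by default."""
    return ready_ms / (interval_s * 1000.0) * 100.0

# The rule of thumb from the linked article:
# 1000 ms of ready time on a real-time (20 s) chart ~= 5%
print(ready_ms_to_pct(1000))  # 5.0
```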
There is also a fling called visualesxtop that I use. Find it here: https://labs.vmware.com/flings/visualesxtop
Good Luck! Let us know how it goes for you!