Solved: Re: Capacity estimates: CPU metrics

vmproteau · ‎03-02-2011

We have always adhered to an N+1 (90%) philosophy when designing and supporting our ESX clusters. With the size of the servers we were using and the average allocation per VM, we hit memory ceilings well before we would ever have to consider CPU. Occasionally I would take a look at the average VM/core or vCPU/core for the cluster, but only as a curiosity.

We are now looking at larger hosts (256GB or more) where CPU contention will become more of a concern. In the past, I have really only seen anecdotal best-practice numbers for VM/core or vCPU/core, but this ratio will fluctuate depending on workload, VM resource allocation, hardware, etc.

I'm trying to identify specific CPU metrics to monitor in an ESX environment to help determine when a host's processors are overallocated.

Ideally I'd like to be able to use N+1 with CPU as well, but that seems like a difficult calculus. At a minimum, I'd like to alert on specific CPU metrics and their associated normal/danger thresholds.

Does anyone have an opinion on this, or know of documentation related to it?

Peter_Grant · ‎03-06-2011

I've never been a fan of using VMs per core. Two VMs can be completely different, so this is only a very rough calculation.

I'd suggest profiling the workloads that you want to run on the cluster using something like VMware Capacity Planner, PlateSpin Recon, etc. to measure the maximum cumulative peak MHz.

You’re not looking for the average CPU, or the sum of the peak values for each workload (as they may peak at different times), but the highest cumulative value seen at any one time. Use this as your upper limit, plus say 10%, as your requirement. Then look at your hardware, work out how much CPU MHz is available, and size on that. Much better than x VMs/core. Still allow for N+1.
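To make the difference between "sum of the peaks" and "cumulative peak" concrete, here is a minimal Python sketch. The sample values, VM names, host specification (8 cores at 2400 MHz) and both helper functions are made up for illustration; a profiling tool would supply the real per-VM MHz samples, taken at the same points in time.

```python
from math import ceil

def cumulative_peak_mhz(samples_by_vm):
    """Highest total MHz seen at any single sample time across all VMs.

    Assumes every VM has one sample per time slot, in the same order.
    """
    return max(sum(at_time) for at_time in zip(*samples_by_vm.values()))

def hosts_needed(required_mhz, cores_per_host, mhz_per_core, spare_hosts=1):
    """Hosts required to carry the load, plus spare hosts for N+1."""
    host_mhz = cores_per_host * mhz_per_core
    return ceil(required_mhz / host_mhz) + spare_hosts

# Hypothetical MHz samples per VM, one value per sample time.
samples = {
    "vm-a": [500, 2200, 600, 400],
    "vm-b": [1800, 700, 900, 300],
    "vm-c": [300, 400, 2500, 500],
}

peak = cumulative_peak_mhz(samples)                   # peak of the summed load
sum_of_peaks = sum(max(s) for s in samples.values())  # overestimates the need
headroom_pct = 10                                     # the "plus say 10%"
required = peak * (100 + headroom_pct) / 100

print(f"cumulative peak: {peak} MHz, sum of peaks: {sum_of_peaks} MHz")
print(f"requirement with headroom: {required} MHz")
print(f"hosts (incl. N+1): {hosts_needed(required, 8, 2400)}")
```

Because the three VMs peak at different times, the cumulative peak is well below the sum of their individual peaks, which is why sizing on the latter would over-provision.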

As for values to measure, I’d look at the CPU Ready value on the VMs. This shows how long VMs have to wait to get CPU time, and it indicates CPU contention.
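vCenter performance charts report CPU Ready as a summation in milliseconds per sample interval, so it is easier to alert on once converted to a percentage. The sketch below assumes the 20-second real-time chart interval. The 5% and 10% thresholds are commonly quoted rules of thumb, not figures from this thread, and the function names are hypothetical; tune the thresholds to your own workloads.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (ms) into an average % per vCPU.

    interval_s is the chart sample interval (20 s for real-time charts).
    """
    return ready_ms * 100 / (interval_s * 1000 * vcpus)

def classify(ready_pct, warn=5.0, danger=10.0):
    """Bucket a ready percentage; thresholds are assumed rules of thumb."""
    if ready_pct >= danger:
        return "danger"
    if ready_pct >= warn:
        return "warn"
    return "ok"

# A 1-vCPU VM that waited 2000 ms during a 20 s sample.
single = cpu_ready_percent(2000)
# A 4-vCPU VM with 1600 ms of total ready time in the same interval.
quad = cpu_ready_percent(1600, vcpus=4)

print(single, classify(single))
print(quad, classify(quad))
```

Dividing by the vCPU count gives an average per vCPU, which can hide one badly starved vCPU, so it is worth checking the per-vCPU counters when a multi-vCPU VM looks borderline.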

Also, if you’re using hyper-threading on an Intel server, then I’d consider increasing the following values, especially if you’re getting close to max and running a lot of VMs.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102023...

I’d look to increase HaltingIdleMsecPenaltyMax to 8000 and HaltingIdleMsecPenalty to 2000.

Both are advanced settings on the host.

Pete

Peter Grant, CTO, Xtravirt.com
