VMware Cloud Community
mikitracey2015
Contributor

Memory/CPU overcommitment recommendations

We are campaigning for more resources in our vSphere vCenter environment (VMware 5.5). Sadly, I don't have the level of access that lets me see things like page size, and so far in my reading I can't find anything beyond statements that this or that config "affects" my allocations. Since there are no recommendations for overcommitment, even though there are notes that certain configs can affect my ability to support it, it's pretty hard to get traction when we say "yo, we need more RAM/CPUs".

(My reference: VMware vSphere 5.1. I read through the entire sections on memory virtualization basics and administering resources; however, most of the settings described were specifically denied to me in the vSphere Client.)

I believe I have 144G of memory and 72 vCPUs (2x6 actual), yet I'm chugging along with some serious overcommitments: the VMs are a little cranky at the moment about their RAM, and I'm overcommitted by 210%. I am getting memory warnings at the moment, but they tend to resolve when I bring down a VM with as little as 1G RAM. In my experience, however, even a warning can cause failures of the underlying VMs.

I'm not currently overcommitted for CPUs, but on a quick review of the CPU section I don't see anything about overcommitment. Has anybody written an article on such a thing? I couldn't face reading any more sections, but will perhaps tackle that tomorrow.

I was sad to find that, unlike with disk space, which I tackled several years ago, there were no suggestions or recommendations. So, my question is: for the most common configurations of VMware, what is the recommended maximum overcommitment of CPU and memory? I don't mind reading blogs, but I need to be able to filter it down to: config a: 110% 100%, config b: 90% 90%.

5 Replies
homerzzz
Hot Shot

I do not think you will find a general recommendation for overcommitment, since it depends on the workloads in your environment. You need to monitor the performance metrics to determine how far you can overcommit before VM performance is impacted. If you do not have access to the metrics, or if the performance of the environment is not being monitored, I recommend no overcommitment. In most environments you need data to justify resource needs to management. What does the overall CPU utilization look like? My environment is overcommitted at 4 vCPUs to 1 physical core, but this is OK since utilization and contention are low. Do you have constant memory swapping and ballooning? Without looking at the metrics, I do not see a recommendation being possible.
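
To put numbers on ratios like the 4:1 figure above, here is a minimal Python sketch that computes vCPU-to-core and memory allocation ratios for one host. The host size matches the 2x6 core, 144 GB box described in the question; the per-VM allocations and the `overcommit_ratio` helper are hypothetical, made up purely for illustration.

```python
def overcommit_ratio(allocated, physical):
    """Return allocated / physical, e.g. 4.0 means a 4:1 ratio."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


# Host capacity as described in the thread: 2 sockets x 6 cores, 144 GB RAM.
physical_cores = 12
physical_ram_gb = 144

# Hypothetical per-VM allocations (not taken from any real inventory).
vm_vcpus = [2, 4, 4, 2, 8, 4]
vm_ram_gb = [16, 32, 64, 48, 96, 48]

cpu_ratio = overcommit_ratio(sum(vm_vcpus), physical_cores)
mem_ratio = overcommit_ratio(sum(vm_ram_gb), physical_ram_gb)

print(f"vCPU:pCore = {cpu_ratio:.2f}:1")
print(f"memory allocated = {mem_ratio:.0%} of physical")
```

The ratio alone only says how much has been promised, not whether the VMs are suffering; it still has to be read next to the utilization and contention metrics.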

hussainbte
Expert

I am not sure if you have gone through the linked performance best practices guide.

However, I don't think it will provide exact figures or percentages for how far CPU and memory can be overcommitted (I haven't read the whole guide).

It does, however, explain very important stats that will help you determine how much you can overcommit.

https://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf 

If you found my answers useful please consider marking them as Correct OR Helpful Regards, Hussain https://virtualcubes.wordpress.com/
mikitracey2015
Contributor

Just to verify, this guide is for vSphere 6. Is there anything different between 5.5 and 6?

vLarus
Enthusiast

First things first: do you have a single host with 144 GB RAM and a 2-socket x 6-core CPU configuration?

I wouldn't sell this as a resource contention problem; it's more of an availability issue, since all the workloads will be down if you lose that host.

But if it's a non-critical host running stuff that can go down for an extended period of time, then overcommitment ratios depend on the workloads running on the host itself.

So you now have 12 physical cores with 144 GB memory. That can also be stated as two NUMA nodes, each with 6 cores and 72 GB memory.

There are "unofficial" recommendations based on "best practices"; they should be considered only as recommendations and do not apply everywhere. One of them is to not go over a ratio of 1 physical core : 6 vCPUs. In the olden vSphere 4 days this was some magical number beyond which the CPU scheduler on the ESXi hosts would have problems assigning timeslots to each of the vCPUs, and VMs with multiple vCPUs made it worse.

When you start to see co-stop and ready time increase, you are running close to your specific maximum ratio. This can be mitigated by right-sizing your VMs, giving them only the vCPUs they need (resource based, not thread based).

Eric Sloof has a great slide deck on this, and it still applies even though it is based on vSphere 4: http://www.ntpro.nl/blog/uploads/AdvancedTroubleshooting.pdf

As for memory overcommitment, just try to right-size the VMs based on the peak Active Memory used by each VM. The allocation is never a good measurement of memory usage, since OSs differ in how they hold on to and use memory (for example at boot, where some OSs touch all the pages while others do not).
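
As a rough illustration of sizing from peak Active Memory rather than from allocation, here is a minimal Python sketch. The VM names, the sample values, and the 25% headroom are assumptions chosen for the example, not VMware guidance; the samples stand in for whatever Active Memory readings you can export from the performance charts.

```python
import math


def suggest_ram_gb(active_samples_gb, headroom=0.25):
    """Suggest an allocation: peak active memory plus headroom, rounded up to whole GB.

    headroom=0.25 is an arbitrary illustrative margin, not a vendor recommendation.
    """
    peak = max(active_samples_gb)
    return math.ceil(peak * (1 + headroom))


# Hypothetical VMs with their current allocation and sampled Active Memory (GB).
vms = {
    "app01": {"allocated_gb": 16, "active_gb": [3.1, 4.8, 6.2, 5.0]},
    "db01": {"allocated_gb": 32, "active_gb": [18.0, 22.5, 25.9, 24.1]},
}

for name, vm in vms.items():
    suggested = suggest_ram_gb(vm["active_gb"])
    delta = vm["allocated_gb"] - suggested
    print(f"{name}: allocated {vm['allocated_gb']} GB, "
          f"suggested {suggested} GB, reclaimable {delta} GB")
```

A negative "reclaimable" value means the VM's peak plus headroom already exceeds what it was given, which is a candidate for more RAM rather than less.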

Hope this helps

Larus.

vmice.net
hussainbte
Expert

There have to be some enhancements in version 6 regarding CPU scheduling and memory management; there were major enhancements from VMware when they moved from 4.x to 5.x.

Having said that, the key stats mentioned by vLarus hold the same importance.

The threshold from VMware regarding %RDY seems to be 5%. The lower we keep it, the better. But obviously we cannot map pCPUs to vCPUs 1:1 and give up the advantages of virtualization and overcommitment.
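
If you only get the CPU Ready counter from the vCenter charts as a summation in milliseconds, it has to be converted to a percentage before comparing it with the 5% figure. Below is a minimal Python sketch of the commonly cited conversion; the 20-second real-time sample interval, the per-vCPU normalization, and the function names are assumptions to verify against your own charts.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (ms) to a percentage of the sample interval.

    interval_s=20 assumes the real-time chart interval; dividing by vcpus
    gives an average per vCPU for a multi-vCPU VM.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


def exceeds_threshold(ready_ms, interval_s=20, vcpus=1, threshold_pct=5.0):
    """Return True if ready time is above the threshold (5% as quoted in this thread)."""
    return ready_percent(ready_ms, interval_s, vcpus) > threshold_pct


# Hypothetical samples: (VM name, ready summation in ms, vCPU count).
samples = [("web01", 400, 1), ("db01", 2000, 4), ("batch01", 3000, 2)]
for name, ready_ms, vcpus in samples:
    pct = ready_percent(ready_ms, vcpus=vcpus)
    flag = "investigate" if exceeds_threshold(ready_ms, vcpus=vcpus) else "ok"
    print(f"{name}: {pct:.1f}% ready per vCPU ({flag})")
```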
