VMware Cloud Community
jbiggley
Contributor

Reporting on memory overcommitment - tools and theory?

I've spent some time trying to determine the proper memory overcommitment levels for our clusters.  Of all the values available on the Summary and Resource Allocation tabs (vSphere 4), I've been focusing on the following:

Allocated - RAM assigned to the virtual machine

Overhead - RAM needed to support this VM

(I've calculated a total RAM field in my spreadsheet that is Allocated + Overhead)

Private - physical RAM on the host backing the VM 1:1 (a subset of the Allocated RAM)

Shared - RAM deduplicated by Transparent Page Sharing (TPS) and common across multiple VMs (also a subset of the Allocated RAM)

(Private + Shared should equal Allocated)
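A minimal Python sketch of that spreadsheet math (all numbers are hypothetical and in MB; the field names mirror the Resource Allocation tab, not any real API):

# Sketch of the spreadsheet math above; hypothetical values in MB.
vms = [
    {"name": "vm01", "allocated": 4096, "overhead": 140, "private": 3200, "shared": 896},
    {"name": "vm02", "allocated": 2048, "overhead": 90,  "private": 1400, "shared": 648},
]

for vm in vms:
    total = vm["allocated"] + vm["overhead"]      # "total RAM" column
    recombined = vm["private"] + vm["shared"]     # should equal Allocated
    print(f'{vm["name"]}: total={total} MB, '
          f'private+shared={recombined} MB vs allocated={vm["allocated"]} MB')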

Based on that I have a couple of questions.

1.  Is it safe to assume that I can continue to overcommit RAM as long as the Private RAM is less than the Worst Case Allocation?

2.  If I'm trying to figure out a % commit level (e.g. if I have 500 GB of RAM, how much RAM can I commit to the VMs), what calculation can I perform?

3.  Is there a platform out there that provides these statistics historically?  (I'm digging around in the SolarWinds Orion platform to see if I can collect stats from the Resource Allocation tab.)

Does anyone have experience doing this, or anything else, to help manage the overcommit levels of a cluster?

Thanks,

Josh

jbiggley
Contributor

As an update to this, I read a paper yesterday from Kingston about memory allocation entitled "VMware HA and DRS Capacity Planning".  In that paper they stated that TPS is 'reset' every time a VM is migrated between hosts.  The assertion is that while TPS can be leveraged, ESX hosts still need to be sized so that all assigned memory (that is, the amount of RAM in the VM settings) is available in the event of a host failure.
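Taken at face value, the paper's rule implies a sizing check along these lines (a Python sketch with hypothetical numbers; it simply assumes TPS savings count for nothing right after a failover):

# Worst-case sizing check per the paper's rule: assume TPS savings
# evaporate after a failover, so assigned vRAM must fit in the RAM
# of the surviving hosts. All numbers are hypothetical.
hosts = 4
ram_per_host_gb = 128
assigned_vram_gb = 400                                  # sum of RAM assigned in VM settings

surviving_capacity_gb = (hosts - 1) * ram_per_host_gb   # 384 GB after one host fails
fits = assigned_vram_gb <= surviving_capacity_gb
print(f"fits after 1 host failure: {fits}")             # -> False (400 > 384)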

1)  Is that a true statement regarding TPS?  Does TPS 'reset' when a VM moves to a new ESX host in the same cluster? (vSphere 4.x)

2)  What is the standard over-commit percentage used in industry?  I've calculated my over-commit at nearly 47%, but that seems excessively high if the paper from Kingston is to be taken at face value.
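For reference, a sketch of how a figure like that 47% could be derived (the definition used here, assigned vRAM beyond physical RAM as a share of physical RAM, is an assumption; the numbers are hypothetical):

# Assumed definition: vRAM assigned beyond physical RAM, relative
# to physical RAM. Figures are hypothetical.
physical_ram_gb = 500        # total RAM across the cluster's hosts
assigned_vram_gb = 735       # sum of RAM assigned to all VMs

overcommit_pct = (assigned_vram_gb - physical_ram_gb) / physical_ram_gb * 100
print(f"over-commit: {overcommit_pct:.0f}%")   # -> 47%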

Thanks for the feedback.

depping
Leadership

1) Yes. When a VM is vMotioned, TPS on the destination host will need to scan that VM's memory pages again and collapse them, which can take up to 60 minutes. However, if there is memory pressure, the VMkernel will request that TPS scan more aggressively and collapse pages sooner, to avoid resorting to the other memory reclamation techniques.

2) I haven't seen an average, to be honest, but I would say that 50% is not bad at all. You can look at your current savings and see what TPS achieves today; overprovisioning is not a problem as long as you're not swapping.
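One way to check current savings yourself is a sketch like the following, using the (later-era) pyVmomi library against vCenter. The vCenter address and credentials are placeholders, and quickStats values are reported in MB:

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl

# Sum TPS-shared vs. private memory across all powered-on VMs.
# vCenter address and credentials below are placeholders.
ctx = ssl._create_unverified_context()     # lab use only; skips cert checks
si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

shared_mb = private_mb = 0
for vm in view.view:
    if vm.runtime.powerState == "poweredOn":
        qs = vm.summary.quickStats         # values in MB
        shared_mb += qs.sharedMemory or 0
        private_mb += qs.privateMemory or 0

print(f"TPS-shared: {shared_mb} MB, private: {private_mb} MB")
view.Destroy()
Disconnect(si)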

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

jbiggley
Contributor

Duncan:

If running a cluster with TPS savings around 50%, what impact does that have on the over-commit levels in the cluster?  Is there a specific metric for over-committing RAM?  I would think you wouldn't want to fully commit all of the RAM saved by TPS to other VMs.

Any tools to help measure/monitor TPS and over-commit levels in real-time and/or historically?

Last question -- when TPS is being recalculated because of a host failure, does that impact the TPS % of the VMs already on the host, or only that of the incoming guest VMs?

Thanks for the feedback,

Josh

PS:  I was in the design course last week.  Our instructor strongly recommended your new book to the class, as well as Yellow-Bricks.com.  You've got quite the fan base among the design instructors, it seems!  When can we buy your book in Canada, though?

VMmatty
Virtuoso

In the event of a host failure and VMs restarting on other host(s), all VMs on a receiving host could potentially be affected.  If there isn't enough RAM available on that host before TPS kicks in, then you may get ballooning/swapping that affects the performance of all VMs.  As TPS kicks in, the amount of swapping will likely (hopefully) go down.

It shouldn't affect the amount of RAM shared on guests already running.  It's possible that the amount of memory shared per VM could go up if the VMs being restarted on the host have a lot of memory in common (as compared to the VMs that were already on the host).

In my personal experience, seeing 50% memory savings from TPS is high and a little unusual (but not bad).  The only environments where I've consistently seen numbers like that are VDI environments where all VMs are clones of the same master template (either VMware View or Citrix XenDesktop).  For server workloads I've seen the 25-30% range much more frequently.
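As a quick illustration of how a savings figure like that can be computed (one possible definition, with hypothetical numbers in MB):

# TPS savings as a share of guest-mapped memory (one possible
# definition; hypothetical values in MB).
shared_mb = 48_000
private_mb = 112_000
savings_pct = shared_mb / (shared_mb + private_mb) * 100
print(f"TPS savings: {savings_pct:.0f}%")   # -> 30%, in the server-workload range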

Matt | http://www.thelowercasew.com | @mattliebowitz