Vinsane
Contributor

vSphere 5.5 HA Percentage Based Admission Control Policy

There is something I have been wanting to get to the bottom of for quite some time, and it's best described by a scenario.

1. I have a 2-host cluster; to make it easy, each host is configured with 32GB of physical memory.

2. The cluster has Admission Control enabled, and the policy is set to "Percentage of cluster resources reserved" at 50% CPU / 50% memory, with the idea of reserving half of the capacity to support a one-host failure.

3. For argument's sake, let's say there are 10 VMs in the cluster, and NONE of these VMs use any type of CPU/memory reservation.

4. Each of these 10 VMs is configured with 6GB of memory (6GB x 10 VMs = 60GB of allocated memory). For the sake of argument, let's say DRS distributes 5 VMs per host, i.e. 30GB allocated per host.

5. Now, in my cluster view, each of my hosts shows ~93% memory utilization, and I have alarms going off in vCenter screaming that memory is low.

6. Since none of these VMs carry reservations, they barely cut into the CPU and memory failover capacity percentages; those numbers hardly show a dent, because only the defaults are counted (what is it, 32MHz per VM for CPU, plus the memory overhead?). The cluster readings for Current CPU Failover Capacity and Current Memory Failover Capacity sit at 98% CPU / 99% memory, well above the 50% needed for admission control to kick in and say STOP POWERING ON VMs! (A rough sketch of that math is below.)
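To make that concrete, here is a minimal plain-Python sketch of how I understand the percentage-based policy derives those failover capacity numbers: only reservations plus per-VM memory overhead count, never allocated or active memory. The 20,000MHz-per-host CPU capacity and ~100MB per-VM overhead are assumed figures for illustration only.

```python
# Sketch of the percentage-based admission control math (my understanding).
# Assumptions: 32MHz default CPU reservation per VM (das.vmcpuminmhz in 5.x),
# ~100MB memory overhead per VM, 20,000MHz CPU capacity per host.

DEFAULT_CPU_MHZ = 32

def failover_capacity(total_mhz, total_mb, vms):
    """Return (CPU %, memory %) failover capacity for the cluster."""
    cpu_reserved = sum(max(vm["cpu_res_mhz"], DEFAULT_CPU_MHZ) for vm in vms)
    mem_reserved = sum(vm["mem_res_mb"] + vm["overhead_mb"] for vm in vms)
    return ((total_mhz - cpu_reserved) / total_mhz * 100,
            (total_mb - mem_reserved) / total_mb * 100)

# The scenario above: 2 hosts x 32GB, 10 VMs, no reservations
vms = [{"cpu_res_mhz": 0, "mem_res_mb": 0, "overhead_mb": 100}] * 10
cpu_pct, mem_pct = failover_capacity(2 * 20000, 2 * 32 * 1024, vms)
print(f"CPU: {cpu_pct:.0f}%, memory: {mem_pct:.1f}%")  # ~99% / ~98.5%, close to the cluster readings
```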

Now that I've laid out the foundation: we've allocated 60GB of the 64GB of physical memory to VMs, and the vSphere alarms are saying "hey, you're out of memory." However, a product such as Veeam ONE reports that each of those 10 VMs is truly using only 2GB of physical host memory out of the allocated 6GB. So "technically" we're only using 20GB (10GB per host) of the 64GB of physical host memory available in the cluster.

The million-dollar question is: if I fail one of the hosts, will ALL VMs come back online on the last surviving node?

Scenario 1:

Host 1: 32GB physical memory, 5 VMs, 30GB memory allocated, 93% utilization alarms

Host 2: 32GB physical memory, 5 VMs, 30GB memory allocated, 93% utilization alarms

Host 1 fails and HA kicks in. Is it going to state that it cannot power on any more VMs because there's only 2GB remaining and it needs to power on 30GB worth? (I would expect this isn't the case; we should be able to overprovision...)

Scenario 2:

Host 1: 32GB physical memory, 5 VMs, 30GB memory allocated, 10GB physical used per the Veeam ONE reports, 20GB physical remaining

Host 2: 32GB physical memory, 5 VMs, 30GB memory allocated, 10GB physical used per the Veeam ONE reports, 20GB physical remaining

Host 1 fails, HA kicks in, and it boots all 5 VMs on host 2, since host 2 has 20GB of physical memory remaining and the restarted VMs only need 10GB?
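If my understanding is right, HA never looks at consumed memory at restart time; with no reservations, the surviving host just has to cover the VMs' overhead, and ESXi overcommits the rest. A back-of-the-envelope check with the Scenario 2 numbers (all in GB):

```python
# Back-of-the-envelope check for Scenario 2 (figures in GB).
# Assumption: with no reservations, the restarted VMs only need their
# active memory to fit comfortably; allocated memory can be overcommitted.

host2_physical = 32
host2_consumed = 10        # Veeam ONE reading for host 2's own 5 VMs
restarted_consumed = 10    # active memory of the 5 VMs from failed host 1
restarted_allocated = 30   # configured memory of those same VMs

headroom = host2_physical - host2_consumed
print(f"Headroom on host 2: {headroom} GB")
print(f"Demand from restarted VMs: {restarted_consumed} GB (of {restarted_allocated} GB allocated)")
print("Restart fits in physical memory" if restarted_consumed <= headroom
      else "Expect ballooning/swapping after restart")
```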

It seems like there should be a better way to truly tell whether you can fully support a one-node failure in a percentage-based model with over-allocated memory in your cluster. Can anyone debunk this for me and explain?

4 Replies
homerzzz
Hot Shot

I decided to give this a try. I configured a two-host cluster with 96GB of RAM each, set the admission control policy to 50%, and ran a mix of 6 Windows and Linux VMs totaling 124GB of configured RAM. The VMs' active memory is 73GB, and the HA failover capacity shows 99% available. So this is your scenario, but even more extreme. The VMs were split across the two hosts.

To test, I just hard powered off one host, and all 3 VMs that had been running on it were restarted on the single remaining host. So with no memory reservations, HA will power them on the remaining host.

...and they are still running!

Vinsane
Contributor

Very interesting, thank you for conducting that experiment. I wish I had my own lab to do so, but I don't yet.

So it appears we've proven that memory over-allocation works fine during an HA event.

You were able to run VMs totaling 124GB of allocated RAM on a single surviving host with only 96GB of physical memory; I'm assuming this is because it's smart enough to know the VMs only physically need 73GB from the host.

Which brings me to my main point: it seems like the vSphere host memory alarm is sort of a false positive. It shows 93% capacity, because the reading appears to reflect what's allocated rather than what's actually active in the VMs.

In a percentage-based admission control policy, at what point, or by watching which alarm or metric, can someone finally say, "okay, memory is too over-allocated and we cannot support a one-host failure"?

With no reservations, it sounds like the answer is: not until you get the CPU/memory failover capacity numbers down to your configured percentage. Needless to say, that would be a very high level of overcommitment, because with no reservations the default CPU reservation is 32MHz, and for memory I believe only the overhead is counted.
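To put a number on "very high overcommitment": assuming roughly 100MB of memory overhead per reservation-less VM (the real overhead varies with vCPU count and configured RAM), here's how many VMs the original 2x32GB cluster could power on before the 50% policy actually blocks anything:

```python
# Rough arithmetic only; the 100MB per-VM overhead is an assumed figure.
cluster_mb = 2 * 32 * 1024   # the 64GB cluster from my original scenario
overhead_mb = 100            # assumed per-VM memory overhead, no reservations
policy = 0.50                # the 50% admission control setting

vms_until_blocked = cluster_mb * policy / overhead_mb
print(f"~{vms_until_blocked:.0f} reservation-less VMs before power-ons are blocked")
# -> roughly 328 VMs on a 2-host, 64GB cluster
```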

What I am trying to get at is also how to determine when it's truly time to buy another host for your cluster. I have production clusters all showing ~75% memory utilization; however, the actual active memory used in, say, a cluster with 1TB of memory is only 250GB. I don't think that truly justifies buying another host.
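For what it's worth, the check I'd actually want is something like the sketch below: does the cluster's active memory still fit on N-1 hosts with some buffer? The host count and the 80% buffer are hypothetical; the memory figures are the ones from my 1TB example.

```python
# Hypothetical N-1 capacity check; the 80% buffer is an assumption.
hosts = 4
host_mb = 256 * 1024     # 4 x 256GB = the 1TB cluster in my example
active_mb = 250 * 1024   # actual active memory reported for the cluster
buffer = 0.80            # don't plan to run the survivors past 80%

surviving_capacity = (hosts - 1) * host_mb * buffer
print("N-1 capacity OK, no new host needed" if active_mb <= surviving_capacity
      else "Time to buy another host")
```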

homerzzz
Hot Shot (Accepted Solution)

There are many metrics that are used to plan out capacity. For memory, right or wrong, I look at active memory and demand. Also, if I see ballooning or swapping, I feel it's time to add a host (or just memory... it all depends on several other metrics). DRS seems to start complaining when the hosts are stressed. Right-sizing the VMs from the start makes it easier to trend and plan additional capacity. There should be no need to over-allocate so much.
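Right or wrong, that rule of thumb could be sketched like this; the metric names echo vSphere's balloon/swap counters, but the thresholds and the collection of the values are placeholders, not anything vSphere itself does:

```python
# Hypothetical "time to add capacity" heuristic; thresholds are assumptions.
def needs_more_capacity(ballooned_kb, swapped_kb, active_pct):
    """ballooned_kb/swapped_kb: reclaimed guest memory; active_pct: cluster
    active memory as a percent of physical. Any reclamation is a red flag."""
    if ballooned_kb > 0 or swapped_kb > 0:
        return True            # the hosts are already under memory pressure
    return active_pct > 70     # assumed trending threshold, tune to taste

print(needs_more_capacity(ballooned_kb=0, swapped_kb=0, active_pct=25))  # False
```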

I also use vROps in my environment. It is very helpful for trending and planning ahead.
