VMware Cloud Community
Tyrelever
Contributor

HA admission control, no reservations

Hi all,

I've been reading up, trying to get my head around how HA admission control functions with VMs that have no reservations set.

My understanding is that HA assigns a default of 256 MHz CPU and 0 MB plus memory overhead to VMs that have no reservations. This default then becomes the slot size used to determine the number of VMs the cluster can run. The number of slots is in turn affected by the number of host failures that can be tolerated, or by the percentage of resources reserved as failover spare capacity.

Where I'm getting confused is what happens when HA assigns the default slot size (256 MHz CPU and 0 MB plus memory overhead, assuming no VM reservations) to a cluster where actual VM memory usage is much higher than the overhead alone. I would assume there is a possibility that the HA slot count could far exceed the actual usable cluster resources in the event of a failover, maintenance mode, etc.

For example -

A cluster using 32 GB hosts runs a bunch of 1 vCPU / 4 GB VMs with no reservations set. According to the resource guide, each VM would have an overhead of 165.98 MB, resulting in a rather large number of slots per host. Assuming CPU resources are sufficient, a host could have something like 190 slots to play with.

Does this mean that HA could allow 190 unreserved 4 GB VMs into the cluster before admission control prevents further VMs from being powered on or added? If so, it seems entirely possible to overcommit the cluster to a very large degree, defeating the point of reserving a host (or a percentage of resources) for failover, as each remaining host could end up in excess of 100% utilisation.
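To make my worry concrete, here is a rough sketch of the arithmetic as I understand it (this is just my own back-of-the-envelope model in Python, not how HA actually computes things, and the host CPU capacity is made up so that memory is the limiting factor):

# My rough model of the per-host slot count with no reservations set.
# Assumptions: 32 GB hosts, 1 vCPU / 4 GB VMs, 165.98 MB overhead per VM
# (from the resource guide), and the default 256 MHz CPU slot component.
HOST_MEM_MB = 32 * 1024          # 32 GB host
HOST_CPU_MHZ = 2 * 12 * 2500     # hypothetical 2 sockets x 12 cores @ 2.5 GHz

slot_cpu_mhz = 256.0             # HA default when no CPU reservation is set
slot_mem_mb = 0 + 165.98         # 0 MB reservation + per-VM memory overhead

slots_by_cpu = int(HOST_CPU_MHZ // slot_cpu_mhz)   # ~234
slots_by_mem = int(HOST_MEM_MB // slot_mem_mb)     # ~197
slots_per_host = min(slots_by_cpu, slots_by_mem)   # most restrictive figure wins

configured_mb_if_full = slots_per_host * 4 * 1024  # ~788 GB of configured VM RAM
print(slots_per_host, configured_mb_if_full)       # on a single 32 GB host

So by my maths, one host's worth of slots could represent roughly 24 times its physical memory in configured VM RAM, which is what prompts the question below.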

Assuming that all VMs are to remain unreserved, how can I set up HA admission control (either host failures or percentage) so that admissions are policed without over-saturating the remaining hosts?

EDIT - This link sums up my question a little more eloquently than I have put it; please note the linked article is from 2008 and the figures may be out of date.

Cheers.



techsuresh
Enthusiast

It's actually a really simple mechanism. HA keeps track of the unreserved capacity of each host in the cluster. When a failover needs to occur, the hosts are ordered, with the host that has the highest amount of unreserved capacity being the first option. To make it absolutely crystal clear: it is HA that keeps track of the unreserved capacity, not DRS. HA works completely independently of vCenter and, as we all know, DRS is part of vCenter. HA also works when DRS is disabled or unlicensed!

Now one thing to note is that HA will also verify whether the host is compatible with the VM. What this means is that HA will check whether the VM's network is available on the target host and whether its datastore is available on the target host. If both are the case, a restart will be initiated on that host. To summarize:

1. Order available hosts by unreserved capacity

2. Check compatibility (VM network / datastore)

3. Boot up!

Check out this link: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
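If it helps, here is a very rough sketch of those three steps in Python (my own simplified model with made-up names, not the actual HA agent code):

# Simplified model of the restart placement described above.
def pick_restart_host(hosts, vm):
    # 1. Order the available hosts by unreserved capacity, highest first.
    ordered = sorted(hosts, key=lambda h: h["unreserved_mb"], reverse=True)
    for host in ordered:
        # 2. Check compatibility: the VM's network and datastore must exist on the host.
        if vm["network"] in host["networks"] and vm["datastore"] in host["datastores"]:
            return host            # 3. Boot up! Restart the VM on this host.
    return None                    # no compatible host found

hosts = [
    {"name": "esx01", "unreserved_mb": 12000, "networks": {"VM Network"}, "datastores": {"ds1"}},
    {"name": "esx02", "unreserved_mb": 20000, "networks": {"VM Network"}, "datastores": {"ds1", "ds2"}},
]
vm = {"name": "vm01", "network": "VM Network", "datastore": "ds2"}
print(pick_restart_host(hosts, vm)["name"])   # esx02: most unreserved capacity and compatible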

If you feel this is a correct answer, please click the "Correct Answer" button, or if you feel it is a helpful answer, click the "Helpful" button to award points.

Suresh

Tyrelever
Contributor

Hi Suresh,

Please accept my apologies if I am mistaken, but I'm not sure how this addresses the slots/percentage vs. actual cluster memory capacity problem.

The statement "It's actually a really simple mechanism. HA keeps track of the unreserved capacity of each host of the cluster" seems to be where my confusion is mostly focused.

Am I correct in saying the unreserved capacity of each host is broken down into slots? If each slot is very small and there are many of them, can we not end up in a situation where the cluster becomes overloaded while still having HA slots available? It's the opposite of the problem where a large VM reservation prevents optimal cluster utilisation by creating large slot sizes.

The deep dive is a great page, but it seems to focus more on the effects of large reservations than on the effects of having no reservations at all.

I have also edited my post with a link that seems to sum up my problem a little more clearly.

Thanks

admin
Immortal

As mentioned in the link you posted (and on page 26 of the VMware Availability Guide), you can use the HA advanced options to increase the default values that HA admission control uses to determine the slot size when no reservations are specified on the VMs: das.vmMemoryMinMB and das.vmCpuMinMHz. This will increase the slot size and reduce the number of slots available in the cluster.
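For a rough feel of the effect (back-of-the-envelope only, reusing the 32 GB host and 165.98 MB overhead from your first post; I'm treating the option as standing in for the missing reservation, with the overhead added on top):

# Illustrative effect of das.vmMemoryMinMB on the memory slot count of a 32 GB host.
# The option name is the real HA advanced option; the values below are only examples.
HOST_MEM_MB = 32 * 1024
OVERHEAD_MB = 165.98                       # per-VM overhead from the resource guide

def mem_slots(das_vm_memory_min_mb):
    slot_mem_mb = das_vm_memory_min_mb + OVERHEAD_MB
    return int(HOST_MEM_MB // slot_mem_mb)

print(mem_slots(0))      # default: ~197 memory slots per host
print(mem_slots(1024))   # das.vmMemoryMinMB = 1024 -> ~27 slots
print(mem_slots(2048))   # das.vmMemoryMinMB = 2048 -> ~14 slots

Pick values that reflect what your VMs actually need rather than the numbers above.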

Elisha

Tyrelever
Contributor

Hi Elisha,

Thanks for the reply,

I was thinking that might be the solution to this problem. So what needs to happen is a manual calculation of slot size based on real memory usage, which will need to be recalculated every time the powered-on VM count or VM RAM usage changes?

That being said, if I were to manually adjust the default slot size, should I set the slot size to the largest VM memory allocation? I would then end up with the opposite problem of under-utilisation due to large slot sizing.
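Rough numbers for that case, assuming the same hypothetical 32 GB hosts and 4 GB VMs as in my first post:

# Slot count if the slot size is set to the largest VM memory allocation.
HOST_MEM_MB = 32 * 1024
slot_mem_mb = 4096 + 165.98               # full 4 GB allocation + overhead

print(int(HOST_MEM_MB // slot_mem_mb))    # ~7 slots per host, i.e. very few admissions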

Is there a calculation or method that VMware suggests for sizing slots (and therefore HA admission control) according to real memory usage instead of reservation plus overhead?

Cheers

admin
Immortal

Actual memory usage of a VM can vary greatly across VMs, and even for a given VM depending on load. That makes it a bad candidate for use in HA admission control, which needs a more static estimate of a VM's resource requirements. If possible, you should assign reservations to VMs based on the minimum amount of resources they need to do their job reasonably. If you don't want to assign reservations, you should set the HA advanced options to some reasonably low ballpark value that will satisfy most VMs. If your VMs vary widely in their resource requirements, I'd recommend the "percentage of cluster resources" admission control policy, which doesn't use the slot algorithm and doesn't suffer from being too conservative when there are outlier VMs with much larger resource requirements than most in the cluster.
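Very roughly, the percentage policy works along these lines (my paraphrase of the Availability Guide in a few lines of Python, not the actual implementation; VMs without reservations are still counted at their reservation plus overhead):

# Simplified sketch of the "percentage of cluster resources" admission check (memory side only).
def power_on_allowed(total_mb, reserved_mb_powered_on, new_vm_mb, failover_pct):
    # Capacity that would remain unreserved after powering on the new VM:
    remaining_mb = total_mb - (reserved_mb_powered_on + new_vm_mb)
    current_failover_pct = 100.0 * remaining_mb / total_mb
    return current_failover_pct >= failover_pct

# Example: 5 x 32 GB cluster, 25% reserved for failover, new VM counted at
# 0 MB reservation + ~166 MB overhead.
total_mb = 5 * 32 * 1024
print(power_on_allowed(total_mb, reserved_mb_powered_on=100_000,
                       new_vm_mb=166, failover_pct=25))   # True in this example

A real cluster performs this check for CPU and memory separately; the sketch only shows the memory side.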

Elisha

Tyrelever
Contributor

Thanks for the advice, Elisha. Just a few more things, if I may.

I understand that active/consumed memory is a very fluid metric for HA to track, but isn't that exactly what should be tracked to maintain cluster resources in the event of a host failure or maintenance mode?

How can we say with confidence that a highly utilised cluster does or does not have the capacity to run all active VMs without memory impact in the event of a failure, if HA does not track memory usage? I guess this is more of a conceptual question that may be outside the scope of this post, but it's where my train of thought keeps ending up.

Regardless of which HA method (slots or percentage) is used, there really is no way for HA to ensure all VMs can run without impact on the cluster (in the event of a host failure, etc.) without manually setting the minimum slot size in the advanced HA settings. That minimum is, in a sense, simply an arbitrary number of slots (or a percentage) that the cluster can tolerate before real memory usage is impacted.

For example:

My five-host cluster runs at about 60% across all hosts; when I lose a host, the remaining hosts run at about 75-80%. I'm finding that I'm tuning the minimum slot size to ensure the cluster will only allow the admission of approximately 10-15% more cluster load in terms of real memory resources. So my minimum slot size is something like 3072 MB, which provides approximately 10 available slots; 10 x 3072 MB equals approximately 10% further real memory usage in the event of a failure.
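Spelling out that arithmetic (the host size is an assumption on my part, roughly 64 GB each, since I didn't state it above):

# My back-of-the-envelope check on the 3072 MB minimum slot size.
hosts = 5
host_mem_mb = 64 * 1024                         # assumed host size
cluster_mem_mb = hosts * host_mem_mb            # ~320 GB

slot_mem_mb = 3072
free_slots = 10
admissible_mb = free_slots * slot_mem_mb        # ~30 GB of further admissions

print(100.0 * admissible_mb / cluster_mem_mb)   # roughly 9-10% of cluster memory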

This also means that an admin must guess how many slots (or what percentage) is acceptable to ensure failover capacity. This minimum will also need adjusting as the number of powered-on VMs and/or in-use memory changes.

To me it seems more than a little clunky that HA doesn't track the metric that is, in effect, the most important part of satisfying failover capacity: the ability to service real resource demand (CPU and memory) in the event of a failover.

Finally, if we tune an average minimum slot size for use in a "percentage available" HA cluster, could we run into larger VMs not being able to find a host to run on due to the smaller slot sizing?

I hope that all makes sense.

Cheers


admin
Immortal

Hi Tyrelever,

Thanks for your thoughtful feedback - that's a valid critique of some aspects of HA admission control. I'll share your ideas with the HA team to see how we can improve things in the future. Regarding your last point: yes, a drawback of the "percentage of cluster resources" policy is that it doesn't consider resource fragmentation, so in a highly utilised cluster large VMs may not be able to be failed over in some corner cases.

Thanks

Elisha

Tyrelever
Contributor

Thanks Elisha,

Your help is much appreciated.

Cheers,

Alistair
