VMware Cloud Community
jbiggley
Contributor

HA Failover Capacity - What Do The Numbers Really Mean?

We have a 10-host cluster running ESX 4.1 and vCenter. I recently switched the Admission Control Policy from a specific number of host failures to a percentage of the cluster's resources. After doing so, I noted the following values on the cluster summary tab:

Current CPU Failover Capacity: 88%

Current Memory Failover Capacity: 95%

Configured Failover Capacity: 10%

I know the math is (Total CPU Resources - Required CPU Resources) / Total CPU Resources, and the same for memory. The numbers seem quite high, though we don't use many resource reservations for the VMs. I understand that the CPU requirement per VM is its reservation (with a 256 MHz default when the reservation is 0), and the memory requirement is the reservation plus the VM's memory overhead plus kernel overhead. (The VM overhead table is Table 3-2 in the vSphere Resource Management Guide.) I guess my questions are as follows:

1.  Without reservations in place, are the current failover capacity numbers really all that useful for gauging the health of the cluster?

2.  It was suggested on another post that the failover capacity was the percentage of hosts/VMs that could be restarted in the event of a failure. That seems completely unrealistic, but I wanted to throw it out here to quash that misinformation.

3.  In order to best use the failover capacity, should we consider implementing memory and CPU reservations for VMs?

Just to save everyone some typing: I've read the material on Duncan Epping's yellow-bricks.com and Frank Denneman's blog on HA. To me it seems that, without reservations, unless I'm getting close to my configured failover capacity, I don't really have anything to worry about.
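For anyone who wants to see the arithmetic behind those summary-tab percentages, here's a quick Python sketch of the percentage-based policy math described above. The host counts, VM counts, and per-VM overhead values are illustrative assumptions, not numbers from our cluster:

```python
# Sketch of the percentage-based admission control math.
# All host/VM figures below are made up for illustration.

def current_failover_capacity(total, required):
    """(Total resources - required resources) / total resources."""
    return (total - required) / total

# CPU: per-VM requirement is the reservation, or a 256 MHz default if none is set.
cpu_total_mhz = 10 * 2 * 4 * 2400           # 10 hosts x 2 sockets x 4 cores x 2.4 GHz
vm_cpu_reservations = [0] * 200             # 200 VMs, no CPU reservations
cpu_required = sum(max(r, 256) for r in vm_cpu_reservations)

# Memory: per-VM requirement is reservation + VM memory overhead.
vm_mem = [(0, 180)] * 200                   # (reservation MB, overhead MB) per VM
mem_total_mb = 10 * 96 * 1024               # 10 hosts x 96 GB

mem_required = sum(res + ovh for res, ovh in vm_mem)

print(f"CPU failover capacity:    {current_failover_capacity(cpu_total_mhz, cpu_required):.0%}")
print(f"Memory failover capacity: {current_failover_capacity(mem_total_mb, mem_required):.0%}")
```

With no reservations, the "required" side of the equation is just the per-VM minimums and overheads, which is why the reported capacity stays so high.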

Quick sanity check, anyone?

4 Replies
jbiggley
Contributor

I was reviewing my notes from the VCAP-DCD class I took in January and I think I've answered my own question, but could still use a sanity check.

1.  No, current failover capacity is not useful without reservations.

2.  Failover capacity has nothing to do with the percentage of VMs that can be restarted; it reflects only the percentage of resources not guaranteed to VMs through reservations.

3.  Yes, we should consider implementing reservations. Since we have converted to the percentage-based policy, we don't have to worry about slot size (part of the reason we converted), but reservations should still be realistic. A memory reservation should be the average active memory for a particular VM, unless there is a specific need to reserve all of the assigned RAM. In our case, our cluster is not CPU constrained (far from it), so we will not implement any CPU reservations at this time.
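The "average active memory" sizing in point 3 is simple arithmetic once you have samples of a VM's active memory (for example, the mem.active.average counter from the vCenter performance charts, which reports KB). A sketch with made-up sample values and an assumed 20% headroom factor:

```python
# Sketch: derive a memory reservation from active-memory samples.
# The sample values and the 20% headroom factor are assumptions,
# not a VMware recommendation.

def suggested_reservation_mb(active_kb_samples, headroom=1.2):
    """Average active memory plus headroom, rounded up to the next 128 MB."""
    avg_mb = sum(active_kb_samples) / len(active_kb_samples) / 1024
    target = avg_mb * headroom
    return int(-(-target // 128) * 128)    # ceil to a 128 MB multiple

samples = [1_572_864, 1_835_008, 1_310_720, 2_097_152]   # mem.active.average in KB
print(f"Suggested reservation: {suggested_reservation_mb(samples)} MB")
```

The rounding granularity and headroom are tunable; the point is just that the reservation tracks observed active memory rather than configured RAM.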

Again, can someone sanity check my assumptions here?

russ79
Enthusiast

I think I'm facing the same issue: with no reservations, the HA numbers are misleading. The problem is that it also affects DRS during maintenance mode operations. See my post: http://communities.vmware.com/message/1812423

Did you go ahead and set reservations to correct the problem? I'm thinking about doing the same but am hesitant because it's a ton of manual work, so to speak.

AndreTheGiant
Immortal

See: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
russ79
Enthusiast

So I guess this is tied to my lack of reservations; HA is using the default minimum reservation for each VM. Is there an easy way to get the average memory utilization of a VM in order to set a proper reservation?
