4 Replies Latest reply on Dec 11, 2012 10:17 AM by atmosphere23

    Admission Control - [Sanity Check] Unbalanced cluster w/ no VM reservations

    atmosphere23 Lurker

  I've been spending time reviewing our Admission Control policy and am looking for a "sanity check" that my calculations and findings in my environment are accurate.  I currently manage an unbalanced cluster that includes one host (host 5) with significantly more memory than the rest.  I've read from multiple sources, including the 5.1 deepdive from Duncan and Frank, that the "host failures the cluster tolerates" policy will waste resources in this scenario to ensure the largest host is protected.  However, the math below shows that this policy lands almost dead-on with what the percentage-based policy would reserve in an N+1 scenario (again, ensuring the largest host is protected).

       

      The breakdown is:

       

       

                   Host 1   Host 2   Host 3   Host 4   Host 5   Host 6   Host 7   Totals
      Memory (GB)  48       72       64       64       128      64       72       512
      Memory %     9.38     14.06    12.5     12.5     25       12.5     14.06
      CPU (GHz)    18.08    19.12    19.12    18.08    19.12    19.12    19.12    131.76
      CPU %        13.78    14.5     14.5     13.78    14.5     14.5     14.5

       

      In an N+1 scenario the recommended percentages work out to 25% for memory ((128 / 512) × 100) and 15% for CPU ((19.12 / 131.76) × 100, rounded up) to ensure the largest host is protected.  I realize we could reduce waste by lowering these values, but doing so would ultimately defeat the point of HA if the largest host failed (granted, it would need to be heavily utilized) and there were not enough resources for HA-initiated restarts.
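      As a quick sanity check of my own math, the N+1 percentages can be recomputed in a few lines of Python (host values taken from the table above; this is just arithmetic, not anything HA actually runs):

      ```python
      # N+1 percentage-based admission control: reserve enough of the
      # cluster to cover the loss of the largest host.

      memory_gb = [48, 72, 64, 64, 128, 64, 72]                      # Hosts 1-7
      cpu_ghz   = [18.08, 19.12, 19.12, 18.08, 19.12, 19.12, 19.12]  # Hosts 1-7

      total_mem = sum(memory_gb)   # 512 GB
      total_cpu = sum(cpu_ghz)     # 131.76 GHz

      # Largest host as a share of the cluster = percentage to reserve.
      mem_pct = max(memory_gb) / total_mem * 100   # 128 / 512 = 25.0 %
      cpu_pct = max(cpu_ghz) / total_cpu * 100     # 19.12 / 131.76 ≈ 14.51 %

      print(f"Reserve {mem_pct:.0f}% memory and {cpu_pct:.2f}% CPU (round up to 15%)")
      ```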

       

      The policy currently in use is "host failures the cluster tolerates" with the default slot sizes (32 MHz for CPU and 217 MB of overhead for memory) and no VM reservations.  We do use resource pool reservations, but if I understand correctly they don't come into play in the HA slot-size calculation.  We currently have 2,210 total slots (1,540 available) in our main cluster given the hosts/resources above.  138 slots are used by powered-on VMs and 532 are reserved for failover.  For memory, that works out to 112.74 GB (532 × 217 MB / 1024), which equates to 22% of the total memory in the cluster.  For CPU it works out to 17.02 GHz (532 × 32 MHz / 1000), which is roughly 13% of the total CPU in the cluster.
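      The slot math can be rechecked the same way (slot counts and default slot sizes as reported above; the cluster totals come from the earlier table):

      ```python
      # Back-of-the-envelope check of the slot-based failover reservation,
      # using the default slot sizes: 32 MHz CPU, 217 MB memory overhead.

      SLOT_CPU_MHZ = 32
      SLOT_MEM_MB  = 217

      failover_slots  = 532      # slots reserved for failover
      cluster_mem_gb  = 512
      cluster_cpu_ghz = 131.76

      reserved_mem_gb  = failover_slots * SLOT_MEM_MB / 1024    # ≈ 112.74 GB
      reserved_cpu_ghz = failover_slots * SLOT_CPU_MHZ / 1000   # ≈ 17.02 GHz

      print(f"Memory: {reserved_mem_gb:.2f} GB "
            f"({reserved_mem_gb / cluster_mem_gb * 100:.0f}% of cluster)")
      print(f"CPU:    {reserved_cpu_ghz:.2f} GHz "
            f"({reserved_cpu_ghz / cluster_cpu_ghz * 100:.0f}% of cluster)")
      ```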

       

      So if I switched to the percentage-based policy with best practices in mind, the reserved memory and CPU would land close to what the current policy already sets aside, differing only slightly per the percentages above.  Are there any underlying issues with this configuration that I'm missing, or is my analysis accurate given the setup?  Or would it be recommended to implement the percentage-based policy and lower the reserved values below the largest host's footprint to gain available resources?  It's a catch-22: we obviously want to maximize available resources in the cluster while ensuring adequate failover capacity.