Is this bug back, or what am I missing?
This is a vSAN stretched cluster running 6.7 U1.
The HA Admission Control configuration can be seen in the attached pictures. I am using cluster resource percentage (50%), as recommended for vSAN stretched clusters. The weird thing is that in the HTML interface the Host Failure option is available, but in the vSphere Web Client it is not (which I think is how it should be in the HTML interface too). I'm not sure if this is a bug/issue, just letting you know.
I have plenty of resources, as you can see in the attached picture. I can lose half of the cluster and will still have plenty to support my operation. Yet the "Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster" warning is still showing.
What did I do wrong, or what am I not understanding?
PS: I do have about 3 VMs with "small" CPU/memory reservations.
But I have enough resources to tolerate half the cluster going down, or so I think.
I think I understand what this setting does, and I don't want a performance impact in the event of a failure, which is why I set it to 0.
My active memory utilization is really low (I read here that this is the metric used). CPU utilization is low too. See the attached images.
Thanks for the quick reply!
You might have enough resources to tolerate failures, but you still don't have enough resources to let the VMs carry on with the same performance. That's what the message is trying to say.
If you set that number to 100%, the warning will go away for sure, but with 0% it's there. If you want to find the ideal number, you might have to try 25%, 50%, 75%, etc. to see when the warning goes away.
When you are calculating available resources, consider the total amount of CPU and memory reservation you have configured. If 50% of the resources are lost, that total reservation is still fully committed to the corresponding VMs out of the remaining 50%. That reduces the resources left for the rest of the VMs, which might suffer during their peak utilization and start contending for the remaining unreserved capacity.
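One way to picture the 0% versus 100% behaviour described above is the rough sketch below. It assumes the warning fires when the capacity left after the failure is smaller than current usage scaled down by the tolerated degradation. That rule is an assumption for illustration, not a confirmed description of the HA algorithm, and the function name and the numbers in the example are hypothetical.

```python
def degradation_warning(total, used, failed_fraction, tolerance_pct):
    """Return True if the warning would fire under the ASSUMED rule.

    total           -- total cluster capacity (any unit, e.g. GB or MHz)
    used            -- current usage in the same unit
    failed_fraction -- share of capacity lost in the failure (0.5 = half)
    tolerance_pct   -- "performance degradation VMs tolerate" setting, 0-100
    """
    remaining = total * (1.0 - failed_fraction)
    # Assumption: usage is allowed to shrink by the tolerated percentage.
    required = used * (1.0 - tolerance_pct / 100.0)
    return remaining < required


# Hypothetical cluster: 100 units of capacity, 60 in use, half the hosts lost.
for pct in (0, 25, 100):
    print(pct, degradation_warning(100, 60, 0.5, pct))
```

Under this assumed rule, raising the tolerance can only make the warning less likely, which matches the suggestion to step through 25%, 50% and 75%.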
Hope this helps.
I will try increasing the number as suggested, but I am also trying to understand the logic so that I can apply it to my capacity planning. (At this point I have no clue which resource is causing the alarm to go off; memory seems the most obvious.) I have tried reading multiple articles, including the latest clustering deep dive book, and I still can't grasp this HA admission control logic.
Anyways, here is additional info on my cluster:
My VM memory reservations add up to 40 GB and CPU reservations to 41200 MHz; like I said, small. My total cluster resources are 1.34 THz of CPU and 10.48 TB of memory. Based on the information in the article I linked in a previous reply, and assuming the algorithm uses active memory (not consumed = used = active + overhead), everything tells me I have enough even to set 0% degradation and be OK after half my cluster goes down. Please see the attached screenshot of my cluster resources.
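As a quick sanity check on those figures, the sketch below works out what share of the surviving capacity the reservations would pin if half the cluster were lost. It uses only the numbers quoted in this post; the 1 TB = 1024 GB conversion and the 50% loss are assumptions, and per-VM memory overhead reservations are ignored.

```python
# Figures quoted in the post above.
TOTAL_CPU_MHZ = 1.34e6          # 1.34 THz expressed in MHz
TOTAL_MEM_GB = 10.48 * 1024     # 10.48 TB, assuming 1 TB = 1024 GB
RESERVED_CPU_MHZ = 41_200
RESERVED_MEM_GB = 40

FAILED_FRACTION = 0.5  # assumption: one full site (half the hosts) is lost


def reserved_share(reserved, total, failed_fraction):
    """Fraction of the surviving capacity that reservations would pin."""
    remaining = total * (1.0 - failed_fraction)
    return reserved / remaining


cpu_share = reserved_share(RESERVED_CPU_MHZ, TOTAL_CPU_MHZ, FAILED_FRACTION)
mem_share = reserved_share(RESERVED_MEM_GB, TOTAL_MEM_GB, FAILED_FRACTION)

print(f"CPU reservations use {cpu_share:.1%} of the surviving capacity")
print(f"Memory reservations use {mem_share:.1%} of the surviving capacity")
```

Both shares come out well under 10% with these numbers, so the explicit reservations on their own look small next to the surviving capacity.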
Again, thanks for taking the time and for your quick response!
How many hosts do you have in the cluster?
You have configured the cluster to tolerate the failure of 7 hosts. Can you temporarily set it to e.g. 4 hosts and check whether the warning disappears? Just to check whether it is connected to this option.
And the second part:
Give us some info about the current CPU and memory reservations.