parasam
Contributor

Insufficient resources to satisfy HA failover level on Cluster and Unable to contact a primary HA agent in Cluster

Insufficient resources to satisfy HA failover level on Cluster

Unable to contact a primary HA agent in Cluster

I walked in this morning and I have a RED EXCLAMATION on my CLUSTER ....

I go into the forums and it says to stop HA .... and then restart it. Time to restart is anywhere from 3 to 30 minutes.

I do it ... and now I have RED EXCLAMATIONS on all my physical servers in the cluster.

Please advise

14 Replies
vmroyale
Immortal

Hello and welcome to the communities.

Note: Discussion successfully moved from VMware ESX™ 4 to Availability: HA & FT

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com
vGuy
Expert

As a start, please ensure your DNS is working fine and that ping between the hosts, and between the hosts and vCenter, works by IP, short name and FQDN.

Then try disabling HA on the cluster and re-enabling it.

If the issue still persists, check your admission control policy and, if you can, try disabling it temporarily to see if the error goes away.

Also verify that you do not have VMs configured with high CPU/memory reservations.
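
The name-resolution part of that check can be scripted. Below is a minimal Python sketch, assuming it runs on a machine that uses the same DNS servers as vCenter. The function name `check_resolution` and the host names in the usage note are placeholders, not taken from this thread, and it only tests forward lookups, not ping reachability.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each name to its resolved IPv4 address, or None if the lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError.
            results[name] = None
    return results
```

For example, `check_resolution(["esx01", "esx01.example.lan", "vcenter.example.lan"])` (hypothetical names) returns a dict in which any name that fails to resolve maps to `None`. Run it with the short name and the FQDN of every host and of vCenter.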

parasam
Contributor

Under "HA Advanced Runtime Info" I have "Total good hosts in cluster: 0"

and a message in Cluster Operational Status: HA agent on ardesxx11.etc.lan in cluster ARD Cluster 2 "Company Name" has an error: NoPrimaryAgentAvailable

How do I get this agent to restart or go into operational status?

Please see the attached file

Please advise

vGuy
Expert

The total good hosts count is showing as 0 because none of the hosts is HA operational.

What versions of ESX(i) and vCenter are you running?

Ensure your DNS is working properly. Prior to vSphere 5, HA had a huge dependency on name resolution.

Once DNS is confirmed, you can reinstall the HA agents by right clicking the Cluster --> edit settings --> Uncheck DRS/HA checkboxes --> OK.

Once the process completes, go back and re-enable HA (by selecting both checkboxes).

Let us know how it goes.

joshodgers
Enthusiast

Ensure your ESXi management default gateway is reachable (and responds to ICMP), and ensure DNS etc. is also working.

Check your HA admission control policy. If you're using "Host failures cluster tolerates", you may be suffering from a common issue caused by the way this setting calculates available capacity using "slot" sizes.

The "slot" size is by default driven by the largest CPU and memory reservations among the powered-on virtual machines in the cluster, so the bigger your biggest reservation, the fewer "slots" you will have.
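
To see why one large reservation shrinks the slot count, here is a rough Python sketch of the slot arithmetic. It is a simplification, not VMware's implementation: it ignores per-VM memory overhead, and the `VM` class, the function names, and the floor values `default_cpu_mhz` and `default_mem_mb` are illustrative assumptions rather than values from this thread.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0


def slot_size(
    vms: Iterable[VM],
    default_cpu_mhz: int = 32,   # assumed floor, check your version's docs
    default_mem_mb: int = 128,   # assumed floor, stands in for memory overhead
) -> Tuple[int, int]:
    """Slot = largest CPU reservation and largest memory reservation."""
    vms = list(vms)
    cpu = max([vm.cpu_reservation_mhz for vm in vms], default=0)
    mem = max([vm.mem_reservation_mb for vm in vms], default=0)
    return max(cpu, default_cpu_mhz), max(mem, default_mem_mb)


def slots_per_host(host_cpu_mhz: int, host_mem_mb: int, slot: Tuple[int, int]) -> int:
    """A host holds as many slots as its scarcer resource allows."""
    slot_cpu, slot_mem = slot
    return min(host_cpu_mhz // slot_cpu, host_mem_mb // slot_mem)
```

With a hypothetical host of 20000 MHz and 65536 MB, a single VM reserving 4000 MHz and 16384 MB leaves `slots_per_host` at 4. With no reservations at all, the placeholder floors give 512.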

If you change the admission control policy to "Percentage of cluster resources reserved for HA" and set a percentage value, you will not be limited by "slot" size. Note: Always leave "HA Admission control" Enabled.

A suitable percentage to set is as follows:

2 hosts - 50%

3 hosts - 33%

4 hosts - 25%

5 hosts - 20%

and so on for N+1

If you want N+2

2 hosts - N/A

3 hosts - 66%

4 hosts - 50%

5 hosts - 40%
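
These figures are just the share of the cluster that the tolerated failures represent, rounded down. A small Python sketch of that arithmetic, assuming equally sized hosts (the function name `reserved_percentage` is made up for illustration):

```python
def reserved_percentage(hosts: int, failures_to_tolerate: int = 1) -> int:
    """Percentage of cluster resources to reserve, rounded down."""
    if hosts < 1 or failures_to_tolerate < 1:
        raise ValueError("hosts and failures_to_tolerate must be positive")
    if failures_to_tolerate >= hosts:
        # Reserving 100% or more would leave nothing to run VMs on.
        raise ValueError("cannot tolerate the loss of every host")
    return (100 * failures_to_tolerate) // hosts
```

By the same arithmetic, N+2 on five hosts is 2/5 = 40%, and two hosts cannot do N+2 at all (the function raises `ValueError`).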

There is no risk in changing your admission control policy, so this can be tested in production, and you will likely see your cluster return to a normal status.

This setting also typically allows for greater consolidation ratios, while still maintaining excellent performance.

Hope that helps.

Josh Odgers | VCDX #90 | Blog: www.joshodgers.com | Twitter @josh_odgers
depping
Leadership

first thing you should try, on each host:

right click --> reconfigure for HA

MauroBonder
Leadership

second, disable and re-enable HA

*Please don't forget to award points for "helpful" and/or "correct" answers.* Thank you
krishna_v78
Enthusiast

Parasam, were you able to fix the issue? If yes, please mark this thread as Answered. It will help others who may encounter the same issue.

tsc2
Contributor

Thanks. Switching from 'Host failures the cluster tolerates' to 'Percentage of cluster resources reserved as failover spare capacity' seems to have cleared the issue on my system.

depping
Leadership

tsc2 wrote:

Thanks. Switching from 'Host failures the cluster tolerates' to 'Percentage of cluster resources reserved as failover spare capacity' seems to have cleared the issue on my system.

Which means you probably have a large reservation on one of your VMs. I would recommend figuring out which VM that is and reviewing whether the reservation is actually needed or was set by mistake.
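
Assuming you have collected each VM's CPU and memory reservations by some means (how you export them is up to you; this sketch does not talk to vCenter), a few lines of Python can surface the outliers. The `Reservation` type and `reserved_vms` function are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Reservation(NamedTuple):
    vm_name: str
    cpu_mhz: int
    mem_mb: int


def reserved_vms(inventory: List[Reservation]) -> List[Reservation]:
    """Return only VMs that have a reservation, largest memory reservation first."""
    flagged = [r for r in inventory if r.cpu_mhz > 0 or r.mem_mb > 0]
    # Sort by memory, then CPU, so the most likely culprit is at the top.
    return sorted(flagged, key=lambda r: (r.mem_mb, r.cpu_mhz), reverse=True)
```

The first entry in the returned list is the VM to review first.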

ravitabhjoshi
Contributor

Disable HA & Reconfigure

WildBill1952
Contributor

I tried all of these answers and they got me going in the right direction to solve the same issue. The final piece was going into Tasks & Events and reading all of the event messages. One of the messages noted that one of my hosts was not configured the same as the others. After correcting this, and then turning HA off and back on, everything came up clear.

chrisciano
Contributor

In case you were still wondering about this... below is a great link to get this resolved:

VMware: Fixing Insufficient resources to satisfy configured failover level for HA | geekswing.com

Enjoy

SelahattinOkur
Contributor

Hi,

I got the "insufficient resources to satisfy vsphere ha failover level on cluster xxx" error right after some network link problems.

I searched the net and found "VMware KB: Troubleshooting VMware High Availability (HA) in VMware vSphere"

and the first thing I tried was the 11th item: "Verify the VirtualCenter Server Service has been restarted. To restart the VirtualCenter Server Service, see http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1... Stopping, starting, or restarting vCenter services (1003895)."

I logged off the Web Client, restarted that service, then restarted the Web Client, and everything was normal.
