VMware Cloud Community
limey36
Enthusiast

HA Configuration

I've walked into an existing site and am trying to configure 2 nodes in an HA cluster. DNS is good, shared storage is good, network is good, licensing is good, however I'm getting the following errors after I create the cluster:

- Insufficient resources to satisfy HA failover on cluster

- Unable to contact a primary HA agent in cluster.

Any help would be greatly appreciated!!!

Thanks!

7 Replies
letoatrads
Expert

Had some issues myself when I first set up HA. Here is one angle that could be causing it: make sure PortFast (Cisco) or your switch's equivalent is turned ON. HA kicks in fairly quickly after a host reboots, and I've had hosts without PortFast whose NICs didn't come up quickly enough. The HA timeout gets exceeded, so the cluster doesn't re-form properly unless you do a 'Reconfigure for HA'.

Also, you might want to try this after making sure your switches are set up OK.

You can run this from the console of your ESX host to verify DNS is good from an HA standpoint (just checking):

/opt/LGTOaam512/bin/ft_gethostbyname YourServerName

There should be just one static IP address for each host.
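As a rough cross-check of the "one address per host" rule, here is a minimal sketch that counts the distinct addresses in resolver-style output. The function name `count_distinct_ips` and the host name `esx01.example.local` are made up for illustration, and using `getent` as the lookup command is an assumption; any command that prints the address in the first column would do.

```shell
#!/bin/sh
# Count the distinct addresses in "ADDRESS NAME..." lines read from stdin.
# Blank lines and lines starting with '#' are ignored.
count_distinct_ips() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage on a host (placeholder name, assumes getent is available):
#   getent hosts esx01.example.local | count_distinct_ips
```

A result other than 1 suggests the name maps to more than one address, which is the situation the check above is meant to rule out.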

Then do the following.

Remove both hosts from VC. That should uninstall vpxa and the AAM agent from each host along with any configuration files (make sure that the /opt/LGTOaam512 directory doesn't exist or is empty). With this directory gone, you get a fresh install of the HA agent from VC. Then re-add the hosts to the HA-enabled cluster. Hopefully there is no HA error at this point.

I know it's some serious hoops, but I had a bear of a time myself in beta with this, so I know your pain. Good luck.
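The "doesn't exist or is empty" check on the agent directory can be sketched as a small shell function. The name `dir_absent_or_empty` is made up for illustration; only the /opt/LGTOaam512 path comes from the post.

```shell
#!/bin/sh
# Succeed if path $1 does not exist, or is a directory with no entries
# (hidden files count as entries).
dir_absent_or_empty() {
    [ ! -e "$1" ] && return 0      # nothing there at all
    [ -d "$1" ] || return 1        # exists but is not a directory
    [ -z "$(ls -A "$1")" ]         # directory: empty listing means clean
}

# Example usage on the host after removing it from VC:
#   dir_absent_or_empty /opt/LGTOaam512 && echo "ready for a fresh agent install"
```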

limey36
Enthusiast

As I suspected, I missed something small: I didn't have the short name in the /etc/hosts file (DNS isn't configured at this site). Problem solved.

Thanks anyway!
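For anyone hitting the same thing, here is a minimal sketch of checking that a hosts file lists both the short name and the full name on one entry. The function name `hosts_line_has_both`, the address, and the `esx01` names are placeholders, not values from this site.

```shell
#!/bin/sh
# Succeed if some line of hosts file $1 lists both names $2 and $3.
# An entry of the expected shape would look like:
#   10.0.0.11   esx01.example.local   esx01
hosts_line_has_both() {
    awk -v a="$2" -v b="$3" '
        { sub(/#.*/, "") }               # strip trailing comments
        {
            fa = 0; fb = 0
            for (i = 2; i <= NF; i++) {  # field 1 is the address
                if ($i == a) fa = 1
                if ($i == b) fb = 1
            }
            if (fa && fb) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example usage (placeholder names):
#   hosts_line_has_both /etc/hosts esx01 esx01.example.local || echo "short name missing"
```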

dilipdhure
Contributor

That is one thing you have to do for HA, but besides that you also need to provide the gateway IP address for the ESX Server while installing or later in the ethernet network script.

Generally Gateway IP is the IP address of your virtual center server.

mrbrown66
Contributor

I fixed this issue by rebooting the virtual center service.

I originally encountered the issue due to an upgrade to VC 2.0.1 Patch 2, which overwrote the database (I selected the wrong option). I then restored the database to the pre-Patch 2 level.

This resulted in the Virtual Center agents being a patch ahead of the database. The servers would not re-add to Virtual Center. I tried re-upgrading the database to Patch 2, but this didn't fix the issue.

In the end the simplest solution was best: remove all disconnected ESX hosts, reboot Virtual Center, and re-add the ESX hosts.

I then had issues with the HA agent. I had to remove HA / DRS from the cluster and then re-add them. Sorted.

ua988180
Contributor

Removing the LgtoAAM folder doesn't help: when the host is added to the cluster again, the LgtoAAM folder is not created automatically.

jjgunn
Enthusiast

I know this sounds too simple but I'll explain what I did.

I had a cluster of 8 ESX servers. I was upgrading one at a time from 3.0.1 to 3.5 Update 2.

Everything was going well and HA was configuring automatically with no issues until I upgraded the final server in the cluster. That's when the HA wouldn't configure properly. Red exclamation marks on everything with the error message "unable to contact a primary ha agent in cluster". Below is the solution which worked for me.

I simply right-clicked the cluster itself, selected "Edit Settings", and UNCHECKED Enable HA.

Allow the cluster time to UNconfigure HA. It took my cluster of 8 ESX servers about 10 minutes to complete. Once the unconfiguring was completed, I waited a little while longer (another 10 minutes or so).

Then I right-clicked the cluster and selected "Edit Settings" again to ENABLE HA. This time it only took about 5 minutes, and the entire cluster is operating properly again.

Hope this is helpful for someone else.

Brad_C
Contributor

I know this is an old post and is answered, but I just ran into this problem myself on my first HA setup. It appears to be related to the hostnames of the hosts. I have lost the URL where I found the information, but here is most of what I recall.

(1) Check that the ESX hostname in the VI client is all lowercase (DNS and Routing on the Configuration tab).

(2) SSH to all hosts in the cluster and check the value returned by the "hostname" command. If it is not the FQDN in all lowercase, change it using the hostname command.

(3) SSH to all hosts in the cluster and check that /etc/sysconfig/network has HOSTNAME set to the FQDN in lowercase.

(4) Disable HA and DRS on the cluster and then re-enable them.

This fixed my problems. I had two hosts in a cluster with 64 GB of memory and 42 GHz of CPU in total, and 5 VMs running with the largest memory allocation for a single VM at 4096 MB, but I was still getting this error.
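The lowercase-FQDN check from steps (2) and (3) can be sketched as a small shell function. The name `is_lowercase_fqdn` is made up for illustration, and treating "contains a dot" as "fully qualified" is a simplifying assumption.

```shell
#!/bin/sh
# Succeed if $1 has no uppercase letters and contains at least one dot.
is_lowercase_fqdn() {
    case "$1" in
        *[[:upper:]]*) return 1 ;;   # uppercase present
        *.*)           return 0 ;;   # has a dot, looks fully qualified
        *)             return 1 ;;   # short name only
    esac
}

# Example usage on each host:
#   is_lowercase_fqdn "$(hostname)" || echo "hostname is not a lowercase FQDN"
```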

Regards,

Brad C.
