VMware Cloud Community
romierome
Contributor

HA agent error

I have two hosts in a cluster. At one point I had HA and DRS enabled for the cluster; I now have only DRS enabled. However, one of the hosts in the cluster is still showing a "HA agent on......in cluster.....has an error" message.

I'm trying to get rid of the error message.

8 Replies
VirtualNoitall
Virtuoso

Hello,

Have you tried removing the troublesome host from the cluster and adding it back?

Have you confirmed that fully qualified domain name resolution is working between all nodes in the cluster?

MR-T
Immortal

This is a total pain and something I've seen a lot.

If you right-click the host in VirtualCenter and select reconfigure for HA, the error will go away.

Why do we get these errors?

I've read posts saying it's name resolution or network traffic issues, but I'm not so sure.

All my servers have full DNS resolution working and the network uses a dedicated gigabit link.

Anyway, until it's fixed, just do the above.

jasonboche
Immortal

The HA agent is a fickle devil.

The issue is typically with the ESX host not being able to resolve both short and long names of other hosts on the network.

You can ensure that both short and long host names are resolvable by placing them in the /etc/hosts file, e.g.:

127.0.0.1 localhost.localdomain localhost
10.0.0.1 server1.domain.com
10.0.0.1 server1
10.0.0.2 server2.domain.com
10.0.0.2 server2
etc...

It is unfortunate to have to do this, but if from the ESX console you cannot ping both short and long names, then DNS is your issue.

It also bugs me to no end that anything but a FQDN needs to be resolvable. Why the need to resolve a short host name? Is ESX a Windows OS now requiring WINS or NetBIOS name resolution? I don't think so.
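
For anyone who wants to script the "can I resolve both short and long names" check instead of pinging by hand, here is a minimal sketch. It assumes Python 3 on a machine that uses the same DNS and hosts setup as the ESX hosts (not necessarily the service console itself). The function name `check_names` and the `resolve` parameter are made up for illustration.

```python
import socket


def check_names(hosts, domain, resolve=socket.gethostbyname):
    """Try to resolve the short and the long name of every host.

    Returns a dict mapping each name to its IP address,
    or to None when the name does not resolve.
    """
    results = {}
    for short in hosts:
        for name in (short, f"{short}.{domain}"):
            try:
                results[name] = resolve(name)
            except OSError:  # socket.gaierror is a subclass of OSError
                results[name] = None
    return results
```

Calling `check_names(["server1", "server2"], "domain.com")` with the placeholder names from the hosts example would return one entry per short and long name; any entry that comes back as `None` is a name HA may not be able to resolve.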

emmar
Hot Shot (accepted solution)

I had a similar situation: a cluster with just HA enabled. I then disabled HA, but one of the hosts timed out on the agent uninstall.

So the cluster had HA disabled and the ESX host did not have the option to "reconfigure HA" (as if the agent wasn't installed), BUT the host still had an HA agent error. I tried rebooting, removing it from the cluster, etc. The only way I could get rid of it was to enable HA on the cluster and then disable it again.

romierome
Contributor

That did it, Thx

admin
Immortal

One thing I ran across that might affect service console networking, and thus have a negative impact on HA, is having "Serial over LAN" enabled in your environment:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1627&slice...

One more thing to keep in mind.

nheidler
Contributor

I ran into exactly this issue. DNS was working 100% on each of the 3 ESX servers, on VC, and in Microsoft DNS.

The hosts file on the ESX servers had the long name:

10.1.1.100 esx1.thedomain.com

but not the short name:

10.1.1.100 esx1

I added the short names to the hosts file on each of the ESX servers, with no reboot, and now HA works.
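
If you have several hosts files to check, the "long name present, short name missing" case can be spotted with a few lines of script. This is only a rough sketch, assuming Python 3 and a copy of the hosts file contents as text. The helper name `missing_short_names` and the second host (esx2 at 10.1.1.101) are invented for the example.

```python
def missing_short_names(hosts_text):
    """Return (ip, fqdn) pairs whose short name is not listed for the same IP."""
    names_by_ip = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        names_by_ip.setdefault(ip, set()).update(names)

    missing = []
    for ip, names in names_by_ip.items():
        for name in sorted(names):
            # A dotted name is treated as a long name; its first label is the short name.
            if "." in name and name.split(".", 1)[0] not in names:
                missing.append((ip, name))
    return missing


# Example input: esx1 has only the long name, esx2 (hypothetical) has both.
sample = """\
127.0.0.1 localhost.localdomain localhost
10.1.1.100 esx1.thedomain.com
10.1.1.101 esx2.thedomain.com
10.1.1.101 esx2
"""
print(missing_short_names(sample))
```

An empty list means every long name in the file has a matching short name on the same IP, which is the state that fixed HA in my case.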

jsinclair
Enthusiast

Same problem here, but my resolution was a bit different. Using ESX 3.5, DNS was set up correctly, but the time wasn't. I synced all ESX servers and the VC server with a local NTP server, and the HA errors disappeared.
