I have two hosts in a cluster. At one point I had HA and DRS enabled for the cluster; I now only have DRS enabled. However, one of the hosts in the cluster is still showing a "HA agent on......in cluster.....has an error" message.
I'm trying to get rid of the error message.
I had a similar situation: a cluster with just HA enabled. I then disabled it, but one of the hosts timed out on the agent uninstall.
So the cluster had HA disabled and the ESX host did not have the option to "reconfigure HA" (as if the agent wasn't installed), BUT the host still had an HA agent error. I tried rebooting, removing it from the cluster, etc. The only way I could get rid of it was to enable HA on the cluster and then disable it again.
Hello,
Have you tried removing the troublesome host from the cluster and adding it back?
Have you confirmed that fully qualified domain name resolution is working between all nodes in the cluster?
This is a total pain and something I've seen a lot.
If you right-click the host in VirtualCenter and select "Reconfigure for HA", the error will go away.
Why do we get these errors?
I've read posts saying it's name resolution or network traffic issues, but I'm not so sure.
All my servers have full DNS resolution working and the network uses a dedicated gigabit link.
Anyway, until it's fixed, just do the above.
The HA agent is a fickle devil.
The issue is typically with the ESX host not being able to resolve both short and long names of other hosts on the network.
You can ensure that both short and long host names are resolvable by placing them in the /etc/hosts file, e.g.:
127.0.0.1 localhost.localdomain localhost
10.0.0.1 server1.domain.com
10.0.0.1 server1
10.0.0.2 server2.domain.com
10.0.0.2 server2
etc...
It is unfortunate to have to do this, but if from the ESX console you cannot ping both short and long names, then DNS is your issue.
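That ping test can be repeated for a whole list of hosts with a few lines of Python. This is only a rough sketch: the host names are the placeholders from the hosts file example above, and the `resolve` parameter is a hypothetical hook added here so the function can be tried without a live DNS server. It needs to run on a machine that uses the same resolver settings as the ESX hosts, otherwise the result says nothing about what the hosts themselves see.

```python
import socket


def check_names(fqdns, resolve=socket.gethostbyname):
    """Return {name: ip or None} for each FQDN and for its short form."""
    results = {}
    for fqdn in fqdns:
        # "server1.domain.com" -> "server1"
        short = fqdn.split(".", 1)[0]
        for name in (fqdn, short):
            try:
                results[name] = resolve(name)
            except OSError:
                # socket.gaierror (lookup failure) is a subclass of OSError
                results[name] = None
    return results
```

Calling `check_names(["server1.domain.com", "server2.domain.com"])` returns a dictionary with four entries; any name mapped to `None` did not resolve and is a candidate for the hosts file.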
Message was edited by:
jasonboche
It also bugs me to no end that anything but a FQDN needs to be resolvable. Why the need to resolve a short host name? Is ESX a Windows OS now requiring WINS or NetBIOS name resolution? I don't think so.
Enabling HA on the cluster and then disabling it again did it. Thanks.
One thing that I ran across, that might affect service console networking, and thus have a negative impact on HA - if "Serial over LAN" is enabled in your environment -
One more thing to keep in mind.
I ran into exactly this issue. DNS was working 100% on each of the 3 ESX servers, on VC and in Microsoft DNS.
Hosts file on the ESX servers had the long name:
10.1.1.100 esx1.thedomain.com
but not the short
10.1.1.100 esx1
I added the short names to each of the hosts files on the ESX servers, with no reboot, and now HA works.
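To spot this case without reading every hosts file by eye, something like the following could be used. It is a sketch, not a tested tool: the function name `missing_short_names` is made up for illustration, and it assumes the usual hosts file layout of an IP address followed by one or more names, with `#` starting a comment.

```python
def missing_short_names(hosts_text):
    """Return short names that have a long (dotted) entry but no short entry."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) >= 2:
            names.update(fields[1:])  # fields[0] is the IP address
    missing = set()
    for name in names:
        if "." in name:
            short = name.split(".", 1)[0]
            if short not in names:
                missing.add(short)
    return sorted(missing)
```

Given a file that contains only the line `10.1.1.100 esx1.thedomain.com`, the function returns `['esx1']`; once the short entry is added it returns an empty list.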
Same problem here, but my resolution was a bit different. Using ESX 3.5, DNS was set up correctly, but the time wasn't. I synced all ESX servers and the VC server with a local NTP server, and the HA errors disappeared.
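If you suspect the clocks, a rough way to see how far a machine has drifted from the NTP server is a bare-bones SNTP query. This is a sketch under assumptions: the server name passed in is whatever your local NTP server is called (none is given in this thread), the function names are made up for illustration, and network delay is ignored, so the result is only good for spotting drift of seconds or more.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_transmit_time(packet):
    """Extract the server transmit timestamp (Unix seconds) from an NTP reply."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server, timeout=5.0):
    """Approximate offset in seconds: server time minus local time."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(48)
    return ntp_transmit_time(reply) - time.time()
```

Running `clock_offset(...)` with the name of your NTP server from each machine and comparing the numbers shows which one is out of step.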