VMware Cloud Community
ckboon
Contributor

HA disabled on one of the ESX hosts

Hi,

I have 3 ESX hosts in a cluster. One of them has a problem with the HA agent. I tried "Reconfigure for HA": it configures successfully, but HA gets disabled again a short while later. Pinging to and from all 3 machines works, by name or by IP. The license seems to be OK. I can't figure out what's wrong with it.

All ESX hosts have a local hosts file and also resolve names via the DNS server running on the VC. The VC also has a local hosts file. All hosts files are up to date.

I don't know what else to check.

Thanks for the help.

6 Replies
Gerrit_Lehr
Commander

Please try disabling and re-enabling HA for the cluster, then remove the host from the cluster and add it again.

Kind Regards,

Gerrit Lehr

If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".

jdvcp
Enthusiast

And if that fails, the problem could be deeper:

  1. Ensure that all ESX servers in the cluster have correct entries in /etc/hosts for both the FQDN and the short (NetBIOS) name. Every host should be able to ping every other host by both names.

  2. Ensure your service console has enough memory allocated. We found that our HA agent was crashing, and/or a runaway process in a buggy version of 3.0.1 HA was causing the service console to run out of memory. In this case, HA and other services crashed and restarted periodically.

The above are less likely, but entirely possible, level 2 or 3 troubleshooting scenarios. Let me know if you need more detail.
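
The hosts-file check in point 1 could be scripted roughly as follows. This is only a sketch: the function names and the example host names are hypothetical, and it assumes Python 3, which the service console may not provide, so it may be easier to run it against a copy of the file on an admin workstation.

```python
import socket

def parse_hosts(text):
    """Map each IP in hosts-file text to the list of names on its line."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        entries.setdefault(ip, []).extend(names)
    return entries

def missing_names(entries, expected):
    """Return the expected names that no hosts-file entry provides."""
    known = {name.lower() for names in entries.values() for name in names}
    return [name for name in expected if name.lower() not in known]

def resolves(name):
    """True if the local resolver (hosts file or DNS) can resolve the name."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False

def report(hosts_path, expected):
    """Print which expected names are absent from the file or unresolvable."""
    with open(hosts_path) as handle:
        entries = parse_hosts(handle.read())
    for name in missing_names(entries, expected):
        print("not in hosts file:", name)
    for name in expected:
        if not resolves(name):
            print("does not resolve:", name)
```

A call such as `report("/etc/hosts", ["esx1.example.local", "esx1"])` (placeholder names) would list every FQDN or short name that is missing. It does not replace the ping test, since a name can resolve while the host is unreachable.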

donaldmickey
Contributor

Check your network configuration (bridge, NetBIOS, DNS, IP address).

depping
Leadership

Also check out /etc/ft_hosts; it contains a copy of the /etc/hosts file which isn't always updated. If you can't find anything there, there are a couple of things you can try:

1 remove the host from the cluster and add it again

2 reboot the host

3 up the SC memory to around 800 MB (just to be sure, as jdvcp mentioned)
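
A rough way to spot a stale /etc/ft_hosts is to compare it with /etc/hosts line by line. The sketch below is an illustration only: the function names are made up, it assumes Python 3 is available wherever the two files can be read, and it simply returns `None` if the copy does not exist on the host.

```python
import os

def normalized_lines(text):
    """Return the set of non-comment lines, whitespace-collapsed and lowercased."""
    lines = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            lines.add(" ".join(fields).lower())
    return lines

def compare_hosts(primary_text, copy_text):
    """Return (entries only in the primary file, entries only in the copy)."""
    primary = normalized_lines(primary_text)
    copy = normalized_lines(copy_text)
    return sorted(primary - copy), sorted(copy - primary)

def compare_files(primary_path="/etc/hosts", copy_path="/etc/ft_hosts"):
    """Compare the two files on disk; None means the copy is not present."""
    if not os.path.exists(copy_path):
        return None
    with open(primary_path) as primary, open(copy_path) as copy:
        return compare_hosts(primary.read(), copy.read())
```

Two empty lists mean the files agree; anything listed in the second one is an entry that exists only in the copy and is probably outdated.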

Duncan


ckboon
Contributor

I can't find /etc/ft_hosts on any of the 3 servers.

I have rebooted my ESX host, and it's still not working. There are more problems, for which I will start a new thread.

ckboon
Contributor

Found the problem, and the reason the stupid hostd keeps dying.

I had made a backup copy of "/etc/vmware/firewall/services.xml" in the same directory, so hostd / the firewall kept trying to load both copies.

Removed the backup copy and restarted "S98mgmt-vmware".
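
For anyone hitting the same thing, a leftover copy like this can be spotted with a short script. This is a sketch, not a VMware tool: the function name is made up, it assumes "services.xml" is the only file that belongs in the directory (adjust `expected` if your host legitimately has others), and it needs Python 3.

```python
import os

def stray_files(directory, expected=("services.xml",)):
    """List regular files in the directory that are not in the expected set."""
    expected = set(expected)
    return sorted(
        name
        for name in os.listdir(directory)
        if name not in expected
        and os.path.isfile(os.path.join(directory, name))
    )
```

Running `stray_files("/etc/vmware/firewall")` would list backup copies such as a hypothetical "services.xml.bak". Keeping backups outside that directory avoids the problem altogether.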

HA is now working again, and so are VI Client control and access, etc.

Duh!

Thanks to everyone who gave suggestions.
