I came in this morning to a message on a cluster saying that an HA failover was in progress, even though all hosts and guests appeared to be up.
I followed this link to get rid of the message: VMware Knowledge Base.
Unfortunately, when HA is re-enabled, the HA configuration on the hosts times out.
I then found this article: Reconfiguring HA (FDM) on a cluster fails with the error: Operation timed out (2008609) | VMware KB
I started the HA agent on all the hosts and enabled HA on the cluster, but all of the hosts except one time out. One is elected master.
Any other ideas on how to get HA functional again?
On the host that is timing out, can you look in /var/log/fdm.log and /var/log/hostd.log?
You can also try right-clicking the host and selecting "Reconfigure for vSphere HA".
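If the logs are long, a quick filter can help you find the interesting part first. This is only a rough sketch, assuming Python 3 is available wherever you read the logs (on the host or on a copy of the files); the keyword list is my own guess at what is worth looking at, not an official list of FDM messages.

```python
import re

# Keywords are an assumption about what is worth reading first,
# not an official list of FDM or hostd messages.
PATTERN = re.compile(r"error|timed out|timeout|fail", re.IGNORECASE)


def suspicious_lines(text, limit=50):
    """Return up to `limit` of the most recent lines matching PATTERN."""
    hits = [line for line in text.splitlines() if PATTERN.search(line)]
    return hits[-limit:] if limit > 0 else []


def scan_file(path, limit=50):
    """Read a log file and return its most recent suspicious lines."""
    with open(path, errors="replace") as f:
        return suspicious_lines(f.read(), limit)


if __name__ == "__main__":
    # Log paths as mentioned above; adjust if you copied the files elsewhere.
    for path in ("/var/log/fdm.log", "/var/log/hostd.log"):
        print("==", path)
        try:
            for line in scan_file(path):
                print(line)
        except OSError as exc:
            print("could not read:", exc)
```

It only narrows things down; you still need to read the lines around each hit to see what actually timed out.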
I've tried the individual reconfigure and that fails too. Out of 12 hosts, one has been elected master, 2 are initializing, and 9 are reported as unreachable.
I'll take a look at the logs.
If nothing else helps, go to the cluster, disable HA entirely, and then enable it again. In every case where I have hit the timeout error, this has helped. Most importantly, all settings are restored afterwards, which is different from how it behaved in earlier years.
Do check with your network guy whether any changes have been made on the network switch(es) these ESXi hosts are connected to. It is possible that the hosts cannot communicate with each other, which would explain why HA seems to have gone crazy.
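While you wait for an answer from the network side, you can do a rough reachability check yourself. The sketch below assumes the HA agent listens on TCP port 8182 (please verify that for your version), and the host names are placeholders. It only tests from the machine you run it on, so it does not prove the hosts can reach each other, but a port that is closed everywhere is a useful hint.

```python
import socket

FDM_PORT = 8182  # assumed vSphere HA agent port; verify for your version


def can_connect(host, port=FDM_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, port=FDM_PORT, timeout=3.0):
    """Map each host name to whether the port accepted a connection."""
    return {h: can_connect(h, port, timeout) for h in hosts}


if __name__ == "__main__":
    # Hypothetical names; replace with your ESXi management addresses.
    hosts = ["esx01.example.invalid", "esx02.example.invalid"]
    for host, ok in check_hosts(hosts).items():
        print(host, "reachable" if ok else "NOT reachable")
```

If the master answers and the 9 unreachable hosts do not, that points towards the network or firewall rather than the HA agent itself.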