Hello,
For the past few days, one of the ESX nodes in my HA cluster has been logging:
"HA agent has an error"
I tried reconfiguring HA, but that only seems to "fix" the problem for a few hours.
Where should I start looking for the cause?
How can I debug HA?
DNS seems fine; all servers resolve correctly, both forward and reverse.
How many hosts are in the cluster? Are they all on the same version of VI3? Is there any info in the Tasks and Events of that host?
You can also try enabling "Allow Virtual Machines to be powered on even if they violate availability constraints" under the HA settings, just to see whether that gives you a different error or works.
There are four hosts in the cluster, "esx1" through "esx4".
All the same version.
No further info in the events (as seen in VirtualCenter).
"Allow Virtual Machines..." is already checked.
Do you see any errors on the hosts themselves? Does it actually let you configure HA and then give you the error afterwards, or does it not let you configure HA at all?
You can check in /opt/LGTOaam512/log/ and see if you can spot any errors in the logs.
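Since that directory can hold a lot of files, here is a rough sketch of one way to narrow things down: list the most recently modified logs that mention an error or failure. The function name `recent_error_logs` and the search pattern `error|fail` are my own assumptions, not anything HA defines, so adjust the pattern to whatever your logs actually contain.

```shell
#!/bin/sh
# Sketch: print the newest log files that contain "error" or "fail"
# (case-insensitive). The pattern is a guess, not an official HA keyword list.
# $1 = log directory (defaults to the path mentioned above)
# $2 = maximum number of file names to print (defaults to 10)
recent_error_logs() {
    log_dir="${1:-/opt/LGTOaam512/log}"
    max="${2:-10}"
    (
        cd "$log_dir" || exit 1
        # ls -t lists newest files first; keep only regular files that match
        ls -t | while read -r f; do
            [ -f "$f" ] && grep -qiE 'error|fail' "$f" && echo "$f"
        done | head -n "$max"
    )
}
```

Running `recent_error_logs` with no arguments on the affected host would look at /opt/LGTOaam512/log/ and print up to ten file names, newest first. Logs that merely record a command containing the word FAILED will also show up, so treat the list as a starting point only.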
It lets me reconfigure HA without any problem (the attention sign disappears).
This state lasts for a while before the error comes back again for no known reason.
Phew, there are about a zillion files in /opt/LGTOaam512/log/. Where do I start?
aam_config_util_listnodes.log:
KEY: -z VAL: 1
KEY: domain VAL: vmware
KEY: cmd VAL: listnodes
CMD: hostname -s
RESULT:
----
esx1
CMD: /opt/LGTOaam512/bin/ft_gethostbyname esx1 |grep FAILED
RESULT:
----
list_nodes
CMD: /opt/LGTOaam512/bin/ftcli -domain vmware -connect esx4 -port 8042 -timeout 60 -cmd listnodes
RESULT:
----
Node  Type     State
--
esx1  Primary  Agent Failed
esx2  Primary  Agent Running
esx3  Primary  Agent Running
esx4  Primary  Agent Running
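For anyone who wants to watch for this state over time, the listnodes output above can be filtered for nodes whose agent is not running. This is only a sketch: the function name `failed_nodes` is made up, and it assumes the output keeps the three-column layout shown in this log, with a node type of Primary or Secondary in the second column.

```shell
#!/bin/sh
# Sketch: read listnodes output on stdin and print the name of every node
# whose state is anything other than "Agent Running".
# Assumes the layout seen in the log above: <node> <type> <state words...>
failed_nodes() {
    awk '$2 == "Primary" || $2 == "Secondary" {
        state = $3 " " $4
        if (state != "Agent Running") print $1
    }'
}
```

It could be fed from the same command that appears in the log, for example `/opt/LGTOaam512/bin/ftcli -domain vmware -connect esx4 -port 8042 -timeout 60 -cmd listnodes | failed_nodes`.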
Do you have your hosts files set up with both the short name and the FQDN?
Even if your DNS works, HA for some odd reason needs correct hosts files on all ESX servers in order to work properly.
Also, Patch 2 of VC 2.0.1 seemed to clear up some other HA problems.
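To check the hosts files quickly, something along these lines might help. It is a minimal sketch that reports every cluster host lacking a hosts-file line with both its short name and its FQDN. The function name `check_hosts_file` is hypothetical, and `example.com` in the usage note is a placeholder because the real domain is not given in this thread.

```shell
#!/bin/sh
# Sketch: for each host name given, look for a non-comment line in the hosts
# file that carries BOTH the short name and the FQDN after the address.
# $1 = hosts file, $2 = DNS domain (placeholder, not from this thread),
# remaining arguments = short host names to check.
check_hosts_file() {
    hosts_file="$1"
    domain="$2"
    shift 2
    for h in "$@"; do
        awk -v short="$h" -v fqdn="$h.$domain" '
            $1 ~ /^#/ { next }              # skip comment lines
            {
                s = 0; f = 0
                for (i = 2; i <= NF; i++) { # field 1 is the IP address
                    if ($i == short) s = 1
                    if ($i == fqdn)  f = 1
                }
                if (s && f) found = 1
            }
            END { exit found ? 0 : 1 }
        ' "$hosts_file" || echo "missing: $h"
    done
}
```

On each ESX server this would be run as, say, `check_hosts_file /etc/hosts example.com esx1 esx2 esx3 esx4`, with your own domain substituted. No output means every host has a line with both names.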
I just had a similar problem. There were two things I had to do:
1. Reboot the offending server (this fixed my HA agent continually disabling itself).
2. Create a new HA cluster group and move all ESX servers into it.
Creating a new HA cluster group and moving all ESX servers into it seems to work. I have not had any HA-related errors for about 2 hours now.
Thank you very much!