I have a problem with the HA agent on one of our ESX hosts. The host reports an HA error as follows:
error 03/12/2007 10:39:52 HA agent on esx06.icec.local in cluster Production in York has an error
error 03/12/2007 10:38:52 HA agent on esx06.icec.local in cluster Production in York has an error
This continues throughout the day, but nothing else appears to be wrong apart from the errors above.
A little history about the environment:
We have 6 ESX hosts running ESX Server 3.0.2,61618 in a cluster. HA is enabled and configured to tolerate a maximum of 2 host failures. IP connectivity and DNS are both working.
I have checked the host in question for the HA agent log files, but the /opt/LGTOaam512/log directory is not present. Could this be an indication of the problem?
I have checked all the other hosts and the directory is present on each of them. Can anyone tell me how I should proceed, as Linux is not really an area I am comfortable with yet?
Does this error appear at regular intervals, e.g. every 3 minutes, then resolve itself and then recur?
If so, the fix I have found to work is this: while the ESX host is not showing the error (i.e. it does not have the red alert icon), right-click it and try "Reconfigure for HA".
Let me know.
HA is very sensitive to DNS issues. Make sure that everything in the cluster is properly registered in DNS.
Also, you may want to make sure your /etc/hosts file contains every ESX host with both its long and short name.
My hosts file is set up as follows:
184.108.40.206 host1.domain.tld host1
220.127.116.11 host2.domain.tld host2
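If you would rather not check each hosts file by eye, here is a rough Python sketch of the same check. It is only an illustration: `parse_hosts` and `missing_names` are names I made up, `host3.domain.tld` is a hypothetical extra host, and it assumes you have copied the contents of /etc/hosts off the ESX host and are running this on a workstation with Python 3.

```python
def parse_hosts(text):
    """Map each IP address to the set of names listed for it in hosts-file text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        entries.setdefault(ip, set()).update(n.lower() for n in names)
    return entries


def missing_names(text, fqdns):
    """Return the expected long and short names that the hosts text does not contain."""
    present = set()
    for names in parse_hosts(text).values():
        present |= names
    missing = []
    for fqdn in fqdns:
        short = fqdn.split(".", 1)[0]  # short name = first label of the long name
        for name in (fqdn.lower(), short.lower()):
            if name not in present:
                missing.append(name)
    return missing


# Sample hosts text: host2 lacks its short name, host3 (hypothetical) is absent.
sample = """\
127.0.0.1 localhost
184.108.40.206 host1.domain.tld host1
220.127.116.11 host2.domain.tld
"""
expected = ["host1.domain.tld", "host2.domain.tld", "host3.domain.tld"]
print(missing_names(sample, expected))
```

An empty list means every host you listed appears with both its long and short name. Repeat with the hosts file from each ESX host in the cluster.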
Sorry for the late reply.
I have tried that and the error still occurs every so often. With regard to DNS, each ESX host currently has only itself in its hosts file. Are you suggesting that I need to add all of the ESX hosts to every ESX host's hosts file?
Yes, you may want to try adding an entry for each host to each host's hosts file.
About 90% of the time this is related to a DNS or time issue.
Check the time on each host to make sure the clocks agree with one another.
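For the time check, one rough way to compare clocks is to note the epoch time (for example the output of `date +%s`) on each host at roughly the same moment and compare the readings. The Python sketch below does that comparison. Everything in it is an assumption for illustration: `skewed_hosts` is a made-up name, the readings and the esx01/esx02 host names are invented, and the 60 second threshold is arbitrary rather than a documented HA limit.

```python
def skewed_hosts(epoch_by_host, reference, max_skew_seconds=60):
    """Return {host: offset_in_seconds} for hosts whose clock differs from
    the reference host by more than max_skew_seconds."""
    ref = epoch_by_host[reference]
    return {
        host: epoch - ref
        for host, epoch in epoch_by_host.items()
        if abs(epoch - ref) > max_skew_seconds
    }


# Invented readings, in seconds since the epoch, taken at about the same moment.
readings = {"esx01": 1196678392, "esx02": 1196678395, "esx06": 1196678692}
print(skewed_hosts(readings, "esx01"))
```

Any host that shows up in the result is worth a closer look at its time configuration.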