Hello,
HA agent on host1 in cluster one has an error: HA agent on host failed
I receive the error in the subject line above. I checked the logs, and the agent starts running again after a random interval: sometimes within 3 seconds and
sometimes in just under 2 minutes.
What I have is 3 hosts, all using the same hardware:
host1 has (vm1 and a virtual vCenter running)
host2 has (vm2)
host3 has (vm3)
Randomly, on a daily basis, any or all of the hosts come up with that error, but they resolve after a while, as stated. Some days none of them
get this error, and other days 1, 2 or 3 get the error.
I don't see any performance issues in the logs. I have 2 NICs on vMotion/management in an active/active configuration on all hosts. I was told to
put the NICs in active/active, so I can't really change that. I don't think it is a network issue, although it could be. If this were a network issue,
would host1 ever lose heartbeat?
Can anyone tell me something about this error? Perhaps find a way to stop it from occurring? I reset my VM monitoring to a lower level so that the
VMs don't keep shutting down and restarting. That helps, but I would like to know why it occurs.
Which version of vCenter are you using? Which logs on the host did you look at?
vCenter 4 Standard, although the install package says 4.1.
I am looking under the vCenter management events:
5/7/2012 6:13:54 AM
us.dc-ups.com
HA agent on us.dc-ups.com in cluster Bur HA DRS cluster in Bur Data Center has an error: HA agent on the host failed
error
5/7/2012 6:14:25 AM
us.dc-ups.com
HA Agent on host us.dc-ups.com in cluster Bur HA DRS cluster in datacenter Bur Data Center is healthy
info
Typically in these cases a manual uninstall helps to solve the problem:
I think I got it to work. I edited the /etc/hosts file and made sure each host in the HA cluster had the proper IP mapped to its hostname. It was pretty messed up in there: some entries were missing and some were wrong. It looks like VMware doesn't edit that file properly when changes are made.
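For anyone who wants to check this across several hosts, here is a rough Python sketch of the comparison I did by hand: parse the contents of a hosts file and report entries that are missing or point to the wrong IP. The hostnames and addresses below are made-up placeholders, not my real ones, and the function names are just for illustration. You would paste in the file contents from each host and your own expected mapping.

```python
import ipaddress


def parse_hosts(text):
    """Parse hosts-file text into {lowercased name: ip}.

    Comments and blank lines are ignored; lines whose first field is not
    a valid IP address are skipped. The first entry for a name wins.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # malformed line, skip it
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def check_hosts(text, expected):
    """Return a list of (hostname, problem) for missing or wrong entries."""
    found = parse_hosts(text)
    problems = []
    for name, ip in expected.items():
        actual = found.get(name.lower())
        if actual is None:
            problems.append((name, "missing"))
        elif actual != ip:
            problems.append((name, f"wrong IP {actual}, expected {ip}"))
    return problems


# Hypothetical cluster members and addresses (placeholders only).
EXPECTED = {
    "host1.example.local": "192.0.2.11",
    "host2.example.local": "192.0.2.12",
    "host3.example.local": "192.0.2.13",
}

# Hypothetical hosts-file contents copied from one of the hosts.
SAMPLE = """\
127.0.0.1   localhost
192.0.2.11  host1.example.local host1
192.0.2.99  host2.example.local host2   # stale address
"""

for name, problem in check_hosts(SAMPLE, EXPECTED):
    print(f"{name}: {problem}")
```

With the sample above it flags host2 as having the wrong IP and host3 as missing, which is the kind of thing I found in my files.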
Have gone two days without any errors.
I had one question: should it be polling those hosts so much in that event log (check notification of datacenter)? It logs entries saying that the HA agent on a host in the HA cluster is running. It does this an awful lot, multiple times an hour it looks like. I guess it's normal.
Will resolve this post in a few days if all is good.
DNS is always critical indeed 🙂
This issue is resolved.
Yeah, it has a working DNS, but I guess at times it loses connection to it. After adjusting the hosts file, it no longer disconnects from HA.