VMware Cloud Community
gunga
Contributor

HA agent on host1 in cluster one has an error: HA agent on host failed

Hello,

I receive the error in the subject line above. I checked the logs, and the agent starts running again after a random interval: sometimes after 3 seconds, sometimes after just under 2 minutes.

I have 3 hosts, all using the same hardware:

host1 runs vm1 and a virtual vCenter

host2 runs vm2

host3 runs vm3

Randomly, on a daily basis, any or all of the hosts report that error, but it resolves after a while as stated. Some days none of them get this error, and on other days 1, 2 or all 3 do.

I don't see any performance issues in the logs. I have 2 NICs on vMotion/management in an active/active configuration on all hosts. I was told to put the NICs in active/active, so I can't really change that. I don't think it is a network issue, although it could be. If this were a network issue, would host1 ever lose heartbeat?

Can anyone tell me something about this error, or perhaps how to stop it from occurring? I reset VM Monitoring to a lower level so that the VMs don't keep shutting down and restarting. That helps, but I would like to know why the error occurs.

admin
Immortal

Which version of vCenter are you using? Which logs on the host did you look at?

gunga
Contributor

vCenter 4 Standard, although the install package says 4.1.

I am looking at the events under vCenter management:

5/7/2012 6:13:54 AM | us.dc-ups.com | error
HA agent on us.dc-ups.com in cluster Bur HA DRS cluster in Bur Data Center has an error: HA agent on the host failed

5/7/2012 6:14:25 AM | us.dc-ups.com | info
HA Agent on host us.dc-ups.com in cluster Bur HA DRS cluster in datacenter Bur Data Center is healthy
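The two events above bracket a short outage, which matches the "3 seconds to 2 minutes" pattern described in the question. As a rough sketch (not a VMware tool), assuming the events have been copied out as (timestamp, host, message) tuples in the format shown, the Python below pairs each "has an error" event with the next "is healthy" event for the same host and reports how long the agent was down. The function name `ha_outages` and the tuple layout are hypothetical.

```python
from datetime import datetime

# Timestamp format as shown in the vCenter event list above,
# e.g. "5/7/2012 6:13:54 AM" (assumes US month/day order).
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def ha_outages(events):
    """Pair each HA error event with the next healthy event for the same host.

    `events` is a hypothetical list of (timestamp, host, message) tuples.
    Returns (host, seconds_down) tuples in the order the hosts recovered.
    """
    parsed = sorted(
        (datetime.strptime(ts, TS_FORMAT), host, msg) for ts, host, msg in events
    )
    pending = {}  # host -> time of the first unmatched error event
    outages = []
    for when, host, msg in parsed:
        if "has an error" in msg:
            pending.setdefault(host, when)
        elif "is healthy" in msg and host in pending:
            started = pending.pop(host)
            outages.append((host, (when - started).total_seconds()))
    return outages


events = [
    ("5/7/2012 6:13:54 AM", "us.dc-ups.com",
     "HA agent on us.dc-ups.com in cluster Bur HA DRS cluster in Bur Data "
     "Center has an error: HA agent on the host failed"),
    ("5/7/2012 6:14:25 AM", "us.dc-ups.com",
     "HA Agent on host us.dc-ups.com in cluster Bur HA DRS cluster in "
     "datacenter Bur Data Center is healthy"),
]
print(ha_outages(events))  # [('us.dc-ups.com', 31.0)]
```

Collecting these durations over a few days would show whether the flapping clusters on particular hosts or times of day.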

depping
Leadership

Typically in these cases, a manual uninstall helps solve the problem:

  • Disable HA on the cluster
  • /opt/vmware/aam/VMware-aam-ha-uninstall.sh
  • /sbin/services.sh restart
  • Enable HA on the cluster
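The steps above can be sketched as a small Python helper run from an admin workstation. This is a hypothetical wrapper, not a VMware utility: it assumes SSH access as root is enabled on the ESX host, and it only runs the two host-side commands from the list. Disabling and re-enabling HA are cluster settings changed in vCenter, so they stay manual steps. By default it does a dry run and only prints the plan; `reinstall_ha_agent` and its parameters are illustrative names.

```python
import subprocess

# Host-side commands from the steps above, executed over SSH.
# Assumes root SSH access to the ESX host is enabled (hypothetical setup).
UNINSTALL = "/opt/vmware/aam/VMware-aam-ha-uninstall.sh"
RESTART = "/sbin/services.sh restart"


def reinstall_ha_agent(host, dry_run=True):
    """Walk through the manual HA agent reinstall for one host.

    Manual vCenter steps are printed; when not in dry-run mode the script
    waits for Enter before continuing. With dry_run=True nothing is executed.
    Returns the list of (kind, action) steps.
    """
    steps = [
        ("manual", "Disable HA on the cluster in vCenter"),
        ("ssh", UNINSTALL),
        ("ssh", RESTART),
        ("manual", "Enable HA on the cluster in vCenter"),
    ]
    for kind, action in steps:
        if kind == "manual":
            print("MANUAL:", action)
            if not dry_run:
                input("Press Enter when done... ")
        elif dry_run:
            print(f"WOULD RUN on {host}: {action}")
        else:
            # check=True aborts the sequence if the remote command fails.
            subprocess.run(["ssh", f"root@{host}", action], check=True)
    return steps


reinstall_ha_agent("host1")  # dry run: prints the four steps only
```

Keeping the dry run as the default makes it hard to uninstall the agent by accident; the commands can of course also be typed by hand on the host console.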
gunga
Contributor

I think I got it working. I edited the /etc/hosts file and made sure each host in the HA cluster had the proper IP mapped to its hostname. It was pretty messed up in there: some entries were missing and some were wrong. It looks like VMware doesn't update that file properly when changes are made.
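Checking a hosts file by eye across several hosts is error-prone. A minimal Python sketch, assuming you paste in the contents of /etc/hosts from one host and supply your own hostname-to-IP inventory (the `expected` dict, the example.local names and the 192.0.2.x addresses are all hypothetical), that reports the two problems described above: missing entries and wrong entries.

```python
def check_hosts_file(text, expected):
    """Compare /etc/hosts content against an expected hostname -> IP map.

    Returns (missing, wrong): expected hostnames with no entry, and
    hostnames whose entry points at a different IP than expected.
    """
    found = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Hostnames compare case-insensitively; keep the first mapping,
            # since resolvers typically return the first matching line.
            found.setdefault(name.lower(), ip)
    missing = sorted(h for h in expected if h.lower() not in found)
    wrong = sorted(
        h for h, ip in expected.items()
        if h.lower() in found and found[h.lower()] != ip
    )
    return missing, wrong


sample = """127.0.0.1   localhost
192.0.2.11  host1.example.local host1
192.0.2.99  host2.example.local host2   # stale entry
"""
expected = {
    "host1.example.local": "192.0.2.11",
    "host2.example.local": "192.0.2.12",
    "host3.example.local": "192.0.2.13",
}
print(check_hosts_file(sample, expected))
# (['host3.example.local'], ['host2.example.local'])
```

Running the same check against the file from every host in the cluster would show whether the entries agree on all of them.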

Have gone two days without any errors.

I have one question: should it be polling those hosts so often? The event log (check notification of datacenter) shows messages saying the HA agent in the cluster is running, and it looks like it does this multiple times an hour. I guess that's normal.

I will mark this post as resolved in a few days if all is good.

depping
Leadership

DNS is always critical indeed 🙂

gunga
Contributor

This issue is resolved.

Yeah, it has working DNS, but I guess at times it loses its connection to the DNS server. After I adjusted the hosts file, the hosts no longer disconnect from HA.
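Because the DNS failures here were intermittent, a one-off lookup can easily look fine. A minimal Python sketch, assuming a hypothetical hostname-to-IP inventory, that resolves each HA host name and reports lookups that fail or return an unexpected address. By default it uses the system resolver (`socket.gethostbyname`), which typically consults /etc/hosts and DNS in the order set by the machine's configuration; the `resolve` parameter lets you swap in a stub, as in the example.

```python
import socket


def check_resolution(expected, resolve=socket.gethostbyname):
    """Resolve each host name and compare it with the expected IP.

    `expected` is a hypothetical hostname -> IP dict. Returns a dict of
    hostname -> problem description for every name that fails.
    """
    problems = {}
    for name, ip in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"lookup failed: {exc}"
            continue
        if got != ip:
            problems[name] = f"resolves to {got}, expected {ip}"
    return problems


def flaky(name):
    """Stub resolver that only knows host1, simulating a lost DNS answer."""
    table = {"host1.example.local": "192.0.2.11"}
    if name not in table:
        raise socket.gaierror("Name or service not known")
    return table[name]


print(check_resolution(
    {"host1.example.local": "192.0.2.11", "host2.example.local": "192.0.2.12"},
    resolve=flaky,
))
# {'host2.example.local': 'lookup failed: Name or service not known'}
```

Run on a schedule against the real resolver, this would log the moments when name resolution for the HA hosts breaks, which is the kind of failure a correct hosts file papers over.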
