VMware Cloud Community
jonathanp
Expert

HA agent has an error

Hello,

I know there is another topic that talks about this, but I haven't found this exact error in any other topic.

On a 4-host HA/DRS cluster, we get random "HA agent has an error" messages.

Random in the sense that if I unconfigure HA and reconfigure it, the error appears on a different host.

So far I have verified DNS, and even added the other hosts' IPs to the /etc/hosts file.
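
For anyone who wants to script the DNS check instead of testing each name by hand, here is a minimal Python sketch. It assumes you run it from a machine that uses the same DNS servers as the hosts; the function name `check_dns` and the host names in the usage note are hypothetical. It checks that each name resolves, and that the resulting IP resolves back to the same short name.

```python
import socket


def check_dns(hostnames, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return {hostname: problem} for names whose forward or reverse lookup
    fails or disagrees. An empty dict means no problem was found."""
    problems = {}
    for name in hostnames:
        try:
            ip = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"forward lookup failed: {exc}"
            continue
        try:
            reverse_name = reverse(ip)[0]
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems[name] = f"reverse lookup of {ip} failed: {exc}"
            continue
        # Compare short names case-insensitively, so "esx01" matches
        # "ESX01.example.local".
        if reverse_name.split(".")[0].lower() != name.split(".")[0].lower():
            problems[name] = f"{ip} resolves back to {reverse_name}"
    return problems
```

Calling `check_dns(["esx01", "esx02", "esx03", "esx04"])` with your own host names returns only the names that look wrong. The `resolve` and `reverse` parameters exist so the lookups can be swapped out for testing.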

We will set up an NTP server, but for now the servers' clocks are set to pretty much the same time.

In one of the log files, /opt/LGTOamm512/log/agent/autoRecover.log, I have a line that states: "Backbone has failed. Retarting agent."

The agent then starts successfully, but this error occurs every 15 minutes.
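
To see how often the failure really recurs, a small Python sketch like the one below can pull the matching lines out of the log. It only assumes the message text quoted above; the layout of the rest of each log line is not assumed, and the function name `failure_lines` is hypothetical.

```python
def failure_lines(lines, marker="Backbone has failed"):
    """Return the log lines that contain the failure marker, with
    surrounding whitespace stripped."""
    return [line.strip() for line in lines if marker in line]


def load_failures(path):
    """Read a log file and return its failure lines."""
    with open(path, errors="replace") as handle:
        return failure_lines(handle)
```

Running `load_failures("/opt/LGTOamm512/log/agent/autoRecover.log")` on each host gives the list of restarts, and the number of entries per host shows whether one host fails much more often than the others.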

Has anyone seen this before?

Regards

Jon


Accepted Solutions
TCronin
Expert

Do all 4 hosts match exactly?

Same CPU

Same EXACT SAN volumes including names and no extras on any of the hosts

Same EXACT network/port group names and again no extras on any host
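
If you have the datastore or port group names of each host written down, the comparison can be sketched in a few lines of Python. This assumes you have already collected the names per host by whatever means you prefer; the function name `uneven_items` and the sample names in the usage note are hypothetical.

```python
def uneven_items(inventory):
    """inventory maps host name -> set of item names (port groups or
    datastores). Returns {item: hosts that have it} for every item that
    is not present on all hosts."""
    all_hosts = set(inventory)
    present_on = {}
    for host, items in inventory.items():
        for item in items:
            present_on.setdefault(item, set()).add(host)
    # Keep only items that are missing from at least one host.
    return {item: hosts for item, hosts in present_on.items() if hosts != all_hosts}
```

For example, if four hosts all have "VM Network" and only one also has "TestPG", the result is `{"TestPG": {"esx04"}}`, which is exactly the kind of extra to look for.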

Tom Cronin, VCP, VMware vExpert 2009 - 2021, Co-Leader Buffalo, NY VMUG

Replies
jonathanp
Expert

Yes, all hosts have the same CPU; in fact the hardware is identical across all of them.

They have exactly the same LUNs and names too.

They all have the same network config, except for one vSwitch that has one extra port group, used by a single test VM.

Could it be just this?

Regards

Jon

olegarr
Enthusiast

I had the same issue before. Actually I had it twice; the last time was last weekend, after I shut down all my ESX hosts for a SAN upgrade (the HA/DRS cluster was working just fine before the shutdown).

The only way to fix it was to create a new cluster and move all the hosts into it.

So, if you know your DNS settings are correct (or even better, if it was working before), just try creating a new cluster, moving the hosts there, and configuring it for HA/DRS.

Best regards,

olegarr

TCronin
Expert

Add that vSwitch to the other 3 hosts, or temporarily remove it from the one host. I've had small differences like that cause HA errors in my test lab. Usually a VM has to be using the odd resource to cause the error.

Tom Cronin, VCP, VMware vExpert 2009 - 2021, Co-Leader Buffalo, NY VMUG
jonathanp
Expert

Actually, the ESX hosts were not pointing to the correct DNS servers.
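
In case it helps someone else, this is roughly how the check could be scripted in Python. It is only a sketch: it assumes the service console keeps its resolver settings in /etc/resolv.conf as a standard Linux system does, and the function names and the expected server list are hypothetical.

```python
def parse_nameservers(resolv_conf_text):
    """Return the IPs listed on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped[0] in "#;":
            continue
        parts = stripped.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def unexpected_nameservers(resolv_conf_text, expected):
    """Return configured nameservers that are not in the expected list."""
    allowed = set(expected)
    return [ip for ip in parse_nameservers(resolv_conf_text) if ip not in allowed]
```

Reading /etc/resolv.conf on each host and passing its text to `unexpected_nameservers` together with the IPs of the DNS servers you intend to use returns an empty list when the host points at the right servers.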

Regards and thanks for your help
