VMware Cloud Community
edulin
Contributor

HA Agent has an error

Hello folks,

We've got a cluster with 4 hosts. One of them was rebooted for maintenance.

After it came back up, this host has been showing an HA agent error approximately every 15 minutes.

I've been reading a little about this issue.

I've tried "Reconfigure for HA". It resolves the problem, but the error comes back.

Servers are configured with NTPd - time is correctly set.

/etc/hosts has both the FQDN and the short name (I added the short names myself). I can ping the failing server even while the HA Agent error is showing.

What's the next step?

8 Replies
jf2000
Contributor

How are your hosts named? All lowercase? Are the names the same in DNS? If they differ, make them all lowercase for both the hosts and the DNS entries.
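A quick way to spot names that are not all lowercase is sketched below in Python. This is only an illustration: the function name and the example host names are made up, and you would paste in your own list of host and DNS names.

```python
def find_case_mismatches(names):
    """Return the names that contain at least one uppercase character."""
    return [name for name in names if name != name.lower()]


# Hypothetical host names, for illustration only.
example_hosts = ["esx01.example.com", "ESX02.example.com", "esx03.Example.com"]

# Lists every name that would need to be changed to lowercase.
print(find_case_mismatches(example_hosts))
```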

admin
Immortal

Try disabling HA on the cluster, wait for the HA unconfig tasks to complete, and then re-enable HA on the cluster.

edulin
Contributor

Issue has been resolved.

I'm not sure how, but after another Reconfigure for HA the issue showed up one more time, and at the same moment a DRS VM migration happened.

After that, the issue was resolved.

Please keep in mind that I had already run Reconfigure for HA at least 2 times before this one.

rossb2b
Hot Shot

Do you have portfast enabled on your physical switch? Or, if you are running trunked ports, do you have portfast trunk enabled? I was seeing intermittent HA Agent errors in my environment. I'm running trunked ports, so I needed to enable portfast trunk. Once I did that, my HA Agent issues went away. It may be worth a look.

edulin
Contributor

Thanks, I will take a look into that.

After maintenance (enter maintenance mode), the error appeared again.

I've already tried disabling and re-enabling HA on the cluster, but after that I still see the error every 15 minutes.

Stewart_Hyde
Contributor

Hi

I had a similar error. The issue was sorted after checking and rechecking the DNS servers and the "A" and "PTR" records.

All working fine now...
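For anyone who wants to script the A/PTR check, here is a minimal Python sketch. It uses the standard `socket` module; the function name and the `forward`/`reverse` parameters are my own invention so the lookups can be swapped out. Note that `socket.gethostbyname` goes through the system resolver, so it may also read /etc/hosts and hide a missing DNS record. Run it from a machine that has no hosts-file entries for the ESX servers, or confirm with nslookup against the DNS server directly.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Check that hostname resolves to an IP and that the IP maps back to it.

    Returns (ok, detail). The forward/reverse arguments are hypothetical
    hooks that default to the system resolver.
    """
    try:
        ip = forward(hostname)  # forward ("A") lookup
    except OSError as exc:
        return False, f"forward lookup failed for {hostname}: {exc}"
    try:
        ptr_name = reverse(ip)[0]  # reverse ("PTR") lookup, primary name
    except OSError as exc:
        return False, f"reverse (PTR) lookup failed for {ip}: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    if ptr_name.lower().rstrip(".") != hostname.lower().rstrip("."):
        return False, f"PTR for {ip} is {ptr_name}, expected {hostname}"
    return True, f"{hostname} <-> {ip}"
```

Call `check_forward_reverse("your-host-fqdn")` once per ESX host and look at any result where the first value is False.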

edulin
Contributor

The PTR record for one of the hosts was missing.

I've added it to DNS, and I hope not to see that error again.

Fingers crossed.

wgonzvega
Contributor

Just update or create your PTR records for all ESX servers.

Good Luck

Walter
