Hello folks,
We've got a cluster with 4 hosts. One of them was rebooted for maintenance.
After it came back up, this host has been showing an HA agent error approximately every 15 minutes.
I've been reading a little about this issue.
I've tried "Reconfigure for HA"; it resolves the problem for a while, but the error comes back.
Servers are configured with NTPd - time is correctly set.
/etc/hosts has both the FQDN and the short name for each host (I added the short names myself). I can ping the failing server even while the HA Agent error is showing.
What's the next step?
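In case it helps anyone double-checking the same thing, here is a rough Python sketch for spotting /etc/hosts lines that list an FQDN without its short name. The host names and addresses in the sample are made up for illustration, and the function name is just something I picked; it only parses text and does not touch the host's configuration.

```python
def hosts_missing_short_name(hosts_text):
    """Return FQDNs in hosts-file text that have no short-name alias on the same line."""
    missing = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the rest are names for that address.
        names = [name.lower() for name in fields[1:]]
        for name in names:
            if "." in name:
                short = name.split(".", 1)[0]
                if short not in names:
                    missing.append(name)
    return missing


# Hypothetical sample content, not taken from a real host.
sample = """
127.0.0.1   localhost
10.0.0.11   esx01.example.local esx01
10.0.0.12   esx02.example.local
"""
print(hosts_missing_short_name(sample))
```

To run it against a real file, read the file's contents into a string and pass that in instead of `sample`.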
How are your hosts named? All lowercase? Are they the same in DNS? If they differ, make them all lowercase in both the host configuration and the DNS entries.
Try disabling HA on the cluster, wait for the HA unconfig tasks to complete, and then re-enable HA on the cluster.
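For the hostname case check, something like the following Python sketch could compare the names configured on the hosts with the names in DNS. It assumes you have already collected both lists by hand; the function name and the example names are hypothetical.

```python
def find_case_problems(host_names, dns_names):
    """Compare names configured on the hosts with the names held in DNS.

    Returns a list of (host_name, dns_name, reason) tuples.
    """
    problems = []
    # Index DNS names by their lowercase form so case differences still match up.
    dns_by_lower = {name.lower(): name for name in dns_names}
    for host_name in host_names:
        dns_name = dns_by_lower.get(host_name.lower())
        if dns_name is None:
            problems.append((host_name, None, "no DNS entry"))
        elif host_name != dns_name:
            problems.append((host_name, dns_name, "case differs"))
        elif host_name != host_name.lower():
            problems.append((host_name, dns_name, "not lowercase"))
    return problems


# Hypothetical names for illustration only.
configured = ["esx01.example.local", "ESX02.example.local"]
in_dns = ["esx01.example.local", "esx02.example.local"]
for problem in find_case_problems(configured, in_dns):
    print(problem)
```

Anything it reports is a candidate for renaming to lowercase on both sides.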
Issue has been resolved.
I'm not sure how, but after a Reconfiguration for HA, the issue showed up one more time, but at the same time a DRS VM Migration happened.
After that the issue has been resolved.
Please keep in mind that I had already run the Reconfiguration for HA at least twice before this one.
Do you have portfast enabled on your physical switch? Or, if you are running trunked ports, do you have portfast trunk enabled? I was seeing intermittent HA Agent errors in my environment. I'm running trunked ports, so I needed to enable portfast trunk. Once I did that, my HA Agent issues went away. It may be worth a look.
Thanks, I will take a look into that.
After another maintenance (entering maintenance mode), the error appeared again.
I've already tried disabling and re-enabling HA on the cluster, but since then I still see the error every 15 minutes.
Hi
I had a similar error. Issue was sorted after checking and rechecking the DNS servers and the "A" and "PTR" records.
All working fine now...
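If you want to script the A/PTR check, here is a minimal Python sketch using the standard `socket` module. Note the assumptions: it uses the system resolver, so an /etc/hosts entry on the machine running it can mask a missing DNS record, and the `forward`/`reverse` parameters are only there so the lookups can be swapped out for testing. The host names in the usage comment are hypothetical.

```python
import socket


def check_a_and_ptr(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found with one host's forward and reverse lookups."""
    try:
        ip = forward(fqdn)  # forward lookup: name -> IPv4 address
    except OSError:
        return [f"{fqdn}: no A record (forward lookup failed)"]
    try:
        ptr_name = reverse(ip)[0]  # reverse lookup: address -> primary name
    except OSError:
        return [f"{fqdn}: no PTR record for {ip}"]
    # Compare case-insensitively and ignore a trailing dot.
    if ptr_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        return [f"{fqdn}: PTR for {ip} points to {ptr_name}"]
    return []


# Example usage (performs real lookups, so replace the names with your own hosts):
# for name in ["esx01.example.local", "esx02.example.local"]:
#     for problem in check_a_and_ptr(name):
#         print(problem)
```

An empty result for every host means the forward and reverse lookups agree from the machine where you ran it; it is still worth confirming on the DNS server itself.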
The PTR record for one of the hosts had never been added.
I've added it to DNS, I hope not to see that error again.
Fingers crossed.
Just update or create your PTR records for all ESX servers.
Good Luck
Walter