VMware Cloud Community
anttijf
Contributor

Upgrade VC 2.5 Update 1 to 2.5 Update 2 - HA agent not working on cluster

Hello,

I have upgraded my VC from 2.5U1 to 2.5U2, and it works fine except for HA on the cluster.

All of the servers in the VI3 cluster have an HA agent error: two of them report an error in the HA agent and one reports that the agent is disabled. I chose the "Reconfigure HA" option and that didn't help. I have tried all the tricks I can find. Everything seems to be right (DNS, network, etc.). I have even tried creating a new cluster with HA and reinstalling the agent. No luck...

At the cluster level there is an error: "Unable to contact a primary HA Agent in cluster XXX in XXX"

Any ideas?

(ESX version on all hosts is ESX 3.5.0 (103908))

17 Replies
depping
Leadership

Compare the DNS name in the VIC with the name and IP in /etc/hosts; they need to be exactly the same, including capitalization. Also check the contents of /etc/FT_HOSTS; if that's not correct, just delete the file and enable HA again.
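The /etc/hosts comparison above can be scripted. This is a minimal Python 3 sketch, not a VMware tool: it parses /etc/hosts-style text and reports whether the name shown in the VIC is listed for a given IP exactly, only with different capitalization, or not at all. The function names and the sample host esx1.eu.domain.net are hypothetical; on a real host you would pass it the contents of /etc/hosts together with the IP and the name from the VIC.

```python
def hosts_entries(text):
    """Parse /etc/hosts-style text into a list of (ip, [names]) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def check_name(text, ip, expected_name):
    """Return "exact", "case-mismatch" or "missing" for expected_name on ip."""
    found_case_only = False
    for addr, names in hosts_entries(text):
        if addr != ip:
            continue
        if expected_name in names:  # exact, case-sensitive match
            return "exact"
        if any(n.lower() == expected_name.lower() for n in names):
            found_case_only = True  # same letters, different capitals
    return "case-mismatch" if found_case_only else "missing"


if __name__ == "__main__":
    # Hypothetical sample: uppercase entry vs. lowercase name in the VIC.
    sample = "127.0.0.1 localhost.localdomain localhost\n192.168.0.1 ESX1.eu.domain.net ESX1\n"
    print(check_name(sample, "192.168.0.1", "esx1.eu.domain.net"))  # case-mismatch
```

Anything other than "exact" is the kind of difference this reply warns about.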

Duncan

anttijf
Contributor

Thanks for the quick reply!

DNS (name resolution) is working fine from all hosts and from the VC machine. There haven't been any changes on the network side, and HA was working like a charm with older versions of VC and ESX.

I am quite sure that this all has to do with the update...

You mentioned this /etc/FT_HOSTS file; I can't find it anywhere...

kjb007
Immortal
(Accepted solution)

That file is under /etc/opt/vmware/aam/FT_HOSTS.

Run hostname on the service console, and see if the hostname matches what DNS is pointing to.
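The hostname check described here can be sketched in Python 3 (an assumption: the service console's own Python may be older, so adapt or simply run the commands by hand). socket.gethostname() returns the same value the hostname command prints; the sketch forward-resolves it, reverse-resolves the address, and classifies the result. The function names are hypothetical. Note that this goes through the system resolver, so an /etc/hosts entry can answer before DNS does.

```python
import socket


def compare_with_dns(local_name, dns_name):
    """Classify how the console hostname relates to the name DNS returns.

    Returns "match" (identical), "case-only" (same letters, different
    capitalization) or "different".
    """
    if local_name == dns_name:
        return "match"
    if local_name.lower() == dns_name.lower():
        return "case-only"
    return "different"


def resolve_round_trip(name):
    """Forward-resolve name, then reverse-resolve the resulting address."""
    ip = socket.gethostbyname(name)
    reverse_name, _aliases, _addrs = socket.gethostbyaddr(ip)
    return ip, reverse_name


if __name__ == "__main__":
    local = socket.gethostname()  # same value the hostname command prints
    try:
        ip, reverse_name = resolve_round_trip(local)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        print("lookup failed for %s: %s" % (local, exc))
    else:
        print(local, ip, reverse_name, compare_with_dns(local, reverse_name))
```

A "case-only" result is exactly the uppercase/lowercase situation reported later in this thread.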

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
BigHug
Enthusiast

I had the same problem when upgrading from 2.5 to 2.5U1. The HA configuration seemed to get messed up. Have you tried renaming the cluster, disabling HA, renaming the cluster back, and enabling HA again? It did the trick for me.

Reply
0 Kudos
anttijf
Contributor

I have checked the hostname and it's correct, except that the host part of the FQDN is in capital letters while in DNS it is all lowercase.

The FT_HOSTS file shows only host names without the domain, and all server names are lowercase.

Could this be the reason? It sounds odd, because HA was working fine with Update 1.

I have tried renaming the cluster and even creating a new one, but that didn't help...
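The comparison described in this post (uppercase host part on the console, lowercase short names in FT_HOSTS) can be sketched as follows. This is a hedged Python 3 illustration: the FT_HOSTS layout is an ASSUMPTION taken only from the description above (one short host name per line, no domain), since the real file format is not documented in this thread, and the function names and sample hosts are hypothetical.

```python
def short_name(fqdn):
    """Host part of an FQDN: 'HOSTNAME.domain.com' -> 'HOSTNAME'."""
    return fqdn.split(".", 1)[0]


def ft_hosts_names(text):
    """ASSUMED format: one host name per non-empty line (first field used)."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def check_ft_hosts(console_fqdn, ft_hosts_text):
    """Return "exact", "case-only" or "missing" for this host in FT_HOSTS."""
    host = short_name(console_fqdn)
    names = ft_hosts_names(ft_hosts_text)
    if host in names:  # case-sensitive match
        return "exact"
    if any(n.lower() == host.lower() for n in names):
        return "case-only"
    return "missing"


if __name__ == "__main__":
    # Hypothetical data shaped like the situation described in this post.
    print(check_ft_hosts("ESX1.domain.com", "esx1\nesx2\nesx3\n"))  # case-only
```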

kjb007
Immortal

There have been other reports where this has caused an issue. I would delete the FT_HOSTS file and reconfigure for HA from VC. As Duncan stated above, make sure the names match in letter case.
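A cautious variant of the FT_HOSTS step above: instead of deleting the file outright, this Python 3 sketch renames it to a timestamped backup so it can be restored if needed. Moving rather than deleting is a swapped-in choice, not part of the advice in this reply; the path is the one given earlier in the thread, and move_aside is a hypothetical helper. The script does not reconfigure HA; that is still done from VC afterwards.

```python
import os
import time


def move_aside(path):
    """Rename path to path.bak-<timestamp> instead of deleting it.

    Returns the new path, or None if the file does not exist.
    """
    if not os.path.exists(path):
        return None
    backup = "%s.bak-%s" % (path, time.strftime("%Y%m%d%H%M%S"))
    os.rename(path, backup)
    return backup


if __name__ == "__main__":
    moved = move_aside("/etc/opt/vmware/aam/FT_HOSTS")
    print(moved or "FT_HOSTS not found")
```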

-KjB

anttijf
Contributor

Finally I got this solved! Now HA is working again!!!

HA was working on the old version, but it stopped working after the upgrade.

The solution was so simple that it didn't come to mind at first. In the default installation, the host part of the ESX hostname is in capital letters and the rest of the FQDN is lowercase (HOSTNAME.domain.com). I changed the hostname to lowercase from the VI console, and after that HA was working again...

Thanks to everyone for the help!

vmware4u
Contributor

Disabling VMware HA and then re-enabling it resolved our problem.

Regards,

Marshawn

michaelb23
Contributor

I also had the same error after upgrading to 2.5 Update 2. HA was showing red on the entire cluster. I corrected the issue by turning off HA and re-enabling it.

Mike

joergriether
Hot Shot

Don't halloo till you're out of the wood. I also re-enabled HA yesterday, renamed the cluster, moved them out and back in again, and it worked. Today I opened VC and it again showed me the cute red sign next to two of my ESX hosts saying the HA agent has an error. I can't believe these kinds of problems after a regular update to such a mission-critical component! VMware has to fix this issue ASAP!

My two cents...

Joerg

joergriether
Hot Shot

I finally seem to have solved it. The cluster name seemed to be the source of the problem; I wrote up everything I did on my blog:

Best regards

Joerg

KyawH
Enthusiast

I had similar problems. HA was disabled on two ESX servers (no red or yellow icons; the errors only showed in the Summary tab) and one had an error (red icon). I tried to fix it according to the posts here and elsewhere, then called VMware support and let them try whatever they thought would work. I spent two days on my own and many hours with VMware support. Nothing fixed it. Finally I decided to reinstall the hosts with the latest U2 image, reconfigure everything and reinstall all the agents and software. After spending two hard hours on each host, everything's fine. It is definitely a bug in U2, I guess.

Traincow
Contributor

After two days of searching and testing I finally fixed this issue. Recreating the cluster didn't help, DNS looked perfect, and after reading the following release note I was sure /etc/hosts didn't need to be edited (everything was lowercase on all hosts and in VI anyway).

"DNS Resolution Is No Longer a Requirement to Enable VMware High Availability on ESX Server Hosts

Previously, enabling VMware High Availability required DNS resolution of all ESX Server hosts in a High Availability cluster. This was done using configuring DNS records or by adding all of the host names and IP addresses to the /etc/hosts file on each server."

All my hosts resolved reliably in DNS by both FQDN and host name, but when I came to enable HA I was getting the "Unable to contact a primary HA Agent in cluster XXX in XXX" error. I took a chance and edited /etc/hosts, adding the following.

Before:

127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1.eu.domain.net

After:

127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1.eu.domain.net esx1

After rebooting and enabling HA it came up.
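The Before/After edit in this post (appending the short host name as an alias to each FQDN entry) can be sketched in Python 3. This is an illustration under assumptions, not a VMware-supplied procedure: add_short_aliases is a hypothetical helper that returns the new text instead of writing the file, so you can review it and keep a backup before replacing /etc/hosts. Lines that already list the short name, and lines without a dotted first name, are left untouched.

```python
def add_short_aliases(hosts_text):
    """Append the short host name to each /etc/hosts line whose first name
    is an FQDN, unless that short name is already listed on the line."""
    out = []
    for line in hosts_text.splitlines():
        body, sep, comment = line.partition("#")  # keep trailing comments
        fields = body.split()
        if len(fields) >= 2 and "." in fields[1]:
            short = fields[1].split(".", 1)[0]
            if short not in fields[2:]:
                fields.append(short)
                line = " ".join(fields) + ((" " + sep + comment) if sep else "")
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "127.0.0.1 localhost.localdomain localhost\n192.168.0.1 esx1.eu.domain.net\n"
    print(add_short_aliases(before), end="")
```

Run on the Before lines, it produces the After lines shown above; the reboot and HA re-enable are still manual steps.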

sheetsb
Enthusiast

I was very interested to see this post. I've tried everything here as well and also have a case open with support. I too found the only way to fix some of my problematic ESX hosts was a clean install of Update 2. I have three systems I've left untouched to see if support can find the problem. I've had the case open for four days without any success. Next week I'll probably just rebuild the remaining hosts since it seems to be a clean solution.

Bill S.

DGWS
Contributor

Hi

I'm also having the same problem. I've checked and updated the /etc/hosts, /etc/sysconfig/network and /etc/vmware/esx.conf files and rebooted all the hosts, but still no go.

This all started after I upgraded VC to 2.5U2. The ESX hosts have not been updated yet.

I've tried disabling and re-enabling HA, still no go. Any help would be appreciated.
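One way to confirm that the three files mentioned in this post agree with each other is a small consistency check. This Python 3 sketch takes the file contents as strings and assumes two formats: a HOSTNAME= line in /etc/sysconfig/network (the usual Red Hat style) and a /adv/Misc/HostName = "..." entry in the ESX config file, taken here to be /etc/vmware/esx.conf. Treat that key as an assumption to verify on your own host. All function names and sample values are hypothetical.

```python
import re


def hostname_from_network(text):
    """Value of the HOSTNAME= line in /etc/sysconfig/network, if any."""
    m = re.search(r"^HOSTNAME=(\S+)", text, re.M)
    return m.group(1).strip("\"'") if m else None


def hostname_from_esx_conf(text):
    """ASSUMED key in esx.conf: /adv/Misc/HostName = "name"."""
    m = re.search(r'^/adv/Misc/HostName\s*=\s*"([^"]*)"', text, re.M)
    return m.group(1) if m else None


def hosts_names_for(text, ip):
    """All names listed for ip in /etc/hosts-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            names.extend(fields[1:])
    return names


def check_consistency(network_text, esx_conf_text, hosts_text, ip):
    """Return a list of problems found (empty if names agree exactly)."""
    problems = []
    net = hostname_from_network(network_text)
    esx = hostname_from_esx_conf(esx_conf_text)
    if net is None:
        problems.append("no HOSTNAME in /etc/sysconfig/network")
    if esx is None:
        problems.append("no /adv/Misc/HostName in esx.conf")
    if net and esx and net != esx:  # case-sensitive comparison
        problems.append("network says %r, esx.conf says %r" % (net, esx))
    if net and net not in hosts_names_for(hosts_text, ip):
        problems.append("%r not listed exactly for %s in /etc/hosts" % (net, ip))
    return problems
```

Comparisons are deliberately case-sensitive, since capitalization differences are what several replies in this thread turned out to be.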

Thanks

Markus_Jaekel
Contributor

Hi all,

I had the same problem after the upgrade. The issue was that on one node the hostname was in uppercase while in /etc/hosts it was written in lowercase. After changing the hostname to lowercase, everything works fine.

Many thanks to all of you who posted your comments here.

Bye

Markus

dipcas
Contributor

I have the same problem with VC 2.5 Update 3 (I updated from VC 2.0 and everything seemed to work fine for a month, except that the HA agents became very sensitive). Since last week, two nodes of my three-node cluster always have an HA agent error. Only after rebooting these nodes and disabling and re-enabling HA on the cluster does the cluster come back OK... but after less than a day I'm back in the same situation, with HA agent errors on nodes 1 and 2. I've opened a Service Request twice, but no luck so far.
