VMware Cloud Community
kreischl
Enthusiast

HA failure - HA thinks an old server is master agent (vmware-sites file)

I recently moved from one cluster to two, and the original cluster has been having problems with HA. After searching the discussion forums, I think it has to do with the vmware-sites file under the /opt/LGTO../config folder. It lists a server that is no longer part of the cluster.

I also think it explains why this appears on the main cluster page:

Unable to contact a primary HA agent in cluster

I've tried

1) Disable/Re-enable HA for the entire cluster

2) Put the server it shows in the vmware-sites file into maintenance mode, then disable/re-enable HA on the cluster and on one of the servers in the cluster

If you disable HA for the cluster then the vmware-sites file disappears. As soon as I enable it the file reappears and shows that one server that is no longer in that cluster.

Where is it pulling the entry from when the file is re-created?

Any suggestions?

The official error on each host is:

internalerror /opt/LGTOaam512/bin/ft_startup failed

This is the vmware-sites in the working/new cluster:

FULLTIME_SITES_TID 00000007

+ 1:8042,8042,8043 server01 vmware #FT_Agent_Port=8045

+ 2:8042,8042,8043 server03 vmware

+ 3:8042,8042,8043 server04 vmware

- 4:8042,8042,8043 server05 vmware

+ 5:8042,8042,8043 server11 vmware

+ 6:8042,8042,8043 server12 vmware

This is the vmware-sites file in the original/non-working cluster after I enable HA at the cluster or host level:

FULLTIME_SITES_TID 00000007

+ 1:8042,8042,8043 server05 vmware #FT_Agent_Port=8045
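
For anyone comparing these files by hand, here is a rough Python sketch that parses the entries shown above and flags any host that is not in the list of hosts you expect in the cluster. It only assumes the line layout visible in the two listings (a leading `+` or `-`, an index, a port list, a host name, and a domain). It makes no claim about what the `+`/`-` marker or the ports mean. The expected host names passed in at the end (`server02`, `server06`) are hypothetical placeholders, not taken from the real OLDCLUSTER.

```python
import re
from typing import NamedTuple


class SiteEntry(NamedTuple):
    marker: str   # leading "+" or "-" exactly as it appears in the file
    index: int    # number before the colon
    ports: tuple  # comma-separated numbers after the colon
    host: str
    domain: str


# Matches lines such as "+ 1:8042,8042,8043 server01 vmware #FT_Agent_Port=8045"
ENTRY_RE = re.compile(r"^([+-])\s+(\d+):([\d,]+)\s+(\S+)\s+(\S+)")


def parse_sites(text):
    """Return one SiteEntry per host line; header and blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue  # e.g. the FULLTIME_SITES_TID header
        marker, index, ports, host, domain = match.groups()
        port_numbers = tuple(int(p) for p in ports.split(","))
        entries.append(SiteEntry(marker, int(index), port_numbers, host, domain))
    return entries


def stale_hosts(text, expected_hosts):
    """List hosts named in the file that are not in the expected cluster membership."""
    expected = set(expected_hosts)
    return [entry.host for entry in parse_sites(text) if entry.host not in expected]


# File contents copied from the non-working cluster above.
OLD_CLUSTER_FILE = """FULLTIME_SITES_TID 00000007
+ 1:8042,8042,8043 server05 vmware #FT_Agent_Port=8045
"""

# Hypothetical membership list; replace with the hosts really in the cluster.
print(stale_hosts(OLD_CLUSTER_FILE, ["server02", "server06"]))
```

With the placeholder membership list, server05 is reported as the stale entry, which matches what I see in the file.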

5 Replies
kreischl
Enthusiast

I fixed the problem.

Maybe I can explain it better.

The vmware-sites file in OLDCLUSTER was showing server05 as the master agent even though I had moved server05 to NEWCLUSTER.

Even after disabling HA for OLDCLUSTER, HA on OLDCLUSTER couldn't figure that out.

I would call this a bug? :)

The fix:

1) disable HA in OLDCLUSTER

2) put server05 in NEWCLUSTER into maintenance mode.

3) move server05 to OLDCLUSTER

4) enable HA in OLDCLUSTER (server05 is still in maint mode)

5) Yell "hooray" because it works

6) move server05 back to NEWCLUSTER

7) take server05 out of maintenance mode

ITQPG
Contributor

What if the server that is in the vmware-sites file does not exist anymore? :(

vmwaredimetroni
Contributor

I ran into the same problem after updating our VirtualCenter 2.5 to 2.5 Update 1.

I fixed the issue with these steps:

1. Rename the cluster.

2. Disable HA on the renamed cluster.

3. Re-enable HA on the renamed cluster.

We had good luck with this. After these simple steps the error message disappeared.

Duca
Contributor

Hi!

We had the exact same issue at one of our customers' sites, as vmwaredimetronic describes (after upgrading VC from 2.5 to 2.5 Update 1), and the trick of renaming the cluster, then disabling and re-enabling HA on the cluster, solved the issue.

So thanks for the magic words, vmwaredimetronic 🐵

jtubman
Contributor

Thanks VMwaredimetro...

I just received the same error, "Unable to contact a primary HA agent", after installing Update 1 on vSphere 4.0. I renamed my cluster, disabled HA, waited for that to complete, and then re-enabled it.
