VMware Cloud Community
cayates
Contributor

HA Error on one out of three hosts

Hello,

I set up a cluster with three servers and enabled HA. Two servers work fine; however, one of the servers (esx1) shows the error "HA agent on esx1 in cluster CMC in CMC has an error". I checked the logs, and it looks like esx1 determines that esx3 is the primary agent but is unable to connect to it, so it concludes that no agent is running on esx3.

I confirmed that I can ping esx3 from esx1, and that esx3 resolves by both its short name and its FQDN. I'm not sure why it can't connect. Any ideas? The log is attached.


Accepted Solutions
athlon_crazy
Virtuoso

When the FQDN is configured correctly, I sometimes just uninstall vpxa from the problematic ESX host, disconnect and remove the host, then add it back to the cluster.

VMware newbie..

Zen Systems Sdn Bhd

www.no-x.org

7 Replies
Troy_Clavell
Immortal

Are you using resource pools other than your DRS pool? If not, try removing the host from vCenter and then adding it back in.

Also check /etc/opt/vmware/aam/FT_HOSTS to see whether all nodes are listed correctly.
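A rough way to run that FT_HOSTS check from the service console is sketched below. It assumes the file is plain text that lists the cluster nodes by name; if it lists IP addresses instead, pass the addresses. The helper name missing_nodes is made up for this example and is not part of any VMware tooling.

```shell
#!/bin/sh
# Sketch: print every expected node that does not appear in a host list file.
# Assumption: the file is plain text containing node names (or addresses).
# missing_nodes is a hypothetical helper, not a VMware command.
missing_nodes() {
    file="$1"
    shift
    for node in "$@"; do
        # -F: fixed string, -w: whole-word match, -q: exit status only
        grep -Fwq -- "$node" "$file" || echo "$node"
    done
}
```

For example, missing_nodes /etc/opt/vmware/aam/FT_HOSTS esx1 esx2 esx3 would print the name of any of the three hosts that is absent from the file, and print nothing if all are present.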

NWhiley
Enthusiast

It's worth double-checking /etc/hosts and /etc/resolv.conf to make sure all nodes have the same settings.
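One possible way to do that comparison is sketched below. It assumes you have already copied each host's file to one machine (for example with scp) under names of your choosing; the function name differing_copies is hypothetical. A reported difference is only a place to look, since a host's own entry in /etc/hosts may legitimately differ.

```shell
#!/bin/sh
# Sketch: report which collected copies of a file differ from the first one.
# Assumption: each host's file was copied locally beforehand.
# differing_copies is a hypothetical helper name.
differing_copies() {
    ref="$1"
    shift
    for f in "$@"; do
        # cmp -s prints nothing and returns non-zero when the files differ
        cmp -s "$ref" "$f" || echo "$f differs from $ref"
    done
}
```

Calling it with the esx1 copy first and the esx2 and esx3 copies after it lists only the copies that do not match the esx1 copy.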

Also, it looks like you are having trouble with port 8042, so it might be worth a quick squint over the firewall settings on each host.

Neil VCP
weinstein5
Immortal

To add: a common error I have seen is the host name being mistyped when configuring the DNS information for your ESX hosts.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

cayates
Contributor

@Troy: Yes, I am using two resource pools, one for production and one for development VMs. FT_HOSTS does not list esx3, but it does list esx2 correctly.

@NWhiley and @weinstein5: I checked the name server setting and the DNS entries on the domain servers; all are correct. I also checked that the servers can resolve and ping each other.

There should not be any firewall blocking communication between the two servers; however, I will double-check this. I will also try athlon_crazy's suggestion.

Troy_Clavell
Immortal

Have you tried right-clicking the host in question and choosing Reconfigure for HA?

cayates
Contributor

@Troy: Yes, I have tried that as well and it does not fix the error.

@athlon_crazy: Your suggestion fixed the problem, thank you.

For future reference, here is what I did to fix the error.

1. Disconnect and remove the host from the cluster and vCenter.

2. Run the following commands:

/etc/init.d/mgmt-vmware stop
/etc/init.d/vmware-vpxa stop
# list the installed vpxa package to get its exact name
rpm -qa | grep vpxa
# remove the package reported by the previous command
rpm -e VMware-vpxa-2.5.0-147633
/etc/init.d/mgmt-vmware start

3. Reconnect the host and add it to the cluster. No more errors!

Strange, but all I care about is it worked. Thanks.
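For anyone repeating this on several hosts, the commands from step 2 could be wrapped in a small dry-run helper along these lines. This is only a sketch: DRY_RUN, run and remove_vpxa are names invented for the example, and the package name must be whatever rpm -qa | grep vpxa reports on your host. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: wraps the commands from step 2 in a dry-run helper.
# DRY_RUN, run and remove_vpxa are hypothetical names, not VMware tooling.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1: exact package name as reported by `rpm -qa | grep vpxa`
remove_vpxa() {
    pkg="${1:-}"
    if [ -z "$pkg" ]; then
        echo "usage: remove_vpxa PACKAGE" >&2
        return 1
    fi
    run /etc/init.d/mgmt-vmware stop
    run /etc/init.d/vmware-vpxa stop
    run rpm -e "$pkg"
    run /etc/init.d/mgmt-vmware start
}
```

Running remove_vpxa VMware-vpxa-2.5.0-147633 prints the four commands in order; setting DRY_RUN=0 first would execute them for real on the service console, after the host has been disconnected and removed as in step 1.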
