VMware Cloud Community
gwatson
Contributor

An error occurred during configuration of the HA Agent on the host

Hi guys,

Having a problem with one of our 3 hosts. The host says it's "disconnected". The virtual servers which are running on it say they are disconnected, however they are still physically working as servers.

I have now put the server into maintenance mode. After rebooting it and bringing it back into the cluster, I get the following message when trying to enable HA: "An error occurred during configuration of the HA Agent on the host"

The cluster is set up for HA. However, if I select the properties of the other two hosts, it does give me the option to "configure for HA", which is strange. Any ideas?

After reading some other posts I checked DNS which seems fine. I can ping the host server from the others, not using the FQDN. I can ping the gateway no problem.

I should have said the server has worked in HA before, but not now :(

G

0 Kudos
23 Replies
sbeaver
Leadership

First question: are there any VMs currently running on the host that you rebooted?

Steve Beaver
VMware Communities User Moderator
VMware vExpert 2009 - 2020
VMware NSX vExpert - 2019 - 2020
====
Co-Author of "VMware ESX Essentials in the Virtual Data Center"
(ISBN:1420070274) from Auerbach
Come check out my blog: http://www.virtualizationpractice.com/blog/
Come follow me on twitter http://www.twitter.com/sbeaver

**The Cloud is a journey, not a project.**
0 Kudos
gwatson
Contributor

No, it was in maintenance mode with all servers moved off it.

0 Kudos
sbeaver
Leadership

From the shell on that server, run this command:

service mgmt-vmware restart

See if this will let you connect again

0 Kudos
gwatson
Contributor

Thanks for the replies. Problem seems to have been down to an old WINS entry hanging around for the Virtual Centre.

G

0 Kudos
johu
Contributor

My problem was "an error occurred during configuration of the HA Agent on the host" with additional error "Vmap_foobarvm02 process failed to stop".

This started occurring after I added a third ESX server (foobarvm03), moved the VMs to the new server and uninstalled the two old servers. Next I replaced the old servers with new hardware and installed ESX on them using the same names the old servers had. Everything went fine until I tried to add the two new servers to the HA setup, when I got those errors. Switch configs, DNS, etc. were all fine, so the solutions that helped others didn't help me. After some poking around on the SC I figured out what Legato AAM is trying to do. Some tips below.

Check /opt/LGTOaam512/log for errors, especially aam_config_util_addnode.log. If it's filled with "Error [10022]: Process Not Found" and the last error before those is "Error [10001]: Instance Already Exists", the problem is likely the same one I had.
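
A quick way to see whether a log shows that pattern is to count the two error strings. This is only a minimal sketch: the function name aam_error_summary is hypothetical, the default path is the one quoted above, and it assumes the messages appear in the log exactly as written in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESX or Legato AAM): count the two error
# strings quoted in the post. The exact log layout is an assumption.
aam_error_summary() {
    log="${1:-/opt/LGTOaam512/log/aam_config_util_addnode.log}"
    # grep -F matches fixed strings, so the square brackets are taken literally.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    exists=$(grep -cF 'Error [10001]: Instance Already Exists' "$log" || true)
    missing=$(grep -cF 'Error [10022]: Process Not Found' "$log" || true)
    echo "instance_already_exists=$exists process_not_found=$missing"
}
```

Calling aam_error_summary with no argument reads the default path; pass another file name to check a copy of the log. One or more "Instance Already Exists" lines together with many "Process Not Found" lines would be consistent with the situation described here.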

It seems that when you install a new ESX server using the same name as an old one, HA configuration is going to fail. It could be that I removed the old servers incorrectly too. VMware's tools did remove the accounts for the old nodes, but some other junk was left on the Legato AAM side. Here's how you can re-add those logins; only then does VMware's HA configuration know what to do. The initial HA configuration still takes longer than usual to go through, but it does work after this.

Enter these on working node (foobarvm03):

perl /opt/LGTOaam512/vmware/aam_config_util.pl -cmd=listnodes

- should say problematic nodes are down

FT_DIR=/opt/LGTOaam512 /opt/LGTOaam512/bin/ftcli -d vmware

- You get the 'AAM>' prompt. Type:

deleteUser root foobarvm01

deleteUser root foobarvm02

createUser root foobarvm01 Node PERM_ALL

createUser root foobarvm02 Node PERM_ALL

Check userlist. Should have admin account for all nodes now.

listUsers

Attempt logon from broken node (foobarvm02):

FT_DIR=/opt/LGTOaam512 /opt/LGTOaam512/bin/ftcli -d vmware

No errors? Does it work?

listUsers

Good. Now just retry HA setup using VirtualCenter.

0 Kudos
davidbarclay
Virtuoso

WINS? You think so? HA problems are usually down to DNS resolution on the ESX nodes... so definitely not WINS.

Anyway, glad it's working.

Dave

0 Kudos
DeeJay
Enthusiast

They may well be. However, what if the ESX hosts are pointing at Windows DNS, which is using WINS as a last resort for resolving names?

http://www.microsoft.com/technet/archive/winntas/deploy/integrat.mspx?mfr=true

0 Kudos
Gazza1
Contributor

Make sure that your hosts file has the correct hostname and IP address for the server on which you are getting the error.

We had a server that had been built with the wrong IP. This was changed, but the local hosts file was not updated.

We changed this and took the host out of maintenance mode, and the HA Agent started with no issues.

0 Kudos
Oli_L
Enthusiast

I had this same problem

Make sure that you configure the hosts file on every host and include all your host names in the format IP address, FQDN, SHORTNAME. Do not rely on DNS to resolve your short names; this was recommended to me by VMware support. So... edit the following file:

/etc/hosts

Make sure you enter the ipaddress (space) FQDN (space) shortname

ie

192.168.10.1 esx1.suffix.com esx1

192.168.10.2 esx2.suffix.com esx2

and so on for every host in every host file

The shortname was the key for me...
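
If you want to check a hosts file against that format, something like the following might help. It is a rough sketch only: check_hosts_format is a made-up name, and the rule it applies (at least three fields, with a dot in the second one) is my assumption of what "IP FQDN SHORTNAME" means in practice.

```shell
#!/bin/sh
# Hypothetical helper: print every entry in a hosts file that does not look
# like "IP FQDN SHORTNAME". Comments, blank lines and loopback (127.x) entries
# are skipped. The validation rule itself is an assumption, not a VMware one.
check_hosts_format() {
    file="${1:-/etc/hosts}"
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # blank line or comment
        $1 ~ /^127\./        { next }   # loopback entry
        NF < 3 || $2 !~ /\./ { print FILENAME ":" NR ": " $0 }
    ' "$file"
}
```

No output means every entry has an IP address, a dotted name and at least one further name. Any line printed is one to look at by hand.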

0 Kudos
drbbton
Contributor

I had the same problem; it ended up being a case-sensitivity problem with VMware. In the VMware DNS configuration in VC, the server name had caps in it, while our DNS records obviously did not. Changing it to all lower case solved the problem. That is another thing to check for those of us who despise hosts files.
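
To spot names with capitals in them before typing them into VC, a small check like this could be used. It is a sketch under assumptions: find_mixed_case_names is a hypothetical function, and the names you feed it are whatever you have configured.

```shell
#!/bin/sh
# Hypothetical helper: for each name given, print it with its lower-case form
# if it contains any upper-case letters. Names already in lower case print nothing.
find_mixed_case_names() {
    for name in "$@"; do
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        if [ "$name" != "$lower" ]; then
            echo "$name -> $lower"
        fi
    done
}
```

For example, find_mixed_case_names esx1.suffix.com ESX2.Suffix.com reports only the second name.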

0 Kudos
abraju
Contributor

Hi Johu,

Thanks for your troubleshooting procedure. It helped me solve my HA error problem.

abraju

0 Kudos
korman
Contributor

I too just had this problem..

I changed the IP address from the console using esxcfg-vswif, and /etc/hosts still had the old IP.
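
A stale entry like that can be caught by comparing what the hosts file records for a name with the address you expect. This is a minimal sketch; hosts_ip_for and hosts_entry_matches are hypothetical names, and the expected IP is something you supply yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the IP address that a hosts file records for a
# given name (FQDN or short name). Prints nothing if the name is not listed.
hosts_ip_for() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        $1 ~ /^#/ { next }                       # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # stop at a trailing comment
                if ($i == n) { print $1; exit }  # first match wins
            }
        }
    ' "$file"
}

# Hypothetical helper: succeed only if the recorded IP equals the expected one.
# Arguments: name, expected IP, optional hosts file.
hosts_entry_matches() {
    [ "$(hosts_ip_for "$1" "${3:-/etc/hosts}")" = "$2" ]
}
```

Usage would be along the lines of: hosts_entry_matches esx1 192.168.10.1 || echo "hosts entry for esx1 is stale or missing".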

0 Kudos
Fdubo
Contributor

Hi,

I've had the same error for a few days and I have tried all of these things, but nothing seems to work.

Must I restart the VMware daemons each time I make a DNS change? Or must I reboot the machine?

Can anyone tell me how to restart the VMware daemons each time I make a DNS modification?

thanks

0 Kudos
bulldude
Contributor

I just got bitten by this, and I have set up many HA/DRS clusters.

DNS was OK, but HA wouldn't stay configured. Pinging the IP, short name and FQDN from each ESX host's console gave all the expected results.

It was one /etc/hosts file. An ESX host was renamed but the /etc/hosts file had the old entry.

Made the change and all was OK. If DNS (even Microsoft) has the correct A record and you have a PTR record, I have never had any issues.

0 Kudos
Rajesh_interout
Contributor

Error: An error occurred during configuration of the HA Agent on the host

Solution: Try these 4 steps one by one

1) Check that DNS is working correctly and that, using your DNS server, you can resolve each ESX host by name.

2) If there is no DNS server in place, edit the /etc/hosts file and add an entry for each ESX host. NOTE: Make sure you add the FQDN and also the shortname. You can obtain the shortname by typing hostname -s.

3) Disconnect the ESX hosts from VirtualCenter and reconnect them.

4) Disable the VMware HA feature on the cluster and enable it again.

You will find that one of these steps will solve your problem.
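
For step 2, the entry can be built from the IP address and FQDN so that the shortname is never forgotten. This is just a sketch: hosts_line is a made-up helper, and it assumes the shortname is simply the part of the FQDN before the first dot.

```shell
#!/bin/sh
# Hypothetical helper: print an "IP FQDN SHORTNAME" line for a hosts file.
# Assumption: the shortname is everything before the first dot of the FQDN.
hosts_line() {
    ip="$1"
    fqdn="$2"
    short="${fqdn%%.*}"    # strip the longest suffix starting at the first dot
    printf '%s %s %s\n' "$ip" "$fqdn" "$short"
}
```

The output can be reviewed and then appended to /etc/hosts by hand on each host; the function itself does not modify any file.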

Raj

0 Kudos
R0v3r
Contributor

I have a question about the DNS portion. Does this still apply to 3i 3.5? Is there still a hosts file on this version that we can look at?

0 Kudos
Rajesh_interout
Contributor

Hosts file:

/etc/hosts

Also look at /etc/resolv.conf for DNS.
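
To see at a glance which DNS servers a host is pointed at, the nameserver lines can be pulled out of resolv.conf. A minimal sketch, with list_nameservers being a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the address on each "nameserver" line of a
# resolv.conf-style file, one per line, in file order.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers on every host and comparing the results is a cheap way to find a host that is asking a different DNS server from the others.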

0 Kudos
R0v3r
Contributor

Thanks for the reply. I thought Linux was not on the new version, which is why we use the Remote CLI. How does one get to the ESX host's local hosts file then?

0 Kudos
R0v3r
Contributor

I think I found a way to see the hosts file and the rest of the files as well on a 3i 3.5 host. In the VI client, open the Administration menu and choose Export Diagnostic Data. Select the hosts whose logs you want and the destination directory. Then use something like 7Zip to extract the tgz file that is created. The folder structure of the host is then accessible, along with all of the files like the hosts file, resolv.conf etc.
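
On a machine with tar available, the same thing can be done from a shell instead of 7Zip. This is a sketch only: find_name_config is a hypothetical helper, it uses tar rather than the 7Zip method described above, and it assumes the exported tgz unpacks directly into the host's folder structure (a real bundle may contain further nested archives).

```shell
#!/bin/sh
# Hypothetical helper: unpack a .tgz bundle into a directory and list any
# files named "hosts" or "resolv.conf" found inside it.
# Arguments: path to the bundle, destination directory.
find_name_config() {
    bundle="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    tar -xzf "$bundle" -C "$dest" || return 1
    find "$dest" -type f \( -name hosts -o -name resolv.conf \) | sort
}
```

The paths it prints can then be opened in any text editor to check the entries.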

0 Kudos