VMware Cloud Community
caddo
Enthusiast

cmd addnode failed for secondary node: Internal AAM Error - agent could not start.: Unknown HA error

I'm testing upgrade paths to vSphere from ESX 3.5 Update 4 on an IBM BladeCenter with two HS21 XM blade servers; I'm running into several problems, and the latest is the one mentioned in the subject of this thread.

In this scenario I upgraded vCenter successfully, then moved all VMs onto a single ESX 3.5 host, removed the other host from the cluster and then from vCenter, did a fresh install of vSphere on it, and reconnected it to the cluster; I then repeated the whole procedure with the second node. At the end I have two hosts with vSphere installed, but I had to disable HA in my cluster since I always get this error when I try to configure the HA agents on the hosts. I have to say, though, that DRS works fine.

In the vSphere release notes, in the Known Issues section, I can read:

"Upgrading from an ESX/ESXi 3.x host to an ESX/ESXi 4.0 host results in a successful upgrade, but VMware HA reconfiguration might fail

When you use vCenter Update Manager 4.0 to upgrade an ESX/ESXi 3.x host to ESX/ESXi 4.0, if the host is part of an HA or DRS cluster, the upgrade succeeds and the host is reconnected to vCenter Server, but HA reconfiguration might fail. The following error message displays on the host Summary tab: HA agent has an error : cmd addnode failed for primary node: Internal AAM Error - agent could not start. : Unknown HA error .

Workaround: Manually reconfigure HA by right-clicking the host and selecting Reconfigure for VMware HA."

The problem is that this workaround doesn't work for me, so I was wondering if someone, once again, is able to help me with this issue.

Thanks in advance for your support.

1 Solution

Accepted Solutions
Mirko_Huth
Enthusiast

Hi Caddo,

I had the same issue with one of my hosts. It was resolved after disabling HA on the cluster and enabling it again.

Mirko


29 Replies
Remnarc
Contributor

I am having this exact problem with three R900s running ESX 4 with an FC SAN and vCenter. I have everything upgraded to 4.0, including the tools on the VMs themselves.

After a collective eight hours on the phone with Dell VMware support, they managed to come up with the following (it did not help for us):

A. From within VC

1. Remove the servers

2. Remove the cluster

3. Create a new cluster

4. Add the hosts

If that does not work, try:

B. Run the following command as root in the console of each of the hosts.

/opt/vmware/aam/bin/VMware-aam-ha-uninstall.sh

Then try to remove the host and add it back into the cluster.
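
Just to sketch that sequence on each host's service console (an illustration only; it assumes the uninstall script clears out everything under /opt/vmware/aam):

/opt/vmware/aam/bin/VMware-aam-ha-uninstall.sh   # the uninstall script from step B above
ls /opt/vmware/aam 2>/dev/null                   # should return little or nothing once the agent files are gone

Once that comes back clean, remove the host from the cluster in vCenter, add it back, and reconfigure HA so a fresh agent gets installed.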

caddo
Enthusiast

I have already tried everything you suggested with no luck; eventually I reinstalled both ESX hosts, and HA is now working just fine.

Thanks anyway; I hope this will help someone else.


Remnarc
Contributor

This is what my /etc/hosts file looked like after a fresh installation of ESX 4 from disc:

127.0.0.1 localhost

::1 localhost

172.114.20.133 HOST3.ORG.DS.COM

Notice that the short hostname is not appended to the end of the line. It should look like this:

172.114.20.133 HOST3.ORG.DS.COM HOST3

172.114.20.35 HOSTVMVIC1.ORG.DS.COM HOSTVMVIC1

172.114.20.131 ESXHOST1.ORG.DS.COM ESXHOST1

172.114.20.132 ESXHOST2.ORG.DS.COM ESXHOST2

This is the hosts file from a machine that I upgraded to ESX 4 using vCenter Update Manager:

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost.localdomain localhost

172.114.20.131 ESXHOST1.ORG.DS.COM

172.114.20.132 ESXHOST2.ORG.DS.COM ESXHOST2

172.114.20.133 ESXHOST3.ORG.DS.COM ESXHOST3

172.114.20.35 HOSTVMVIC1.ORG.DS.COM HOSTVMVIC1

Again, you will notice that the short hostname is not appended to the first host entry.

It should look like this.

172.114.20.131 ESXHOST1.ORG.DS.COM ESXHOST1

Once I fixed those files all the systems came up in HA.
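
A quick way to sanity-check this on each host, using standard service console commands (the names below are only the examples from this post):

hostname -f                           # should print the FQDN, e.g. ESXHOST1.ORG.DS.COM
hostname -s                           # should print the short name, e.g. ESXHOST1
grep -i "$(hostname -s)" /etc/hosts   # the matching line should show the IP, the FQDN and the short name

If either lookup fails or the grep comes back empty, the hosts file (or DNS) still needs attention before HA will configure cleanly.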

mvarre
Contributor

Disable HA and DRS, then enable ONLY HA by itself; it should go through without a problem. Then enable DRS on its own after HA is on.

Remnarc
Contributor

Neither of those options worked.

It all came back to the hosts files on the servers.

Mirko_Huth
Enthusiast

I would try to disable / enable HA on the cluster first. That worked for me, too.

The disadvantage of disabling DRS is that the cluster loses its resource pool configuration. Therefore I would only do that if disabling / enabling HA on the cluster is not enough.

adeelleo
Contributor

Way to go, mvarre!

You made my day.

I had totally run out of options, but your solution of enabling only HA and then DRS solved the problem. :)

Best regards,

Adeel Akram

elgordojimenez
Contributor

Hello,

The hosts entries worked for us; we added the short name at the end, like this:

esx.domain.com esx

We added it on both servers and it worked.

adeelleo
Contributor

Well, in my case this initially solved my problem. But this time I had all the entries in my hosts file correct and still could not resolve the issue, so enabling HA and DRS one by one solved it for me.

Weird things happen with HA in ESX. :)

mcscotty
Enthusiast

Oddly, I encountered this problem not after upgrading from ESX 3.5 to 4.0, but after applying the first round of patches that took me up to build 175625. The first host I patched refused to enable HA... after fixing the hosts file all was good. I proactively fixed the hosts file on the upgraded but not patched host, and after patching everything was still good there.

My cluster is running a mix of HS20 and HS21 XM blades -- can't VMotion between them, sadly... but when the next HS21 comes in, I'll be able to decommission the HS20s entirely.

Tom_Daytona
Contributor

Had this very same issue. After moving the hosts to a new vCenter, HA couldn't be started. Thanks to Remnarc for pointing me in the right direction.

Originally I had our hosts on a 192.168.1.x IP subnet and later moved them to a 10.10.0.x subnet. The /etc/hosts files ended up still listing the old IPs.

After changing them and reconfiguring for HA, it worked.
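
If anyone hits the same thing, a minimal sketch of the cleanup on each host (the addresses below are made up; substitute your own old and new IPs):

cp /etc/hosts /etc/hosts.bak                          # keep a backup first
sed -i 's/192\.168\.1\.21/10.10.0.21/g' /etc/hosts    # swap the stale IP for the current one
grep -i "$(hostname -s)" /etc/hosts                   # confirm the entry now shows the new address

Then right-click the host in vCenter and choose Reconfigure for VMware HA.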

mudha
Hot Shot

Check this KB, where it says to reinstall the AAM agent; this is a known issue with U4.


Please mark it as the right answer if you feel it is. :)

surje
Contributor

Modifying the HOSTS file worked for me too! Thanks.

ivm17
Contributor

Hi,

I had exactly the same problem in a lab environment. After reading caddo's post I checked the hosts files, and sure enough they had only the FQDN but not the NetBIOS (short) name. Instead of modifying the hosts file I decided to experiment a little. Because this was a lab environment I didn't have a running DNS server on the subnet, so I decided to set one up on the VC. I created A records and corresponding PTR records for both hosts. Then I changed the DNS and routing settings on both hosts, removed them from the cluster, and re-added them using the FQDNs. After that I was able to set up HA and DRS with no problem. Hope this helps.
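
For example, to double-check both directions of resolution from the service console (nslookup should be available there; the name and address are just the examples used earlier in this thread):

nslookup ESXHOST1.ORG.DS.COM      # forward lookup: the A record should return the host's IP
nslookup 172.114.20.131           # reverse lookup: the PTR record should return the FQDN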

I. Mitkov

SergeUA
Contributor

Had the same problem with ESXi hosts.

Resolved it by creating A records on the DNS server and static records in WINS for both hosts.

Sergiy

alvsti
Contributor

I also had this problem; I hadn't applied all patches to all my servers either. Here is my scenario:

3 ESX Servers

The cluster consists of server1 with 4.0.0 Build 244038 and server2 with 4.0.0 Build 208167.

The server I tried to add (server3) had 4.0.0 Build 244038, the same as the highest build in the cluster (this is the one that failed with the Internal AAM Error).

This is what I did to resolve the issue:

1. Disconnected server2 (the one with the lowest build)

2. Ran Reconfigure for VMware HA on server3 (the server was now added successfully).

3. Reconnected server2 (it also joined the HA cluster successfully; I of course upgraded this one to the latest build afterwards).

I also checked my hosts files; they are not updated with the FQDN entries, but I have added my servers to the Windows DNS servers, so the ESX servers resolve names successfully anyway.

itsyouth
Contributor

Hi All,

In my case, I did the following steps and it worked out:

1) Enter the host under “Maintenance Mode”

2) Uncheck “Turn On VMware HA” box at the cluster level

3) Take the host out of “Maintenance Mode”, and

4) Finally enable HA at cluster level

The host was part of HA without any issues.

bozitsu
Contributor

Hello,

I had a situation with 6 hosts in one cluster. I created another cluster and moved a host to that HA cluster. Before taking it back out of maintenance mode I used Remediate and applied 30 patches to it. After I removed it from maintenance mode inside the second HA cluster, I received the same error message, even after all the steps found in community forums and technical documents.

Then I created another HA cluster, moved each host out of the second cluster, removed them from maintenance mode, and dragged them into the third cluster. Only then did I get no errors, and I completed the move of the other hosts to this new cluster.

B.
