VMware Cloud Community
caddo
Enthusiast

cmd addnode failed for secondary node: Internal AAM Error - agent could not start.: Unknown HA error

I'm testing upgrade paths from ESX 3.5 Update 4 to vSphere on an IBM BladeCenter with two HS21 XM blade servers. I'm running into several problems, and the latest is the one mentioned in the subject of this thread.

In this scenario I upgraded vCenter successfully, then moved all VMs onto a single ESX 3.5 host, removed the other host from the cluster and then from vCenter, did a fresh install of vSphere on it, and reconnected the host to the cluster. I then repeated the whole procedure with the second node. At the end I have two hosts with vSphere installed, but I had to disable HA in my cluster, since I always get this error when I try to configure the HA agents on the hosts. DRS, I have to say, works fine.

In the vSphere release notes, in the Known Issues section, I can read:

"Upgrading from an ESX/ESXi 3.x host to an ESX/ESXi 4.0 host results in a successful upgrade, but VMware HA reconfiguration might fail

When you use vCenter Update Manager 4.0 to upgrade an ESX/ESXi 3.x host to ESX/ESXi 4.0, if the host is part of an HA or DRS cluster, the upgrade succeeds and the host is reconnected to vCenter Server, but HA reconfiguration might fail. The following error message displays on the host Summary tab: HA agent has an error : cmd addnode failed for primary node: Internal AAM Error - agent could not start. : Unknown HA error .

Workaround: Manually reconfigure HA by right-clicking the host and selecting Reconfigure for VMware HA."

The problem is that this workaround doesn't work for me, so I was wondering if someone is, once again, able to help me with this issue.
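
In case anyone wants to retry the workaround in bulk: the manual "Reconfigure for VMware HA" action corresponds to the ReconfigureHostForDAS_Task call in the vSphere Web Services API. Below is a minimal pyVmomi sketch that invokes it on every host; the vCenter address and credentials are placeholders, and task/error handling is left out.

# Hedged sketch: scripted "Reconfigure for VMware HA" for every host.
# vcenter.example.com, user, and password are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    # ReconfigureHostForDAS_Task reinstalls/restarts the HA agent on the host
    print("Reconfiguring HA on %s" % host.name)
    host.ReconfigureHostForDAS_Task()
view.Destroy()
Disconnect(si)

The later sketches in this thread reuse the si/content session set up here.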

Thanks in advance for your support.

29 Replies
bundalov
Contributor

Removing hosts from clusters didn't solve my problem.

But deleting the cluster, creating a new one, and adding the hosts to the new cluster did the trick :)

It is now working like a charm :)

bozitsu
Contributor

NOTICE

Thank you for your message. I will be out of the office until July 19 and will have limited access to e-mail during that period (there is no need to resend your message). If you have any urgent requests, please contact the director, Sasa Antic, at 3108 575 (sasa.antic@saga.rs).

Thank you.

This is an automatic reply - there is no need to resend your message.

cgccvmware
Contributor

We ran into these same HA problems upgrading from 4.0 to 4.1.

DRS needs to complete a full cycle and balance out the hosts before HA will properly load.

Dunno why, but HA needed DRS to build its tables first. You could try the various methods mentioned in here and they may work, but in our case we needed DRS to complete first.

After the hosts were more balanced, HA loaded fine.
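
If you want to verify that DRS has settled before retrying HA, here is a hedged pyVmomi sketch that polls the cluster's pending recommendations. It reuses the content session from the sketch in the first post; "Cluster01" is a placeholder name, and note that in fully automated DRS mode recommendations are applied on their own.

# Hedged sketch: wait until DRS has no pending recommendations.
import time
from pyVmomi import vim

def find_cluster(content, name):
    # Walk the inventory for a cluster with the given name
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        return next(c for c in view.view if c.name == name)
    finally:
        view.Destroy()

cluster = find_cluster(content, "Cluster01")
cluster.RefreshRecommendation()                 # ask DRS to recompute
while cluster.recommendation:                   # pending DRS recommendations remain
    print("Waiting for DRS to settle (%d pending)..." % len(cluster.recommendation))
    time.sleep(30)
    cluster.RefreshRecommendation()
print("No pending DRS recommendations; retry HA now.")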

jwozvmguy
Contributor

Had this same error on ESXi 4.1. What worked for me: I noticed that the domain name was not filled in under the DNS and Routing settings. Once I updated that, I was able to reconfigure for HA. I suppose this is the same issue as the hosts files on older ESX hosts.
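
For anyone who wants to sanity-check name resolution before reconfiguring HA, here is a small stdlib-only Python sketch; the host list is illustrative.

# Hedged sketch: forward and reverse DNS check for each ESX host name.
import socket

hosts = ["esx01.mynet.com", "esx02.mynet.com"]   # replace with your FQDNs
for name in hosts:
    try:
        addr = socket.gethostbyname(name)         # forward lookup
        rname, _, _ = socket.gethostbyaddr(addr)  # reverse lookup
        short = name.split(".")[0].lower()
        status = "OK" if rname.lower().startswith(short) else "MISMATCH"
        print("%s -> %s -> %s [%s]" % (name, addr, rname, status))
    except socket.error as exc:
        print("%s: lookup failed (%s)" % (name, exc))

The HA agent is known to be picky about name resolution, so both lookups should succeed and agree.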

Gazrighian
Contributor

Thanks. Disabling and re-enabling HA on the cluster resolved the problem for me.

SuperP99
Contributor

I can agree with the last entry. Attempting to re-enable HA on the sole host repeatedly failed; disable HA on the cluster, then re-enable it, and you're back to full HA again.

:)
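
The cluster-level toggle can be scripted too. A hedged pyVmomi sketch, reusing the content session and find_cluster helper from the earlier sketches; the cluster name is a placeholder, and real code should wait on each returned task.

# Hedged sketch: disable and re-enable HA on a cluster.
from pyVmomi import vim

def set_ha(cluster, enabled):
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=enabled))
    # modify=True merges this change into the existing cluster config
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

cluster = find_cluster(content, "Cluster01")
set_ha(cluster, False)    # disable HA (wait for the task in real use)
set_ha(cluster, True)     # then turn it back on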

rickyj001
Contributor

We had the same error in a cluster with six blade servers running ESXi 4.1 on a C7000 blade chassis. This only occurred on two of the six servers after I turned on HA (with DRS). After reading all these posts, I looked at the blades in the HP Onboard Administrator (like hypervisor).

It showed that three of the servers were configured as "hostname.localdomain" instead of "hostname.domainname".

For example, if I had esxlab1.localdomain, I had to change it to esxlab1.mynet.com.

After I changed all three servers to the correct domain, I was able to get HA to run by right-clicking the hosts and selecting "Reconfigure for VMware HA."

Note: I have no idea why one of the three servers (that had the wrong domain) allowed HA to be set up on it without error. I just know that fixing the domain name worked on the other two that were getting the error.
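
If you want to spot this condition across the whole inventory, here is a hedged pyVmomi sketch (same content session as in the earlier sketches) that flags hosts still carrying the localdomain default.

# Hedged sketch: list hosts whose DNS config still uses "localdomain".
from pyVmomi import vim

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    dns = host.config.network.dnsConfig         # vim.host.DnsConfig
    fqdn = "%s.%s" % (dns.hostName, dns.domainName or "localdomain")
    if fqdn.endswith("localdomain"):
        print("Fix me: %s" % fqdn)
view.Destroy()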

bryan4S
Contributor

I was able to fix mine by removing the host from the cluster while in maintenance mode, re-adding it to the cluster, exiting MM, and then disabling/enabling HA for the entire cluster.
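
The maintenance-mode round trip can be scripted as well. A hedged sketch, with the same session as the earlier sketches; the host name is a placeholder, the remove/re-add step is still done in the vSphere Client, and real code should wait on each task.

# Hedged sketch: enter/exit maintenance mode around the cluster rejoin.
from pyVmomi import vim

def find_host(content, name):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    try:
        return next(h for h in view.view if h.name == name)
    finally:
        view.Destroy()

host = find_host(content, "esx01.mynet.com")
host.EnterMaintenanceMode_Task(timeout=0)   # timeout=0 means no timeout
# ... remove the host from the cluster and re-add it in the client ...
host.ExitMaintenanceMode_Task(timeout=0)
# finally disable/enable HA on the cluster (see set_ha in the sketch above)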

Earl50
Contributor

@Mirko Huth

Solution:

=> disabling HA on the cluster and enabling it again.

##########

You made my day!

Works for me too.
