VMware Cloud Community
DrewDennis
Contributor

Cmd addnode failed for primary node?

After upgrading from 3.5.0, 84374 to 3.5.0, 110268 I am getting the following HA error:

cmd addnode failed for primary node: Unable to import /var/log/vmware/aam/aam_config_util.def

Any clues before I call support?

35 Replies
limait
Contributor

I got the same error on a freshly installed server. Have you reported it, and is there any resolution?

Kenneth

Rubeck
Virtuoso

No clue, but I have a suggestion, even though it's not ideal in a 32-node cluster :-):

- Disable HA

- Rename the folder /var/log/vmware/aam on hosts (just to have a backup)

- Enable HA (aam folder will be recreated with needed files)
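
The rename step from the Service Console is just something like this (path taken from the error message above; keeping the old folder around as a backup rather than deleting it):

# back up the existing aam directory; HA will recreate it when re-enabled
mv /var/log/vmware/aam /var/log/vmware/aam.old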

Just did this on a 2 node test cluster due to various HA errors..... Also running build 110268.

/Rubeck

Aurizejr
Contributor

Hello!!!

I got the same problem here after upgrading VMware to 3.5.0, 110268. One of my servers didn't start HA.

Any resolution?

I tried turning HA OFF on the cluster and then back ON, but the problem persists.

Thx.

DrewDennis
Contributor

I still have no resolution and have not had a chance to open a ticket with support. I also tried removing HA and re-enabling it, but I still get the same issue.

Cameron2007
Hot Shot

I have had issues with HA previously when adding extra nodes during a migration, and it was actually a problem with the licences being expired. Check those licences and stop/start/re-read the licence file on the FLEXlm server.
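
If you want to do the check and re-read from a command prompt on the licence server, something like this should work with the FLEXlm lmutil tool (the licence file path below is only an example of a default install location, so adjust it to wherever your file actually lives):

lmutil lmstat -a -c "C:\Program Files\VMware\VMware License Server\Licenses\vmware.lic"
lmutil lmreread -c "C:\Program Files\VMware\VMware License Server\Licenses\vmware.lic"

The first line shows whether the features are still being served; the second forces the server to re-read the licence file without a full restart.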

Iuridae
Contributor

Hi there,

I just got the exact same error, but this was on a previously fully functional HA cluster. I rebooted two of my hosts in a four-host cluster; after the restart, one of the hosts got

cmd addnode failed for primary node: Unable to import /var/log/vmware/aam/aam_config_util.def

while the other completed without errors. I just tried deleting the aam folder, but the error remains.

Any solutions to this?

davidjerwood
Enthusiast

Does anyone know the best solution to this issue yet?

I have a cluster of 3 x ESX 3.5.0, 98103 servers. I just upgraded one to 3.5.0, 110181 and now it won't enable HA.

cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed.

Any suggestions appreciated.
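
In case it helps with troubleshooting before a support call, the HA agent keeps its own logs under /var/log/vmware/aam on the host (the same directory the addnode error refers to), so something like this from the Service Console may show what ft_startup actually complained about (file names vary by build, hence the wildcard):

# list whatever the aam agent has written, then skim the newest logs
ls -lt /var/log/vmware/aam/
tail -n 50 /var/log/vmware/aam/*.log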

Iuridae
Contributor

Hi there,

I previously had that problem and had no idea what was wrong. I contacted VMware support, who removed the HA agent files from the host and then added them again via Reconfigure for HA. That reinstalls the agent, i.e. a fresh install. But the same error remained.

After a few tries and struggles I started to check my connectivity and found out that vmkping got no response from my other hosts.

My configuration is 2 Service Consoles and 1 VMotion interface. The backup Service Console and VMotion use the same vmnic, while the normal SC uses a separate one. These two vmnics have no connection to each other.

After I made sure that both the SC and VMotion networks had full contact with the other hosts, the error disappeared. The problem was a loose VMotion cable on the host with the error.
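
For anyone wanting to run the same checks, from the Service Console of each host (the addresses below are only placeholders for the other host's Service Console and VMkernel IPs):

# Service Console to Service Console
ping 10.0.0.12
# VMkernel (VMotion) network
vmkping 10.0.1.12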

Dan_Willis
Enthusiast

Hi,

Anyone had any luck with this? I'm experiencing it on a build and I've tried pretty much everything without success.

DNS is happy (short and long name), but I'm getting the same error about the "unable to import /var/log/vmware/aam/aam_config_util.def"

I've gone as far as populating (all lowercase) hosts files, rebuilding machines, changing network and host names without any success. The first host will always configure correctly (can be any machine, it's not fussy) but subsequent boxes all fail. I'm lost...
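
For anyone comparing notes, the name checks amount to roughly this on each host (host names and addresses below are only examples):

hostname                      # typically the FQDN on ESX, e.g. esx01.example.local
hostname -s                   # short name
nslookup esx02.example.local  # forward lookup of every other host in the cluster
cat /etc/hosts                # lowercase entries, e.g. "10.0.0.12  esx02.example.local  esx02"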

Dan

Dan_Willis
Enthusiast

After all that, I have it working - but god only knows why it wasn't before. I had to:

1. Remove the HA cluster

2. Remove the second Service Console from all hosts

3. Reboot the VC

4. Reboot each host

5. Create a fresh cluster and add each host one at a time.

Works now.... odd! (build 110268)
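
For step 2, if you'd rather do it from the Service Console than the VI Client, esxcfg-vswif handles the second Service Console interface (vswif1 below is only an example; list first and double-check the syntax with esxcfg-vswif --help before deleting anything):

esxcfg-vswif -l          # list Service Console interfaces and see which is the secondary one
esxcfg-vswif -d vswif1   # delete the secondary Service Console interface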

pepehdz
Contributor

I had the same problem, and after trying several of your suggestions the one that worked was to remove a secondary Service Console on all hosts and re-enable HA. I had added those service consoles because last week I had a different problem and support told me to just add them. Today, when I upgraded from 2.5 to 2.5 Update 3, HA didn't work.

dmanconi
Enthusiast

Hi

Any update on this issue? I just added two new ESX nodes to an existing VM farm and got "cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed" on one of the two nodes. Both are at ESX 3.5 build 120512. VirtualCenter is at 2.5 Update 3 (to sort the HA issues that existed in 2.5 Update 2).

Both new esx servers are built exactly the same. Vmkping works fine between all four servers as well. Disabling the HA cluster is not an option at the moment due to customer concerns etc, and the fact that one server worked fine.

Oh, and following this workaround from the VirtualCenter 2.5 Update 3 release notes isn't too helpful either, for the above reason around customer concerns.

Reconfiguration of HA Agent on an ESX Server Host Might Fail After VirtualCenter Server Is Upgraded

When a VirtualCenter Server that contains an HA-enabled cluster is upgraded, and ESX Server hosts are reconnected to the VirtualCenter, reconfiguring the HA agent of one of the ESX Server hosts might fail with an HA agent error message similar to the following:

HA agent on <Host_Name> in cluster <Cluster_Name> in <Datacenter_Name> has an error:

cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed

Workaround: After upgrading the VirtualCenter Server, disable and re-enable HA on the cluster

Cheers

David

kattrap
Contributor

I didn't see this trick mentioned and it just worked for me (3.5 u2 110268).

Put the host in maintenance mode, remove it from VC and then re-add it directly into the cluster. I had tried just about everything else on the list, and this is non-intrusive to your infrastructure (as long as your other servers can hold the running load while it's in maintenance mode). ;-)

shawnrb
Contributor

This trick also worked for me.... thanks kattrap!!

cityexplorer
Contributor

I have the same problem, but it is very strange. I have tried various methods, but it only works if I:

1) reread the licence file

2) remove HA

3) re-enable HA

--> OK.. but,

4) re-configure HA --> fail again!

What is happening??

dpearsall
Contributor

I had the same issue after upgrading VC from U2 to U3. On my 2-node cluster, I first tried disabling HA and re-enabling it, but that failed. Then I removed both HA and DRS, then enabled only HA, and it worked. I then switched DRS back on as well and it has remained OK.

BrianKayser
Contributor

I was getting this error after rebooting the host. Just entering Maintenance Mode and then exiting Maintenance Mode fixed it for me. It may have been because the host would not enter Maintenance Mode prior to the reboot, as it was acting flaky (thus the reason for the reboot).

vmetc_rich
Enthusiast

kattrap,

Followed your advice of maintenance mode, remove from VC, then re-add the host directly to the cluster. Worked for me! Thanks!

Adidas6
Contributor

I just wanted to add my 2 cents that we were seeing the same symptom as part of some other HA problems on 2 hosts in a 5-host cluster. For each host we did: Maintenance Mode => Remove from VC => Reboot Host => Re-add directly into the cluster => Exit Maintenance Mode.

This was the final step in resolving our HA issues and it resolved the issue as described by the original poster as well.

Thanks!
