I got the same error on a freshly installed server. Have you reported it, and is there any resolution?
No clue, but I have a suggestion, even though this is not ideal in a 32-node cluster :-):
- Disable HA
- Rename the folder /var/log/vmware/aam on hosts (just to have a backup)
- Enable HA (aam folder will be recreated with needed files)
I just did this on a 2-node test cluster because of various HA errors. Also running build 110268.
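If anyone wants to script the rename step across several hosts, here is a minimal sketch of it. It assumes Python is available where you run it and that HA has already been disabled on the cluster; `backup_aam_dir` and the `.bak-` naming are my own choices for illustration, not anything VMware ships.

```python
import os
import time

def backup_aam_dir(aam_dir="/var/log/vmware/aam", suffix=None):
    """Rename the aam directory to a backup name and return the new path.

    Returns None when the directory does not exist (nothing to back up).
    """
    if not os.path.isdir(aam_dir):
        return None
    if suffix is None:
        # Timestamp the backup so repeated runs do not collide.
        suffix = time.strftime("%Y%m%d-%H%M%S")
    backup = "%s.bak-%s" % (aam_dir.rstrip("/"), suffix)
    if os.path.exists(backup):
        raise RuntimeError("backup target already exists: " + backup)
    os.rename(aam_dir, backup)
    return backup

if __name__ == "__main__":
    print(backup_aam_dir())
```

The folder is renamed rather than deleted, so the old files stay around as a backup, as in the steps above.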
I got the same problem here after upgrading VMware to 3.5.0 110268. One of my servers didn't start HA.
I tried to turn OFF Cluster HA, then turn ON again, but the problem persists.
I still have no resolution and have not had a chance to open a ticket with support. I also tried to remove HA and reset, but still get the same issue.
I have had issues with HA previously when adding extra nodes during a migration, and it was actually a problem with the licences having expired. Check the licences and stop/start the flexlm server or have it re-read the licence file.
I just got the exact same error, but on a cluster where HA had been fully functional. I rebooted two of my hosts in a four-host cluster; after the restart, one of the hosts got
cmd addnode failed for primary node: Unable to import /var/log/vmware/aam/aam_config_util.def
while the other completed without errors. I just tried deleting the aam folder, but the error remains.
Any solutions to this?
Does anyone know the best solution to this issue yet?
I have a cluster of 3 x ESX 3.5.0, 98103 servers. I just upgraded one to 3.5.0, 110181 and now it won't enable HA.
cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed.
Any suggestions appreciated.
I previously had that problem and had no idea what was wrong. I contacted VMware support, who removed the HA agent files from the host and then added them again via Reconfigure HA agent. That command installs the agent again, i.e. a fresh install. But the same error remained.
After a few tries and struggles I started to check my connectivity and found that vmkping got no response from my other hosts.
My configuration is 2 Service Consoles and 1 VMotion. The backup Service Console and VMotion use the same vmnic, while the normal SC uses a separate one. These two vmnics have no connection to each other.
After I made sure that both SC and VMotion had full contact with the other hosts, the error disappeared. The problem was a loose VMotion cable on the host with the error.
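For anyone wanting to repeat that connectivity check against every host in one go, here is a rough sketch. It assumes `vmkping` is on the path where you run it and that it exits non-zero when there is no reply (I have not verified the exit status, so treat that as an assumption). The function name and host list are made up for illustration. Pass `ping_cmd=("ping", "-c", "1")` to test the Service Console path instead.

```python
import subprocess

def unreachable_hosts(hosts, ping_cmd=("vmkping", "-c", "1"), run=subprocess.call):
    """Return the hosts that did not answer a single ping.

    `run` is injectable so the logic can be tried without a real network.
    """
    failed = []
    for host in hosts:
        # Assumption: exit status 0 means the host replied.
        if run(list(ping_cmd) + [host]) != 0:
            failed.append(host)
    return failed

if __name__ == "__main__":
    # Hypothetical host names; replace with the other hosts in the cluster.
    print(unreachable_hosts(["esx1", "esx2", "esx3"]))
```

Run it from each host in turn, since a loose cable like the one above only shows up from the affected host.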
Anyone had any luck with this? I'm experiencing it on a build and I've tried pretty much everything without success.
DNS is happy (short and long name), but I'm getting the same "Unable to import /var/log/vmware/aam/aam_config_util.def" error.
I've gone as far as populating (all lowercase) hosts files, rebuilding machines, changing network and host names without any success. The first host will always configure correctly (can be any machine, it's not fussy) but subsequent boxes all fail. I'm lost...
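In case it helps others rule out name resolution the same way, here is a small sketch that checks whether the short and long name of each host resolve, and whether they resolve to the same address. The function name and the example names are hypothetical. It only tests resolution from the machine it runs on, so it would need to be run on each host and on the VC server.

```python
import socket

def check_host_names(pairs, resolve=socket.gethostbyname):
    """pairs: list of (short_name, fqdn). Return a list of problem descriptions."""
    problems = []
    for short, fqdn in pairs:
        addrs = {}
        for name in (short, fqdn):
            try:
                addrs[name] = resolve(name)
            except socket.error:
                addrs[name] = None
                problems.append("%s does not resolve" % name)
        # Only compare addresses when both names resolved.
        if None not in addrs.values() and addrs[short] != addrs[fqdn]:
            problems.append("%s and %s resolve to different addresses" % (short, fqdn))
    return problems

if __name__ == "__main__":
    # Hypothetical names; replace with your own hosts.
    for line in check_host_names([("esx1", "esx1.example.local")]):
        print(line)
```

An empty result only means DNS looks consistent from that machine. As this post shows, HA can still fail for other reasons.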
After all that, I have it working - but god only knows why it wasn't before. I had to:
1. Remove the HA cluster
2. Remove the second Service Console from all hosts
3. Reboot the VC
4. Reboot each host
5. Create a fresh cluster and add each host one at a time.
Works now.... odd! (build 110268)
I had the same problem, and after trying several of your suggestions, the one which worked was to remove the secondary Service Console on all hosts and re-enable HA. I had added these Service Consoles because last week I had a different problem and support told me to just add them. Today, when I upgraded from 2.5 to 2.5 Update 3, HA didn't work.
Any update on this issue? I just added two new ESX nodes to an existing VM farm and got the following on one of the two nodes: "cmd addnode failed for primary node /opt/vmware/aam/bin/ft_startup failed". Both are at ESX 3.5 build 120512. VirtualCenter is at 2.5 Update 3 (to sort out the HA issues that existed in 2.5 Update 2).
Both new ESX servers are built exactly the same. vmkping works fine between all four servers as well. Disabling the HA cluster is not an option at the moment due to customer concerns etc., and given that one server worked fine.
Oh, and the following workaround from the VirtualCenter 2.5 Update 3 release notes isn't too helpful either, for the same reason around customer concerns.
Reconfiguration of HA Agent on an ESX Server Host Might Fail After VirtualCenter Server Is Upgraded
When a VirtualCenter Server that contains an HA-enabled cluster is upgraded, and ESX Server hosts are reconnected to the VirtualCenter, reconfiguring the HA agent of one of the ESX Server hosts might fail with an HA agent error message similar to the following:
HA agent on <Host_Name> in cluster <Cluster_Name> in <Datacenter_Name> has an error:
cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed
Workaround: After upgrading the VirtualCenter Server, disable and re-enable HA on the cluster
I didn't see this trick mentioned and it just worked for me (3.5 u2 110268).
Put the host in maintenance mode, remove it from VC and then re-add it to the cluster. I had tried just about everything else on the list, and this is non-intrusive to your infrastructure (as long as your other servers can hold the running load while it's in maintenance mode).
This trick also worked for me.... thanks kattrap!!