VMware Cloud Community
DrewDennis
Contributor

Cmd addnode failed for primary node?

After upgrading from 3.5.0, 84374 to 3.5.0, 110268 I am getting the following HA error:

cmd addnode failed for primary node: Unable to import /var/log/vmware/aam/aam_config_util.def

Any clues before I call support?
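For anyone who hits this before calling support, a quick sanity check is to confirm that the file named in the error actually exists and is readable on the host. A minimal sketch from the ESX service console; the /opt/vmware/aam path is an assumption about where the AAM (HA) agent files live on this build:

# Check the file the error complains about
ls -l /var/log/vmware/aam/aam_config_util.def
# Assumption: default AAM agent install location; adjust if your build differs
ls -l /opt/vmware/aam/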

35 Replies

dcalcut
Contributor

Removing HA and adding it back worked for me. Removing the node and adding it back did not.

PaddyL
Contributor

Hi All,

Had 2 perfectly working ESXi 3.5 servers in a cluster. Rebooted the first host for some minor changes, and HA worked fine when I re-enabled it.

Tried the same thing on the second host and got the same error as everyone else on this post.

Tried everything mentioned above and none worked until I tried this:

Disable DRS and HA for the cluster.

Enable DRS first and then enable HA, in that order, and it is working fine again.

Very strange.

Patrick

RS_1
Enthusiast

Hi, it just happened to me too. I needed to reboot before re-adding to VC.

I'm on ESXi 3.5.0 130755.

dmustakasjr
Contributor

I tried your trick as well. Unfortunately it is still occurring. What's interesting is that after all that (putting the hosts in maintenance mode, removing them, recreating the cluster, re-adding them), I can take them out of maintenance mode individually and the service starts! I can even alternate and it works, but with both of them out of maintenance mode the error occurs on one or the other (depending on which one started first). Gotta love those elusive errors.

vRico Virtualization Enthusiast VCP 3&4, MCITP EA and a bunch of other stuff
Matt_B1
Enthusiast

I have an EVC cluster with ESX 3.5 Update 3 servers in it without HA enabled. I enabled HA and got the "cmd addnode failed..." error on 2 of the 6 hosts. After reading this thread, I disabled HA on the cluster and enabled it again. This worked fine and was very easy to do. I would suggest trying it as one of your first troubleshooting steps.

jopper
Contributor

I had this same error and it turned out to be a licensing issue. I had to disable HA, restart the license server, and then enable HA.

george2
Contributor

I had the exact same error. It ended up being that I mistyped the hostname during the host build and had to rename the host. It took me a while to figure out because the DNS name was correct; we caught it on a fluke.

To rename an ESX server, just find and change the hostname in these files:

/etc/hosts

/etc/sysconfig/network
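For reference, a minimal sketch of that rename from the ESX 3.x service console. The old and new names are placeholders, and the two service restarts at the end are an assumption about what is needed for the management agents (and HA) to pick up the corrected name:

# Correct the mistyped name in both files mentioned above (placeholder names)
sed -i 's/esx01-typo/esx01/g' /etc/hosts
sed -i 's/esx01-typo/esx01/g' /etc/sysconfig/network
# Apply the corrected name to the running system
hostname esx01.example.com
# Assumption: restart networking and the management agents so vCenter sees the new name
service network restart
service mgmt-vmware restart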

David_Wells
Contributor

The steps I used were as follows (and they worked):

I had this error on 1 of the 3 systems in a particular cluster.

On each host in the cluster, run Reconfigure for HA (do the system(s) having the error last). When all systems had been reconfigured, they were fine.

Hope this helps.

Martin_Adamsson
Contributor

Check the hosts file on every ESX host for the right configuration. See below; remove the line containing "::1" and add the rest of your ESX hosts.

# vi hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.1.202 MAX2.BTEC MAX2
192.168.1.200 MAX1.BTEC MAX1

Then choose Reconfigure for VMware HA from the host menu on every host in the failing cluster.

/Martin

VladKv
Contributor

Rather than messing around with the hosts file, I went to verify the DNS configuration and found that it was set up incorrectly.

Fixing the DNS settings and verifying the DNS entries for all servers fixed it for me.
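For anyone who wants to double-check this, a small sketch of the kind of verification that applies here, run from each ESX service console; the host name and address are placeholders:

# Forward lookup should return the host's management IP
nslookup esx01.example.com
# Reverse lookup should return the matching FQDN
nslookup 192.168.1.200
# Confirm which name servers and search domain the host is actually using
cat /etc/resolv.conf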

cgdii
Contributor

Many thanks to everyone above. I racked my brain on this all day. Removed and re-added the node, manually deleted the RPMs for VC, and was just about ready to rebuild it when I found this post. I checked the hosts file entry and discovered that the short name for this host had a typo in it. After correcting the typo, Reconfigure for HA worked.

dingeem
Contributor

I am running VC 2.5 Update 4 and had this happen to me. Removing the node from the cluster and re-adding it finally fixed it for me.

dgmeyer
Contributor

I tried rebooting the faulty node and putting it into maintenance mode and taking it back out. None of that worked. Disabling HA for the cluster and then re-enabling it is what worked. Running ESXi 4, no DNS, no command line. Individual hosts do have static IPs and FQDNs configured properly. I may stick a DNS server on the vCenter host and see if adding that to each ESX host's config would help avoid this problem in the future.

timfnb
Contributor

Same problem: 2 working hosts, and adding a third gave this error.

Tried maintenance mode and a restart, but it didn't help. Disabled DRS and HA, then enabled DRS and HA in that order, and it worked.

Pugsr
Contributor

I am having this same exact problem, and not one of the solutions is working for me. The only one I didn't try was reloading ESXi on the server giving me the problem. I'm going to start that now.

pfarthing6
Contributor

An old post, but I'm sure others may still experience the issue as I did.

What worked for me was to set the FT vmkernel on its own subnet, separate from the others, then reconfigure HA.

This article lit the bulb over my head:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1003789
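For the console-inclined, a rough sketch of what that change looks like with esxcfg-vmknic on a classic ESX host. The port group name and addressing are placeholders, and the FT-logging designation itself is still enabled through the vSphere Client:

# List the existing VMkernel interfaces and their subnets
esxcfg-vmknic -l
# Remove the interface that shares a subnet with the others (placeholder port group name)
esxcfg-vmknic -d "FT Logging"
# Recreate it on its own dedicated subnet
esxcfg-vmknic -a -i 10.10.50.11 -n 255.255.255.0 "FT Logging"
# Then run Reconfigure for HA on the host, as described above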
