TristanT
Contributor

HA problems with ESXi 4, build 208167

I am standing up a new ESXi cluster with 5 nodes. All on identical, new hardware (HP BL490c booting from SD card). So far, so good. Hardware burn-in went well.

For some reason one node is having HA problems. Despite several reconfigurations of HA, this one host always shows the following error:

"HA agent on hostname in cluster clustername in datacenter has an error: cmd addnode failed for primary node: Internal AAM Error - agent could not start: Unknown HA error"

So far, here is what I've tried:

- I have enabled and disabled HA several times.

- I have done "Reconfigure for VMware HA" on the troublesome host.

- I have confirmed NTP is properly configured on all hosts

- I have verified forward/reverse DNS on all hosts and vCenter

I am hesitant to do the unsupported Tech Support Mode edit of the ESXi hosts file. I'd like to stick to a supported configuration if I can.

Does anyone have any thoughts on this before I open a support case?

Thanks all - you're the best! Points (and a beer at VMworld) for your helpful answers.

Tristan

5 Replies
marcelo_soares
Champion

Try removing the host from the vCenter and adding it again.

Marcelo Soares

VMware Certified Professional 310/410

Virtualization Tech Master

Globant Argentina

Consider awarding points for "helpful" and/or "correct" answers.

Troy_Clavell
Immortal

Also, you may try restarting the management agents:

http://kb.vmware.com/kb/1003490

...and confirm name resolution is set up properly within the entire environment.
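For anyone who wants to script the name resolution check from a workstation rather than click through each host, here is a minimal sketch using only Python's standard `socket` module. The function name `check_forward_reverse` and the host names in the usage note are made up for illustration; this only checks that a name resolves to an address and that the address resolves back to the same name, and it says nothing about what the HA agent itself does internally.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips.

    `forward` and `reverse` default to the standard resolver calls;
    they are parameters only so the check can be exercised offline.
    Returns (ok, message).
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {hostname}: {exc}"

    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"

    # Compare case-insensitively and ignore a trailing dot.
    same = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return same, f"{hostname} -> {ip} -> {reverse_name}"
```

Calling `check_forward_reverse("esx01.example.com")` for each host and for vCenter (substituting your real fully qualified names) returns a flag and a one-line summary, so a mismatch between the forward and reverse records stands out quickly.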

dominic7
Virtuoso

Whenever I see this, nearly 90% of the time it's because the hostname isn't set on the ESXi host. If you click on the failed configure HA task in Tasks and Events, it will often give you a bit more information. Look for a message like "Failed to get hostname for localhost", or look at the /etc/hosts file from tech support mode and verify that the host is able to resolve its own name successfully. Assuming that's the problem, the fix is just to change the DNS config to set the hostname correctly and then reconfigure for HA again.
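If you copy the contents of /etc/hosts off the host, the check described above can be sketched in a few lines of Python. This is only an illustration: `lookup_in_hosts` is a made-up helper, and the sample addresses and names are placeholders. It assumes the usual hosts-file layout of an address followed by one or more names, with `#` starting a comment.

```python
def lookup_in_hosts(hosts_text, hostname):
    """Return every address that the hosts-file text maps to hostname.

    Matching is case-insensitive and covers both the canonical name
    and any aliases on the same line.
    """
    wanted = hostname.lower()
    addresses = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            addresses.append(ip)
    return addresses


# Placeholder content; paste the real file contents here instead.
sample = """\
127.0.0.1   localhost
10.0.0.11   esx01.example.com esx01   # management interface
"""
print(lookup_in_hosts(sample, "esx01"))
```

An empty list for the host's own short name or fully qualified name would be consistent with the "hostname isn't set" case described above.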

TristanT
Contributor

Removing / re-adding the host didn't work.

Restarting the management agents didn't work.

Hostname properly configured on host (confirmed in DNS settings) and DNS configuration checks out. Also, I'm not seeing any "failed to get hostname" type of messages.

I'm going to look in tech support mode to see what's up.

Thanks all ~ you guys are awesome.
