VMware Cloud Community
fusionit
Enthusiast

Very Strange HA issue

Hello everyone. I have a 3-host vSphere cluster running build 175625. I have enabled HA and DRS on the cluster, and it has been running fine for months. I placed one of my hosts in maintenance mode in order to reboot it because there was a problem with two VMs on that host: I could not perform any action on them, as I was getting the famous "cannot complete task as another task is already in progress" message, and a simple restart of the management services on that host did not resolve it. So I rebooted this particular host, and after it came back up I took it out of maintenance mode. It then failed when trying to configure HA, with yet another famous error message: "HA agent has an error : cmd addnode failed for secondary node: Internal AAM Error - agent could not start. : Unknown HA error". I did the normal troubleshooting (DNS, /etc/hosts files, etc.); connectivity and name resolution are solid. OK, so here's the strange issue. This HA error message was happening on the ESX02 host.

I disabled HA on my cluster. I then re-enabled it, and ESX03 and ESX02 enabled HA just fine, but ESX01 now failed with the exact same error message. So I disabled HA on ESX03 (put it in maintenance mode) and reconfigured HA on ESX01, and lo and behold, HA configured just fine. I took ESX03 out of maintenance mode and, wouldn't you know it, it failed on HA with the above-mentioned error. For some reason it seems like HA does not want to enable on more than 2 hosts in my cluster. Does anyone have any ideas why this may be happening? I knew HA had its issues, and I'm comfortable troubleshooting the software (VCP); I just can't figure this one out. Thanks in advance.
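
For anyone repeating the DNS and /etc/hosts checks mentioned in this post, here is a minimal Python sketch of a forward and reverse lookup consistency check. It is only an illustration, not a VMware tool: the function name `check_resolution` and the host names passed to it are hypothetical, and it assumes it is run from a machine that uses the same DNS servers and hosts file as the ESX hosts.

```python
import socket


def check_resolution(hostnames, forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return (hostname, problem) tuples; an empty list means every
    name resolved to an IP that resolves back to the same short name.

    `forward` and `reverse` default to the standard resolver calls and
    can be swapped out, for example to test without a network.
    """
    problems = []
    for name in hostnames:
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((name, f"forward lookup failed: {exc}"))
            continue
        try:
            # gethostbyaddr returns (hostname, aliases, addresses)
            reverse_name = reverse(ip)[0]
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append((name, f"reverse lookup of {ip} failed: {exc}"))
            continue
        # Compare only the short host names, ignoring case.
        if reverse_name.split(".")[0].lower() != name.split(".")[0].lower():
            problems.append((name, f"{ip} resolves back to {reverse_name}"))
    return problems
```

Calling `check_resolution` with the short and fully qualified names of all three hosts, and seeing an empty list come back, would be one way to confirm that name resolution is consistent in both directions.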

2 Replies
admin
Immortal

Can you take a look at or post the contents of /var/log/vmware/aam/aam_config_util_addnode.log?

Thanks,

Sridhar

bulletprooffool
Champion

Try using something like:

http://communities.vmware.com/blogs/virtuallysi/2009/04/02/esx-healthcheck-script-winner

to verify the health of the cluster.

Also check that you don't have some strange licensing issue (unlikely, but worth reviewing).

One day I will virtualise myself . . .