Due to various changes in our environment, I had to reconfigure our HA/DRS cluster. Two of my hosts are reporting the following error when being added to the cluster during the reconfiguration for HA:
An error occurred during configuration of the HA Agent on the host.
"failed to delete Vmap_the-host-name process"
The hosts are running ESX 3.5.0 (64607) with Virtual Center 2.5.
Anyone have an idea what's up or at least how I can resolve this?
Hi, I had the same issue on one of my HA nodes. What I found was that the HA agents on the 4 nodes (see /var/log/vmware/aam/aam_config_util_listnodes_log) were no longer in sync, as if the nodes did not know exactly which of them had the HA agent running and which did not. So I disabled HA on the cluster, waited for that action to complete, then reconfigured HA on all nodes. This forced all the nodes to resync the information.
That's pretty much what I did. Removed the offending host from the cluster then added it back in. It would be nice to know how it happened in the first place and how to resolve it gracefully without removing nodes from the cluster.
Well, basically disabling and re-enabling HA was the "quick and dirty" solution. I chose it because, in my case, the 4 nodes had different information (1st: 2 agents running and the other 2 not; 2nd: 1 agent on, 3 off; 3rd: 2 agents on, 2 off, but not the same ones as the 1st; 4th: only its own agent on). So it made no sense to try disabling the HA agent from the command line, and even restarting the agent after a logrotate to flush the information didn't work well.
Basically, in my case the problem happened because the customer built a VLAN dedicated to the ESX management network, and it seems the nodes lost sync for a while.
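For anyone who wants to check for this kind of disagreement before disabling HA, here is a rough sketch of the comparison I did by eye. It assumes you have already collected, per host, the set of nodes that host believes have a running agent. The host names and the `views` data are made up for illustration, and the script does not parse the aam log files itself, since their format is not shown in this thread.

```python
from collections import defaultdict


def group_by_view(views):
    """Group hosts by the set of agents each one believes is running.

    `views` maps a host name to the set of host names that host reports
    as having a running HA agent (hypothetical, hand-collected data).
    """
    groups = defaultdict(list)
    for host, running in views.items():
        groups[frozenset(running)].append(host)
    return dict(groups)


def in_sync(views):
    """The cluster is consistent only if every host reports the same set."""
    return len(group_by_view(views)) <= 1


# Made-up data shaped like the situation described above:
# four hosts, four different pictures of the cluster.
views = {
    "esx1": {"esx1", "esx2"},
    "esx2": {"esx2"},
    "esx3": {"esx3", "esx4"},
    "esx4": {"esx4"},
}

if not in_sync(views):
    for running, hosts in group_by_view(views).items():
        print(sorted(hosts), "believe these agents are running:", sorted(running))
```

If more than one group is printed, the hosts disagree, and a full disable and re-enable of HA is probably quicker than fixing agents one at a time.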
I have the same problem with 3.5 (actually 3i installable). HA failed on 1 of my 5 hosts. I didn't see any errors in the log but when I tried to reconfigure HA, I get the "cannot delete Vmap process" error. I have to disable the entire cluster or take the machine out of the cluster to get HA back on. The big problem is that this has happened twice in the last two months and there is no notification that HA is not working.
The error is the same but the situation is different: I have 2 hosts running 3.0.1 that I need to update to 3.5.0. I added another host running 3.5.0, same model blade, so I can migrate my guests off during the upgrade. No matter what I do, I get "An error occurred during configuration of the HA Agent on the host." I've checked everything I can think of. Thoughts?