I'm having trouble with HA for one ESX host.
I have a cluster of 5 hosts with HA enabled, and one in particular is causing issues. The Summary view shows an error with HA, and when I review Events I see the following error:
internalerror: Vmap_<servername> process failed to stop
The original HA error related to the EMC AAM agent; sadly it has now gone from the events list, so I can't quote it here. At that time I re-ran the "Reconfigure for HA" task, which resulted in the above error. The host has since been rebooted, so it appears to be a configuration issue rather than a state issue.
Can anyone advise the best method of troubleshooting this error?
The main cause I find, as already mentioned, is DNS. As a rule I edit the /etc/hosts file on each ESX host (save a master copy, edit it in Notepad, then use WinSCP to upload it to each host), making sure each entry is in the order IP address, long name, short name, for each ESX host and also for the VC server. Then I edit the /Windows/System32/drivers/etc/hosts file on the VC server as well.
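To illustrate the entry order described above, here is a minimal Python sketch that builds the master hosts lines from a list of (IP, FQDN) pairs. The addresses and names (esx01.example.local and so on) are made up for illustration; substitute your own hosts and VC server.

```python
def hosts_lines(entries):
    """Return hosts-file lines in the order: IP address, long name, short name."""
    lines = []
    for ip, fqdn in entries:
        # The short name is everything before the first dot of the FQDN.
        short = fqdn.split(".", 1)[0]
        lines.append(f"{ip}\t{fqdn}\t{short}")
    return lines


# Hypothetical addresses and names, for illustration only.
EXAMPLE_ENTRIES = [
    ("192.168.10.11", "esx01.example.local"),
    ("192.168.10.12", "esx02.example.local"),
    ("192.168.10.5", "vc01.example.local"),
]

if __name__ == "__main__":
    for line in hosts_lines(EXAMPLE_ENTRIES):
        print(line)
```

The output can be pasted below the existing localhost line of the master file before uploading it to each host.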
If it fails after this you can try re-creating your cluster and re-adding the hosts.
Hi whynotq, Rajeev,
Thanks for the responses. Rajeev, DNS was the first thing I suspected, but I have confirmed that all the hosts have FQDNs and that their A records in DNS are correct. whynotq, I'm going to try using hosts files as a test to see if that resolves it, but if it does I think I'll be onto VMware support; static entries in hosts files are no way for an infrastructure to work, and in fact my info is that they don't support configs which don't fully use DNS (though that info is second-hand).
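For anyone wanting to repeat the name check, a rough Python sketch of a forward and reverse lookup comparison follows. It uses the standard socket module, which asks the operating system's resolver, so it reflects hosts-file entries as well as DNS on the machine where it runs. The host names in the usage section are hypothetical.

```python
import socket


def check_name(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found for one name (an empty list means OK)."""
    try:
        ip = resolve(fqdn)  # forward lookup: name -> IP address
    except OSError as exc:
        return [f"{fqdn}: forward lookup failed ({exc})"]
    try:
        name = reverse(ip)[0]  # reverse lookup: IP address -> primary name
    except OSError as exc:
        return [f"{fqdn}: reverse lookup of {ip} failed ({exc})"]
    # Compare case-insensitively and ignore a trailing dot.
    if name.lower().rstrip(".") != fqdn.lower().rstrip("."):
        return [f"{fqdn}: {ip} reverses to {name}"]
    return []


if __name__ == "__main__":
    # Hypothetical names; replace with your ESX hosts and VC server.
    for host in ["esx01.example.local", "vc01.example.local"]:
        for problem in check_name(host):
            print(problem)
```

Running it from the VC server and from each host in turn shows whether every machine agrees on the same names, which a check against the DNS server alone does not prove.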
I've also ensured that all the ESX hosts can see the gateway, so it shouldn't be isolation mode, unless something else is triggering it.
Again, many thanks for your help.
Thanks again for the replies.
In answer, yes, I've confirmed DNS is ok on each host & the VC server. I've also tried removing & re-enabling HA, and finally I rebuilt the ESX host with the problem, none of which have made any difference.
However, it now looks like it might be licensing; we were advised we needed a certain number of licenses, and that the dual cores in each CPU didn't need to be counted individually. That doesn't seem to be the case, though. I'm waiting on more word from our manager, who is managing that problem.
Once again, thanks for the help; we'll see how things are once this licensing issue has been resolved.
I have just gone through the exact same problem. After a power down and up of the cluster, HA wouldn't enable. It turned out that the LGTOAAM51_vmware script was missing from /etc/init.d on one of the hosts, so the Automated Availability Manager processes didn't start on boot. Once I had copied the script back and started the AAM processes on all of the hosts, I was able to complete the Reconfigure for HA task successfully.
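A quick way to spot the missing script is to test for the file directly. The sketch below is only an illustration: it assumes a Python 3 interpreter is available wherever the check runs and that the script name and directory are exactly as given above; the function name init_script_status is made up.

```python
import os

# Script name and directory as reported in the post above.
AAM_SCRIPT = "LGTOAAM51_vmware"
INIT_DIR = "/etc/init.d"


def init_script_status(init_dir=INIT_DIR, name=AAM_SCRIPT):
    """Return 'missing', 'not executable', or 'ok' for the init script."""
    path = os.path.join(init_dir, name)
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    print(f"{os.path.join(INIT_DIR, AAM_SCRIPT)}: {init_script_status()}")
```

If one host reports "missing" while the others report "ok", copying the script across from a working host, as described above, is the obvious next step.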