Hi everyone
I'm running 4 hosts, each on ESXi 4.1, with vSphere Enterprise licensing and a separate physical vCenter box.
I'm experimenting with DPM as we don't need all 4 hosts running all the time. However, whenever one of the hosts comes out of standby it gets stuck on enabling HA. The exact error in the logs says:
It seems that more and more people are experiencing these types of issues. Would you be so kind as to create a support request so that our engineers can look into this?
Duncan (VCDX)
Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive
OK, I've submitted a support request and disabled DPM for the moment, as it breaks HA whenever the server comes back up.
Thanks for the advice
In the meantime, if you want to do any analysis on that, follow this:
1. Remove the ESX host from VC and add it back again. Doing this uninstalls and reinstalls the AAM agent automatically.
2. Once you have removed the ESX host from VC, the AAM agent should be uninstalled. If not, you can uninstall it manually.
3. While you are adding the host back to VC, the AAM agent will be installed automatically.
Thanks,
Ganesh
I found this on another forum post, and it seems to have solved the issue:
Turn off HA for the cluster
then
From the ESXi 4.1 SSH console (you can enable SSH from Configuration > Security Profile > Properties):
run the uninstall script
./opt/vmware/aam/VMware-aam-ha-uninstall.sh
services.sh stop
services.sh start
Re-enable HA for the cluster and click "Reconfigure for VMware HA" on the host if it doesn't do it automatically.
Refer: kb.vmware.com/kb/1007234
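For anyone who wants to try the host-side part of this in one go, here is a rough sketch of the three commands above wrapped in a small script. It assumes HA has already been turned off for the cluster and that the uninstall script sits at the path quoted in the post. The DRY_RUN switch and the function names are made up for this sketch and are not part of ESXi; with DRY_RUN left at 1 the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the host-side steps quoted above; run it on the ESXi 4.1 host
# only after HA has been turned off for the cluster in vCenter.
# DRY_RUN is a hypothetical safety switch (not part of ESXi): while it is
# left at 1 the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_aam_agent() {
    # The path is taken from the post as written, so start from /.
    cd / || return 1
    run ./opt/vmware/aam/VMware-aam-ha-uninstall.sh || return 1
    run services.sh stop || return 1
    run services.sh start
}

reset_aam_agent
```

Set DRY_RUN=0 only once you are happy with the printed commands, then re-enable HA for the cluster from vCenter as described above.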
We have the same problem. I raised a support call and after a week of no contact I was then told by a disinterested tech that it was a bug in VC and to wait for a patch to be released. Call closed.
I've had the same response from VMware and they said they'll patch it soon, so keep an eye out.
However, the above method does seem to work. I'm not sure I'm willing to trust it for production though!
Can all of you please reply with the SR number so that I can give it to the engineers? Thanks,
Duncan
Hi all,
I'm experiencing the same issue. I thought it could be caused by a primary/secondary misconfiguration when the hosts come up again, so I've tried applying Duncan's primary node election setting in "Advanced Settings". Unfortunately it didn't work.
Regards,
Jose Manuel Carballo
Just to update: I raised this with support again after 4.1u1 didn't fix it. I had a very good tech this time, and after spending some time looking at network configurations and hosts files, it's been forwarded to engineering (PR?).
I've just had this through from Support as a possible resolution to this issue:
Applied the fix in that KB and we still have the same problem.