I had an ESXi 4 update 1 host which I used update manager to upgrade to 4.1. Upon rejoining the cluster, I receive a red mark and the host states
"HA agent on has an error : Error while running health check script"
Two hosts in a separate cluster do not have the same issue.
Thus far I've
Reconfigured HA
Disabled HA/Enabled HA for the cluster
Restarted management agents
Restarted Server
Put server in another cluster
...all result in the same error.
Any ideas?
Are you guys running ESX or ESXi? I am running ESXi in my cluster, maybe that's why disabling HA entirely worked for me.
Maybe you need to do more (like remove the folders etc) with ESX. When I turned HA off it removed all HA agents from the hosts, enabling it again reinstalled the agents.
Seems like the solution is all over the board. I still have two more hosts to upgrade but i am holding off some to see how 4.1 runs in my environment.
Sent from my iPhone
In my case i have ESXi 4.0u1 and i had this issue when i did the upgrade of vCenter Server to 4.1.
I tried to disable/enable the HA Agent, but this didn't work. I even tried to take the ESXi out of cluster and vCenter Server and add it again, but nothing worked! 😞
Regards / Saludos
-
Patricio Cerda !http://www.images.wisestamp.com/linkedin.png!
VMware VCP-410
Join to Virtualizacion en Español group in Likedin
-
Si encuentras que esta o cualquier otra respuesta ha sido de utilidad, vótalas. Gracias.
If you find this or any other answer useful please consider awarding points by marking the answer helpful or correct. Thank you.
Our hosts are ESX 4.0 U2 but we are in the process of upgrading hardware.
Since VMware will no longer release classic in the future we are rebuilding hosts with ESXi.
People will always have a mix until all their hosts are upgraded.
I did not follow all the instructions from KB1007234, only the part about removing HA manually and re-enabling HA on the cluster.
I am seeing all kinds of errors in two different clusters. I just cannot find a common ground.
I used the uninstall script on the ESXi host, re-enabled HA on cluster and my two
classic hosts are reporting the failed health check at first. Eventually the ESXi hosts HA configuration times out
and fails.
I think there is something seriously wrong with HA/DRS! And why does a host say it is unable to apply DRS resources
even when I disable DRS on a cluster??
Please consider marking my answer as "helpful" or "correct"
Did you remove host from cluster, run HA uninstall script, restart the services, join cluster again and enable HA? I assume this is a ESXi 4.1 host correct?
No I did not remove it from cluster. Just disabled HA and disconnected ESXI host.
Will try again. I have an ESXi 4.0 U2 host. I had exactly the same issue with 4.1 hence the reason I downgraded.
Cheers
Nah same issue, I think our problem lies somewhere else.
Cheers
Do you have DRS enabled? If so, try turning DRS off and HA off. When you enable HA don't enable DRS and see if that works.
Sent from my iPhone
yep same here fixed it for us
Refering to the link posted to resolve this issue, just want to ask:
1. when running services.sh stop, will this command affect the VM's functionality? or does this command stop only the management services on ESXi?
2. do I need to bring the ESXi host to maintenance mode to execute this fix?
thank you,
Benjamin Ho
All monitoring and admin access to the server will go down for a few minutes, but the VM's running on the server and their ability to access the LAN will not be interupted.
I went onsite at my customer site which has 3 ESXi host on 4.1 and was having the health check script on all 3 hosts.
I redo the HA configuration by clicking on the option "reconfigure for VMware HA" for first 2 ESXi hosts and after 2 or 3 tries the ESXi hosts eventually had their HA configured with no more error on health check script. VM on these 2 hosts can be vMotioned to each other.
but the 3rd ESXi hosts simply refused to pass the health check script even after I run the reconfiguration for more than 5 times.
to move forward, I think I have 2 options:
1. Disable HA at cluster level and uninstall HA for all 3 hosts. restart the management service, enable HA at cluster level so that HA will be reinstall for all 3 ESXi hosts
2. just uninstall HA component on ESXi host 3 and select the "reconfigure for VMware HA" option so that HA will be reinstall for host 3 only.
I am leaning more to option 2 because since HA on host 1 and 2 already has HA up and running, I really don't want to mess with it. but I am also not sure if option 2 will work.
a point to add, the files in /opt/vmware/aam/ha across all 3 hosts are all dated 22 April 2009, does this have any impact on HA functionality in 4.1?
Regards
Benjamin Ho