VMware Cloud Community
Gallwapa
Contributor

After 4.1 upgrade, 1 host "Error while running health check script" - What to check next?

I had an ESXi 4.0 Update 1 host that I upgraded to 4.1 using Update Manager. When it rejoined the cluster, it showed a red mark and the host states:

"HA agent on has an error : Error while running health check script"

Two hosts in a separate cluster do not have the same issue.

So far I have:

Reconfigured HA

Disabled HA/Enabled HA for the cluster

Restarted management agents

Restarted Server

Put server in another cluster

All of these result in the same error.

Any ideas?

s1xth
VMware Employee

Are you guys running ESX or ESXi? I am running ESXi in my cluster; maybe that's why disabling HA entirely worked for me.

Maybe you need to do more with ESX (like removing the folders). When I turned HA off, it removed the HA agents from all hosts, and enabling it again reinstalled them.

It seems the solution is all over the board. I still have two more hosts to upgrade, but I am holding off to see how 4.1 runs in my environment.

Sent from my iPhone

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
pcerda
Virtuoso

In my case I have ESXi 4.0 U1, and I had this issue after upgrading vCenter Server to 4.1.

I tried disabling and re-enabling the HA agent, but it didn't work. I even removed the ESXi host from the cluster and from vCenter Server and added it back, but nothing worked! 😞




Regards

-


Patricio Cerda

VMware VCP-410

Join the Virtualizacion en Español group on LinkedIn

See My Blog

See My LinkedIn Profile

-


If you find this or any other answer useful please consider awarding points by marking the answer helpful or correct. Thank you.

Regards - Patricio Cerda - vExpert 2011 / 2012 / 2013
AllBlack
Expert

Our hosts are ESX 4.0 U2, but we are in the process of upgrading hardware.

Since VMware will no longer release classic ESX in the future, we are rebuilding our hosts with ESXi.

People will always have a mix until all their hosts are upgraded.

Please consider marking my answer as "helpful" or "correct"
computerguy7
Enthusiast

I did not follow all the instructions in KB1007234, only the part about manually removing HA and re-enabling HA on the cluster.

AllBlack
Expert

I am seeing all kinds of errors in two different clusters. I just cannot find anything they have in common.

I used the uninstall script on the ESXi host and re-enabled HA on the cluster. My two classic ESX hosts report the failed health check at first, and eventually the ESXi host's HA configuration times out and fails.

I think there is something seriously wrong with HA/DRS! And why does a host say it is unable to apply DRS resources even when DRS is disabled on the cluster?

computerguy7
Enthusiast

Did you remove the host from the cluster, run the HA uninstall script, restart the management services, rejoin the cluster, and then enable HA? I assume this is an ESXi 4.1 host, correct?
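The order of steps asked about above can be written down as a checklist, which makes it easy to see which steps a given attempt skipped. A minimal Python sketch; the step labels are paraphrases of the post, and HA_REINSTALL_STEPS, missing_steps and out_of_order are hypothetical names, not part of any VMware tool:

```python
# Hypothetical checklist paraphrasing the reinstall order described in this thread.
# These labels are plain strings, not vSphere API calls or CLI commands.
HA_REINSTALL_STEPS = (
    "remove host from cluster",
    "run HA uninstall script",
    "restart management services",
    "rejoin cluster",
    "enable HA",
)


def missing_steps(performed):
    """Return the checklist steps that do not appear in `performed`, in checklist order."""
    done = set(performed)
    return [step for step in HA_REINSTALL_STEPS if step not in done]


def out_of_order(performed):
    """Return True if the known steps in `performed` are not in checklist order."""
    positions = [HA_REINSTALL_STEPS.index(s) for s in performed if s in HA_REINSTALL_STEPS]
    return positions != sorted(positions)


if __name__ == "__main__":
    # An attempt that only ran the uninstall script and re-enabled HA.
    attempt = ["run HA uninstall script", "enable HA"]
    print(missing_steps(attempt))
    print(out_of_order(attempt))
```

For example, an attempt that ran the uninstall script and re-enabled HA without removing the host from the cluster reports "remove host from cluster", "restart management services" and "rejoin cluster" as missing. If a post does not say whether a step was done, treat that entry as unknown rather than skipped.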

AllBlack
Expert

No, I did not remove it from the cluster. I just disabled HA and disconnected the ESXi host.

I will try again. This is an ESXi 4.0 U2 host; I had exactly the same issue with 4.1, which is why I downgraded.

Cheers

AllBlack
Expert

Nah, same issue. I think our problem lies somewhere else.

Cheers

s1xth
VMware Employee

Do you have DRS enabled? If so, try turning both DRS and HA off. Then re-enable HA without enabling DRS and see if that works.

fdcpinto
Contributor

I tried the solution given by computerguy7 and it works.

Filipe D. C. Pinto

sssstew
Enthusiast

Yep, same here. It fixed it for us. 🙂

Stew
hchuin
Contributor

Regarding the link posted to resolve this issue, I just want to ask:

1. When running services.sh stop, will this command affect the VMs' functionality, or does it stop only the management services on ESXi?

2. Do I need to put the ESXi host into maintenance mode to apply this fix?

Thank you,

Benjamin Ho

computerguy7
Enthusiast

All monitoring and admin access to the server will go down for a few minutes, but the VMs running on the server and their ability to access the LAN will not be interrupted.

hchuin
Contributor

I went onsite to a customer site that has three ESXi 4.1 hosts, all showing the health check script error.

I redid the HA configuration on the first two ESXi hosts by clicking "Reconfigure for VMware HA", and after two or three tries both hosts had HA configured with no more health check script errors. VMs on these two hosts can be vMotioned between them.

But the third ESXi host simply refused to pass the health check script, even after I ran the reconfiguration more than five times.
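Retrying "Reconfigure for VMware HA" worked on the first two hosts but not on the third. If you script those retries instead of clicking, a bounded loop makes the point where you stop retrying and move on to uninstalling HA explicit. A minimal Python sketch, assuming a hypothetical reconfigure(host) callable that returns True once the host reports HA configured without the health check error; you would have to write that wrapper yourself (for example around the vSphere API's ReconfigureHostForDAS_Task method), and it is not shown here:

```python
import time


def retry_reconfigure(reconfigure, host, max_attempts=5, delay_seconds=0.0):
    """Call the hypothetical `reconfigure(host)` until it returns True.

    Returns the 1-based attempt number that succeeded, or None if all
    `max_attempts` attempts failed (the signal to escalate, e.g. uninstall HA).
    """
    for attempt in range(1, max_attempts + 1):
        if reconfigure(host):
            return attempt
        if attempt < max_attempts:
            # Give the HA agent time to settle before the next attempt.
            time.sleep(delay_seconds)
    return None


if __name__ == "__main__":
    # Fake stand-in that succeeds on the third call, like hosts 1 and 2.
    calls = {"n": 0}

    def fake_reconfigure(host):
        calls["n"] += 1
        return calls["n"] >= 3

    print(retry_reconfigure(fake_reconfigure, "host1"))
```

A None result after the cap, as with host 3 here, is where retrying stops being useful and one of the uninstall options is the next step.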

To move forward, I think I have two options:

1. Disable HA at the cluster level, uninstall HA on all three hosts, restart the management services, and then enable HA at the cluster level so that HA is reinstalled on all three ESXi hosts.

2. Uninstall only the HA component on ESXi host 3 and select the "Reconfigure for VMware HA" option so that HA is reinstalled on host 3 only.

I am leaning toward option 2: since hosts 1 and 2 already have HA up and running, I really don't want to mess with them. But I am also not sure whether option 2 will work.

One more point: the files in /opt/vmware/aam/ha on all three hosts are dated 22 April 2009. Does this have any impact on HA functionality in 4.1?
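To check whether the files under /opt/vmware/aam/ha on host 3 actually differ from those on the two working hosts, you can compare modification dates file by file. A minimal Python sketch; it assumes you run Python against a copy of that directory taken from each host (Python on the ESXi 4.1 console itself is not assumed), and file_dates and differing_files are hypothetical helper names:

```python
import os
from datetime import date


def file_dates(directory):
    """Map each regular file directly inside `directory` to its modification date.

    Subdirectories and symlinks are skipped, not descended into.
    """
    dates = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                dates[entry.name] = date.fromtimestamp(mtime)
    return dates


def differing_files(listing_a, listing_b):
    """Return sorted names that are missing from one listing or dated differently."""
    names = set(listing_a) | set(listing_b)
    return sorted(n for n in names if listing_a.get(n) != listing_b.get(n))
```

Collect file_dates(...) for each host's copy; differing_files(host1, host3) then lists the names that are missing on one side or have different dates. An empty result means the file dates alone do not distinguish host 3 from the working hosts.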

Regards

Benjamin Ho
