I'll start by saying that HA used to work.
What I have is two hosts. Each runs about half its load and the plan is if one were to fail the other takes on the load. Simple.
Problem is, somewhere along my upgrade path something got screwed up, and now HA will not reconfigure. I constantly get the error "HA agent on amer-esx1.fqdn.com in cluster Production in Waltham (datacenter) has an error."
I get this on both hosts. I can ping from each host to the other, and I added entries to /etc/hosts just to make sure, although pinging back and forth was already working before that. I then started to think it might be a mixed 3.0.1/3.5 issue, so I tried again after upgrading one of my hosts, but it still happens. I enabled HA again after everything was upgraded and got the same error, as well as an insufficient resources error.
Are there any HA logs in 3.5.1 that might tell me exactly what is going on? Nothing else has really changed in terms of networking etc., and I sure am running out of ideas.
The agent may not have upgraded or installed properly. Remove the host from your cluster, and then add it back in. Restart hostd on the server with 'service mgmt-vmware restart'. Disconnect and reconnect the host in VirtualCenter.
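Since you mention adding /etc/hosts entries by hand, it may also be worth double-checking them before re-adding the host, because as far as I know the HA agent is picky about name resolution. Below is a minimal sketch, not an official tool: it parses a copy of a hosts file and checks that each ESX host appears under both its FQDN and its short name with the same IP. The second host name (amer-esx2) and all IP addresses are made up for illustration, and the script is meant to be run with Python 3 on a workstation against a copy of the file.

```python
def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Exact, case-sensitive match on purpose; first entry wins.
            table.setdefault(name, ip)
    return table


def check_host(table, fqdn):
    """Return a list of problems found for one host's entries."""
    problems = []
    short = fqdn.split(".", 1)[0]
    for name in (fqdn, short):
        if name not in table:
            problems.append("missing entry for " + name)
    if not problems and table[fqdn] != table[short]:
        problems.append(fqdn + " and " + short + " map to different IPs")
    return problems


# Hypothetical sample data: only amer-esx1.fqdn.com comes from this thread.
SAMPLE = """\
127.0.0.1   localhost.localdomain localhost
10.0.0.11   amer-esx1.fqdn.com amer-esx1
10.0.0.12   amer-esx2.fqdn.com   # short name missing
"""

if __name__ == "__main__":
    table = parse_hosts(SAMPLE)
    for host in ("amer-esx1.fqdn.com", "amer-esx2.fqdn.com"):
        print(host, check_host(table, host) or "ok")
```

To use it for real, replace SAMPLE with the contents of /etc/hosts copied from each host and list your actual host names. An empty problem list only means the file is self-consistent, not that HA will configure.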
-KjB
Is there any danger in doing so? Will removing my host from the cluster have any effect on my virtual machines? Will running service mgmt-vmware restart cause my virtual machines to reboot? Should I treat this as a software upgrade: migrate all my machines off, follow this procedure on one box, move all the machines back to that box, and then repeat on the other?
There was a bug in previous versions of ESX that would cause your virtual machines to restart if you had automatic startup/shutdown enabled in the virtual machine startup/shutdown section of the ESX configuration. If you did not go in and enable that, then you should be fine, but move the VMs off first if you want the extra comfort level. Removing the host from the cluster takes that host's resources out of the cluster's pool, and as long as there is still a host in the cluster and the cluster is intact, it should not have any other effect.
-KjB
Thread moved to the VI: ESX 3.5 forum
Tom Howarth
VMware Communities User Moderator
