I have had enough of re-installing ESXi 5 (so far this was the quickest way to get it working on a number of separate occasions; as a side note, vSphere 4 just worked fine in my environment, without THAT MUCH fluffing about).
So now I have a host which almost works (latest upgrade to build 515841), but will NOT get configured for HA.
It starts fine and gets to the point:
The vSphere HA availability state of this host has changed to Election
and then gives a "nice" Timed out.
This host can be disconnected and reconnected, pinged, has VMs running on it, accesses all datastores, yet will not do HA.
I went through all the KB articles, with no resolution.
Does anybody have any more (useful) ideas?
And on one ugly, dark, rainy afternoon, this server decided to become Master for HA, but that made my 4.1.0 build 381591 server time out.
On cluster HA reconfigure, 4.1 became master, 5.0.0 build 504890 became slave & 5.0.0 build 515841 again timed out.
Obviously VMware has something wrong here in mixed environments (and I am NOT upgrading each host to the latest version, more like reverting back to version 4.1 if anything; I had WAY fewer problems with that one, in fact none for months!)
Having the same issue here. We've got a number of v5 hosts on build 474610 without an issue. Had the problem going from v5 474610 -> 515841 and v4.1 433742 -> v5 515841. The "vCenter requires verified host SSL certificates" box was already checked; tried unchecking it and manually verifying...still a no go. I have a case open with VMware and will post an update when we narrow down the problem.
For it to work, you need to actually SEE the host available in the box below (and also check it)
If it does not show there (so you can not check it) having SSL verify ON does not do anything on its own!
With "vCenter requires verified host SSL certificates" unchecked I add the host, it shows up in the box within the SSL Settings section, and check the "Verified" checkbox. With the "vCenter requires verified host SSL certificates" checked I'm immediately prompted upon adding a host to vCenter on whether I want to trust the host or not. If I click yes the process of adding the host continues. If I click no I'm denied adding the host. The one odd thing, though, is once the host is verified it disappears from the box in the SSL Settings section...I expected all hosts to be listed with their SHA1 thumbprint information and whether verified or not.
I have a small update...patched our test environment hosts running on a completely separate instance of vCenter and they came back with no HA errors. So for some reason our production vCenter doesn't play well with build 515841. And the troubleshooting continues...
But for me, it worked fine for a couple of servers, it choked on the third one, then it corrected itself.
So not as easy as to say it works here, but not there, more like it works sometimes & not other times
With the help of Elisha (eziskind) it looks like we got it nailed down. The HA agent had to be manually uninstalled then automatically re-pushed to the host from vCenter. Really strange that the issue happened only after the 12/15 patches, but at least I know what to do now. What we did:
- place host into Maintenance Mode
- take a copy of /opt/vmware/uninstallers/VMware-fdm-uninstall.sh (we copied to /tmp)
- from the location you made a copy of the file, run the command (./VMware-fdm-uninstall.sh)
- you should see a short pause before it gets back to the prompt (you'll see why I mention this below)
- exit host out of Maintenance Mode and within the "Recent Tasks" area you should see the client being pulled from vCenter and installing
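For anyone who wants the host-side part of those steps in one place, here is a rough sketch. The uninstaller path and the /tmp copy come from the steps above; the `DRY_RUN` switch, the `run` helper and the `remove_fdm_agent` function name are made up for this sketch. Entering and exiting Maintenance Mode is still done from the client as described, not by this script.

```shell
#!/bin/sh
# Sketch only: copy the FDM uninstaller and run it from the copy.
# UNINSTALLER and WORKDIR follow the post; DRY_RUN is a hypothetical
# safety switch (1 = only print the commands, 0 = really run them).
UNINSTALLER="${UNINSTALLER:-/opt/vmware/uninstallers/VMware-fdm-uninstall.sh}"
WORKDIR="${WORKDIR:-/tmp}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_fdm_agent() {
    name="$(basename "$UNINSTALLER")"
    # Take a copy of the uninstaller, as in the steps above.
    run cp "$UNINSTALLER" "$WORKDIR/$name" || return 1
    # Run the copy from the directory it was copied to.
    (cd "$WORKDIR" && run "./$name")
}
```

With the default `DRY_RUN=1`, calling `remove_fdm_agent` only prints the two commands, so you can check the paths before setting `DRY_RUN=0` on a host that is already in Maintenance Mode.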
With one host this went without a hitch. On the second host, though, when running the uninstaller it immediately went back to the prompt, not a short delay like with the first. Took it out of the cluster & vCenter, re-added, still didn't work. Rebooted the host, made another copy of the uninstaller, and it removed successfully.
With your help, we're getting a handle on the problem associated with the upgrade. After an upgrade, under conditions we're still investigating, an error is occurring when issuing a start request of the HA service on the upgraded host. When that fails, HA then tries to re-install HA, and the re-install does nothing because the service is already there (and the right version) but we're left without an HA service running.
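A quick way to tell whether a host is in that state (agent installed but not running) might look like the sketch below. It rests on two assumptions that are not confirmed anywhere in this thread: that the HA agent's init script lives at `/etc/init.d/vmware-fdm`, and that its `status` action exits non-zero when the agent is not running. The `fdm_state` function name is just for illustration.

```shell
#!/bin/sh
# Sketch only. FDM_INIT is an ASSUMED path for the HA agent init script;
# adjust it if your host differs.
FDM_INIT="${FDM_INIT:-/etc/init.d/vmware-fdm}"

# Prints one of: not-installed, running, installed-not-running.
# ASSUMES "status" returns a non-zero exit code when the agent is stopped.
fdm_state() {
    if [ ! -x "$FDM_INIT" ]; then
        echo "not-installed"
    elif "$FDM_INIT" status >/dev/null 2>&1; then
        echo "running"
    else
        echo "installed-not-running"
    fi
}
```

If `fdm_state` reports `installed-not-running`, that would match the situation described above, where the re-install is a no-op because the service is already present.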
The work-around is as tjbailey already indicated in this post.
If you don't get results or have any issues, let us know so we can get you back up and running. But your efforts have helped us to identify an issue that we hope to resolve soon.
Thanks guys, nice to know the workaround (even though I hope I will not need it again, but one never knows!)
I remember that uninstalling the agent from the ESX server in v3-4 was also a solution to many problems, but at least it was a documented procedure then, as opposed to v5.
FYI...upon rebooting the upgraded hosts HA fails again. Doing the temporary fix again works so we're going to hold off on the 12/15 patches until something new is released.
> upon rebooting the upgraded hosts HA fails again.
In what way does it fail? Do you mean that the HA agent didn't come up properly? Could you gzip the /var/run/log directory on the host and send it to me?
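Since gzip on its own will not pack a whole directory, one way to bundle it is with tar, roughly as sketched here. The `/var/run/log` directory is the one asked for above; the output path `/tmp/ha-logs.tgz` and the `bundle_logs` function name are placeholders for this sketch, and it assumes the host's tar supports the `-z` and `-C` options.

```shell
#!/bin/sh
# Sketch only: pack a log directory into a gzip-compressed tarball.
# $1 = directory to bundle (default: /var/run/log, as requested in the post)
# $2 = output file (default: /tmp/ha-logs.tgz, a hypothetical name)
bundle_logs() {
    log_dir="${1:-/var/run/log}"
    out="${2:-/tmp/ha-logs.tgz}"
    # -C switches to the parent directory so the archive holds "log/...".
    tar -czf "$out" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" \
        && echo "$out"
}
```

Calling `bundle_logs` with no arguments writes the archive to the placeholder path and prints it; make sure the target location has enough free space before running it on a host.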
I am having this same issue. I have set up a new cluster and am trying to enable HA for the first time. I have tried the workaround, but my host still times out waiting for cluster election.
I currently only have one datastore and one NIC for the management network so I am getting the following errors:
"This host currently has no management network redundancy"
"The number of vSphere HA heartbeat datastores for this host is 1, which is less than required: 2"
It was my understanding that HA should function in spite of these errors.