VMware Cloud Community
scerazy
Enthusiast

vSphere HA Waiting for cluster election to complete Operation timed out

I have had enough of re-installing ESXi 5 (so far this was the quickest way to get it working on a number of separate occasions; as a side note, vSphere 4 just worked fine in my environment, without THAT MUCH fluffing about)

So now I have a host which almost works (latest upgrade to build 515841), but it will NOT get configured for HA

It starts fine and gets to this point:

The vSphere HA availability state of this host has changed to Election

and then gives a "nice" "Timed out"

This host can be dis/connected and pinged, has VMs running on it, and accesses all datastores, yet it will not do HA

I went through the whole KB article:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&e...

with no resolution

Does anybody have any more (useful) ideas?

Thanks

Seb

1 Solution

Accepted Solutions
admin
Immortal

Do you have SSL thumbprint verification turned on? (in the UI go to Administration->vCenter Server settings->SSL and make sure the checkbox is checked).

Elisha

36 Replies
scerazy
Enthusiast

And on one ugly, dark, rainy afternoon, this server decided to become Master for HA, but that made my 4.1.0 build 381591 server time out

On cluster HA reconfigure, 4.1 became master, 5.0.0 build 504890 became slave & 5.0.0 build 515841 again timed out

Obviously VMware has something wrong here in a mixed environment (and I am NOT upgrading each host to the latest version, more like reverting back to version 4.1 if anything; I had WAY fewer problems with that one, in fact none, for months)

Seb

admin
Immortal

Do you have SSL thumbprint verification turned on? (in the UI go to Administration->vCenter Server settings->SSL and make sure the checkbox is checked).

Elisha

scerazy
Enthusiast

OK, a day later the host in question DID appear in the box (it was not there the day before), so I could select Verified

Now it is part of HA, thanks

Seb

tjbailey
Enthusiast

Having the same issue here. We've got a number of v5 hosts on build 474610 without an issue. Had the problem going from v5 474610 -> 515841 and v4.1 433742 -> v5 515841. The "vCenter requires verified host SSL certificates" box was already checked; tried unchecking it and manually verifying, still a no go. I have a case open with VMware and will post an update when we narrow down the problem.

scerazy
Enthusiast

For it to work, you need to actually SEE the host available in the box below (and also check it)

If it does not show there (so you can not check it) having SSL verify ON does not do anything on its own!

Seb

tjbailey
Enthusiast

With "vCenter requires verified host SSL certificates" unchecked, I add the host, it shows up in the box within the SSL Settings section, and I check the "Verified" checkbox. With "vCenter requires verified host SSL certificates" checked, I'm immediately prompted upon adding a host to vCenter on whether I want to trust the host or not. If I click yes, the process of adding the host continues. If I click no, I'm denied adding the host. The one odd thing, though, is that once the host is verified it disappears from the box in the SSL Settings section. I expected all hosts to be listed with their SHA1 thumbprint information and whether they are verified or not.
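Since the verified hosts drop out of that list, one way to double-check a thumbprint is to read it from the host certificate directly. The function below is only a sketch under a few assumptions: `openssl` is available in the shell you run it from, the host certificate sits at `/etc/vmware/ssl/rui.crt` (adjust if yours differs), and `show_thumbprint` is a made-up helper name, not a VMware tool.

```shell
#!/bin/sh
# Sketch: print the SHA1 fingerprint of a host certificate so it can be
# compared by eye with the thumbprint vCenter prompts you to trust.
# Assumptions: openssl is installed, and the certificate lives at
# /etc/vmware/ssl/rui.crt unless another path is passed in.
# show_thumbprint is a hypothetical helper name.

show_thumbprint() {
    cert="${1:-/etc/vmware/ssl/rui.crt}"

    if [ ! -r "$cert" ]; then
        echo "certificate not readable: $cert" >&2
        return 1
    fi

    # -noout suppresses the certificate body; only the fingerprint is printed.
    openssl x509 -in "$cert" -noout -fingerprint -sha1
}
```

Running `show_thumbprint` with no argument uses the default path; pass a different file as the first argument if the certificate has been copied elsewhere.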

scerazy
Enthusiast

I think that is the bug (not the only one) in VC...

Seb

tjbailey
Enthusiast

I have a small update...patched our test environment hosts running on a completely separate instance of vCenter and they came back with no HA errors.  So for some reason our production vCenter doesn't play well with build 515841.  And the troubleshooting continues...

scerazy
Enthusiast

But for me, it worked fine for a couple of servers, it choked on the third one, then it corrected itself.

So not as easy as to say it works here, but not there, more like it works sometimes & not other times

Seb

tjbailey
Enthusiast

With the help of Elisha (eziskind) it looks like we got it nailed down.  The HA agent had to be manually uninstalled then automatically re-pushed to the host from vCenter.  Really strange that the issue happened only after the 12/15 patches, but at least I know what to do now.  What we did:

- place host into Maintenance Mode

- take a copy of /opt/vmware/uninstallers/VMware-fdm-uninstall.sh (we copied to /tmp)

- from the location you made a copy of the file, run the command (./VMware-fdm-uninstall.sh)

- you should see a short pause before it gets back to the prompt (you'll see why I mention this below)

- exit the host out of Maintenance Mode; within the "Recent Tasks" area you should see the client being pulled from vCenter and installing

With one host this went without a hitch. On the second host, though, when running the uninstaller it immediately went back to the prompt, with no short delay like on the first. Took it out of the cluster & vCenter, re-added it, still didn't work. Rebooted the host, made another copy of the uninstaller, and it removed successfully.
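For anyone who prefers to script it, the copy-and-run part of the steps above can be sketched as a small shell function. This is a rough illustration, not a VMware-documented procedure: only the uninstaller path comes from this thread, while the function name `run_fdm_uninstaller` and the optional arguments are made up for the example. It assumes the host is already in Maintenance Mode.

```shell
#!/bin/sh
# Sketch of the manual HA (FDM) agent removal described above.
# Assumption: the host has already been placed in Maintenance Mode.
# run_fdm_uninstaller is a hypothetical helper name, not a VMware tool.

run_fdm_uninstaller() {
    # Default path is the one quoted in the steps above.
    src="${1:-/opt/vmware/uninstallers/VMware-fdm-uninstall.sh}"
    workdir="${2:-/tmp}"

    if [ ! -f "$src" ]; then
        echo "uninstaller not found: $src" >&2
        return 1
    fi

    # Work from a copy, as in the steps above.
    cp "$src" "$workdir/" || return 1
    name="$(basename "$src")"
    chmod +x "$workdir/$name"

    # Run the copy from its own directory, like ./VMware-fdm-uninstall.sh.
    # The function returns whatever exit status the uninstaller returns.
    ( cd "$workdir" && "./$name" )
}
```

Calling `run_fdm_uninstaller` with no arguments uses the path and `/tmp` location from the steps above. Entering and exiting Maintenance Mode is still done from the vSphere Client, and watching for the short pause is still the best hint that the uninstall actually did something.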

admin
Immortal

With your help, we're getting a handle on the problem associated with the upgrade.  After an upgrade, under conditions we're still investigating, an error is occurring when issuing a start request of the HA service on the upgraded host.  When that fails, HA then tries to re-install HA, and the re-install does nothing because the service is already there (and the right version) but we're left without an HA service running.

The work-around is as tjbailey already indicated in this post.

If you don't get results or have any issues, let us know so we can get you back up and running.  But your efforts have helped us to identify an issue that we hope to resolve soon.

Thanks.

scerazy
Enthusiast

Thanks guys, nice to know the workaround (even though I hope I will not need it again, but one never knows!)

I remember that uninstalling the agent from the ESX server in v3-4 was also a solution to many problems, but at least it was a documented procedure then, as opposed to v5

Seb

tjbailey
Enthusiast

FYI...upon rebooting the upgraded hosts HA fails again.  Doing the temporary fix again works so we're going to hold off on the 12/15 patches until something new is released.

admin
Immortal

> upon rebooting the upgraded hosts HA fails again.

In what way does it fail? Do you mean that the HA agent didn't come up properly? Could you gzip the /var/run/log directory on the host and send it to me?
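One possible way to bundle that directory is a gzipped tarball. The function below is a sketch, assuming the `tar` on the host supports `-z` and `-C`; `collect_logs` and the `/tmp/ha-logs.tgz` output name are invented for the example.

```shell
#!/bin/sh
# Sketch: bundle a log directory into a single gzipped tarball.
# Assumption: the host's tar supports -z (gzip) and -C (change directory).
# collect_logs and the default output name are hypothetical.

collect_logs() {
    logdir="${1:-/var/run/log}"
    out="${2:-/tmp/ha-logs.tgz}"

    if [ ! -d "$logdir" ]; then
        echo "log directory not found: $logdir" >&2
        return 1
    fi

    # -C makes the paths inside the archive relative to the parent
    # directory, so the archive unpacks into a single "log" folder.
    tar -czf "$out" -C "$(dirname "$logdir")" "$(basename "$logdir")" || return 1

    # Print the archive path so the caller knows what to send.
    echo "$out"
}
```

Calling `collect_logs` with no arguments archives /var/run/log and prints the path of the resulting file, which can then be copied off the host.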

kfinken
Contributor

I am having this same issue. I have set up a new cluster and am trying to enable HA for the first time. I have tried the workaround, but my host still times out waiting for cluster election.

admin
Immortal

Do you have SSL thumbprint verification turned on? (in the UI go to Administration->vCenter Server settings->SSL and make sure the checkbox is checked).

kfinken
Contributor

Yep, the box is checked for "vCenter requires verified host SSL certificates" and the rest of that screen is greyed out.

kfinken
Contributor

I currently only have one datastore and one NIC for the management network so I am getting the following errors:

"This host currently has no management network redundancy"

"The number of vSphere HA heartbeat datastores for this host is 1, which is less than required: 2"

It was my understanding that HA should function in spite of these errors.
