VMware Cloud Community
insearchof
Expert

vSphere HA failover operation in progress

Vmware 6.7

ESXi 6.7

DRS

HA

Cluster

How can I get rid of this error?

vSphere HA failover operation in progress in cluster TGCSNET-Vcenter1-Cluster in datacenter Datacenter-TGCSNET: 0 VMs being restarted, 5 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs

Thank you

Tom

insearchof
Expert

I found the answer

This is now resolved

https://kb.vmware.com/s/article/2004802

gparker
Enthusiast

Thanks, this worked for me too in my customer's vSphere 6.5 environment.

MaryPhillip
Contributor

The link has expired. Can anyone help? I have the same issue now.

 

regards,

mary

CompassITCanada
Contributor

I also have the same issue with a new 7.0U2 cluster. Can anyone offer a solution, please?

AlexBernardes
Contributor

To disable and then re-enable vSphere HA in the vSphere Web Client:

  1. Log in to the vSphere Web Client. The default URL is:

    https://vCenter_Server_FQDN:9443/vsphere-client

  2. In the Home screen, click Hosts and Clusters.
  3. Locate the cluster.
  4. Right-click the cluster and click Settings.
  5. Click vSphere HA located under Services.
  6. Click Edit.
  7. Deselect the Turn On vSphere HA option.
  8. Click OK.
  9. To re-enable HA, repeat the above steps, select the Turn On vSphere HA option, and click OK.
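
For anyone who has to do this often, the disable/enable cycle above could also be scripted. The following is only a minimal sketch, not taken from the KB article. It assumes pyVmomi is installed and that you already have a connected session and a cluster object. The helper names `set_ha_enabled` and `cycle_ha` are hypothetical, and the `vim` namespace and a wait function are passed in as parameters so the sketch can be read and exercised without a live vCenter.

```python
def set_ha_enabled(cluster, vim, enabled):
    """Submit a reconfigure task that turns vSphere HA on or off.

    cluster: a vim.ClusterComputeResource from a connected session (assumed).
    vim:     the pyVmomi namespace, i.e. `from pyVmomi import vim`.
    enabled: True to turn HA on, False to turn it off.
    """
    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo()
    spec.dasConfig.enabled = enabled
    # Second argument is "modify": True applies the change incrementally
    # to the existing cluster configuration instead of replacing it.
    return cluster.ReconfigureComputeResource_Task(spec, True)


def cycle_ha(cluster, vim, wait_for_task):
    """Turn HA off, wait for the task, then turn HA back on and wait again."""
    wait_for_task(set_ha_enabled(cluster, vim, False))
    wait_for_task(set_ha_enabled(cluster, vim, True))
```

With a real session this would be called roughly as `cycle_ha(cluster, vim, WaitForTask)`, with `vim` imported from `pyVmomi` and `WaitForTask` from `pyVim.task`. It is worth trying on a test cluster first.
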
NickRW
Contributor

You can also follow these steps when logged in to regular vSphere vCenter. Many thanks, Alex.

James_Holden
Contributor

Also having this problem.

Are you referring to the setting indicated in this screenshot of 7.0.3? It appears the language has changed to "vSphere DRS".

Edit: I was incorrect. See below.

 

Jrlouk
Contributor

From your screenshot, @James_Holden, it appears you might be in the wrong location. Just to the left, if you look down just under the DRS settings, there is a vSphere Availability entry. That is where your HA settings would be.

James_Holden
Contributor

Thank you for your reply! Yes, I was looking in the wrong place. BUT NOW...

See the attached screenshot: when I go to Cluster > Edit > Configure > Services\vSphere Availability, I see vSphere HA is turned on, but when I click the "Edit..." button on the far right, the pop-up dialog box is blank.

See the second attached screenshot.

 

MaryPhillip
Contributor

Yes, this works for me, thanks. Just turn HA off and then back on. Thank you!

CSpreha
Enthusiast

Has anyone found an actual solution to this? Cycling HA is not a solution; it's another cheap VMware band-aid. We run into this hung/failed HA state all the time, and waiting for HA to be removed and then re-enabled across large clusters is a waste of admin time.

Mike_P
Enthusiast

In 6.7 I found that just turning Host Monitoring off and then back on in the Edit Cluster Settings dialog cleared this message and was quick to execute: deselect "Enable Host Monitoring", click OK, then go back into Edit Cluster Settings, reselect "Enable Host Monitoring", and click OK. It didn't need a full HA reconfiguration across the cluster.
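
The same Host Monitoring toggle could probably be done through the API as well. This is a rough sketch only, assuming pyVmomi and an already connected session, and assuming the "Enable Host Monitoring" checkbox corresponds to the `hostMonitoring` field of the cluster's HA configuration (which takes the strings "enabled" and "disabled"). The helper names are hypothetical, and `vim` and the wait function are passed in so the code does not need a live vCenter to be read or exercised.

```python
def set_host_monitoring(cluster, vim, state):
    """Submit a reconfigure task that sets HA host monitoring.

    cluster: a vim.ClusterComputeResource from a connected session (assumed).
    vim:     the pyVmomi namespace, i.e. `from pyVmomi import vim`.
    state:   "enabled" or "disabled".
    """
    if state not in ("enabled", "disabled"):
        raise ValueError("state must be 'enabled' or 'disabled'")
    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo()
    spec.dasConfig.hostMonitoring = state
    # modify=True: change only this setting, keep the rest of the config.
    return cluster.ReconfigureComputeResource_Task(spec, True)


def toggle_host_monitoring(cluster, vim, wait_for_task):
    """Disable host monitoring, wait, then enable it again and wait."""
    wait_for_task(set_host_monitoring(cluster, vim, "disabled"))
    wait_for_task(set_host_monitoring(cluster, vim, "enabled"))
```

Unlike cycling HA itself, this leaves the `enabled` flag of HA untouched, which matches the point above about avoiding a full HA reconfiguration.
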

vasquezu
Contributor

I agree. This seems to be occurring for us at least once every other week. A true fix would be nice, but I can't find the cause.

tonyanshe
Enthusiast

@vasquezu, are your HA events only related to "VMs waiting for a retry"?

We are constantly seeing these events, for example: "0 VMs being restarted, 10 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs"

The only resolution is to disable HA and re-enable it. We are currently on VC 7.0U3j, and VMware wants me to be on the latest version before investigating further.

There is really no HA event and no VM is ever restarted. 

 

vasquezu
Contributor

I am getting the same events as you. I have just been disabling and re-enabling HA for a good 20 or so months, as far as I can remember, lol. It would just be nice to find the root cause and fix it.

I engaged VMware support, but that dragged out for a few months and we finally just had them close the SR in August.

tonyanshe
Enthusiast

Thanks. I just gave up clearing the alarm and let it sit there, and the "VMs waiting for a retry" number just increases and increases.

Sometimes, though not always, we also see the alarm "vSphere HA virtual machine failover failed" on a VM, normally due to the event "Insufficient resources to fail over this virtual machine. vSphere HA will retry the failover..."

Not once have these actually been related to an HA event, nor has a VM rebooted.

I cleared my cluster alarms this morning. I am going to try an SR with VMware again and will let you know if I get anywhere.

 

Gabrie1
Commander

Same here for our VDI cluster. When VMs get redeployed we see these alarms popping up:

2023-05-11T16:27:44.697231+02:00 <ourvcenter> vpxd 6543 - -  Event [87492702] [1-1] [2023-05-11T16:27:44.693618+02:00] [vim.event.EventEx] [info] [] [DC-xxxxx] [87492702] [vSphere HA failover operation in progress in cluster xxxx in datacenter xxxxx: 0 VMs being restarted, 1 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs]
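
To track how these counters change over time, the event text is regular enough to parse out of the vpxd log. Below is a small sketch in Python, assuming the message keeps the wording shown in this thread. The function names and the "stale alarm" heuristic (only the retry counter is non-zero) are illustrative assumptions, not anything VMware documents.

```python
import re

# Matches the event text quoted in this thread; the wording may differ
# between vCenter versions, so treat the pattern as an assumption.
HA_EVENT_RE = re.compile(
    r"vSphere HA failover operation in progress in cluster (?P<cluster>.+?) "
    r"in datacenter (?P<datacenter>.+?): "
    r"(?P<restarting>\d+) VMs being restarted, "
    r"(?P<retry>\d+) VMs waiting for a retry, "
    r"(?P<resources>\d+) VMs waiting for resources, "
    r"(?P<vsan>\d+) inaccessible vSAN VMs"
)

COUNTERS = ("restarting", "retry", "resources", "vsan")


def parse_ha_event(line):
    """Return a dict with cluster, datacenter and integer counters, or None."""
    match = HA_EVENT_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    for key in COUNTERS:
        event[key] = int(event[key])
    return event


def looks_stale(event):
    """Heuristic: VMs are 'waiting for a retry' but nothing else is pending."""
    others = (event["restarting"], event["resources"], event["vsan"])
    return event["retry"] > 0 and all(count == 0 for count in others)
```

Feeding each log line through `parse_ha_event` and logging the `retry` value would show whether the number only ever grows, as described elsewhere in this thread.
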
http://www.GabesVirtualWorld.com
tonyanshe
Enthusiast

After a lot of pressing I finally got an update on this issue.

"This is happening due to a rare condition in the VCenter HA service. This is a known issue by engineering, and currently, there is no resolution. The engineering case is 2802103 for your reference. You can clear the alarm by disabling and enabling VCenter HA as a workaround."

The engineering case has been open since 17/06/2021. The issue was first noticed in 6.7 P04 and is still happening in 8.0.

A VM deletion seems to trigger the event.

Unfortunately, we are still stuck with disabling and re-enabling vSphere HA to resolve it.

 

Rasulzade
Contributor

Hello. Currently, all of my virtual machines are powered on and running in the cluster. HA (High Availability) is enabled. What will happen if I disable and then enable HA?
