How can I get rid of this error?
vSphere HA failover operation in progress in cluster TGCSNET-Vcenter1-Cluster in datacenter Datacenter-TGCSNET: 0 VMs being restarted, 5 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs
To disable/enable the vSphere HA in the vSphere Web Client:
From your screenshot, @James_Holden, it appears you might be in the wrong location. Just to the left, under the DRS settings, there is a vSphere Availability entry. That is where your HA settings would be.
Thank you for your reply! Yes, I was looking in the wrong place. BUT NOW...
See the attached screenshot: when I go to Cluster > Edit > Configure > Services\vSphere availability, I see vSphere HA is turned ON,
but when I click the "Edit..." button on the far right, the pop-up dialog box is blank.
See 2nd attached screen shot.
Has anyone found an actual solution to this? Cycling HA is not a solution, it's another cheap VMware band-aid. We hit this hung/failed HA state all the time, and waiting for HA to be removed and then re-enabled across large clusters is a waste of admin time.
In 6.7 I found that just turning off Host Monitoring then back on in the Edit Cluster Settings dialog (de-select "Enable Host Monitoring", click OK, then go back in to Edit Cluster Settings and re-select "Enable Host Monitoring" and click OK) cleared this message and was quick to execute - didn't need a full HA reconfiguration across cluster.
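For anyone who would rather script that toggle than click through the dialog, here is a rough sketch using pyVmomi. It is untested against a real vCenter and rests on several assumptions: that pyVmomi is installed, that `cluster` is a `vim.ClusterComputeResource` you have already looked up from a connected session, and that setting `dasConfig.hostMonitoring` through the API has the same effect as the "Enable Host Monitoring" checkbox. The function names (`build_spec`, `wait_for_task`, `cycle_host_monitoring`) are made up for this example.

```python
import time


def build_spec(state):
    """Build a cluster reconfigure spec that sets Host Monitoring to `state`.

    `state` is "enabled" or "disabled". pyVmomi is imported lazily so the
    rest of the module can be loaded without it.
    """
    from pyVmomi import vim  # third-party dependency (assumption: installed)

    das = vim.cluster.DasConfigInfo(hostMonitoring=state)
    return vim.cluster.ConfigSpecEx(dasConfig=das)


def wait_for_task(task, poll_seconds=1.0):
    """Poll a vCenter task until it leaves the queued/running states."""
    while task.info.state in ("queued", "running"):
        time.sleep(poll_seconds)
    if task.info.state != "success":
        raise RuntimeError("cluster reconfigure task did not succeed")


def cycle_host_monitoring(cluster, spec_builder=build_spec, wait=wait_for_task):
    """Turn Host Monitoring off, then back on, waiting for each change."""
    for state in ("disabled", "enabled"):
        # modify=True applies only the fields set in the spec and leaves
        # the rest of the cluster configuration untouched.
        task = cluster.ReconfigureComputeResource_Task(spec_builder(state), True)
        wait(task)
```

Calling `cycle_host_monitoring(cluster)` issues two reconfigure tasks in a row. Whether this clears the message on versions other than 6.7 is not something this thread confirms.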
@vasquezu are your HA events only related to "VMs waiting for a retry"?
We are constantly seeing these events, for example: "0 VMs being restarted, 10 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs"
The only resolution is to disable and re-enable HA. We are currently on VC 7.0U3j, and VMware want me to be on the latest version before investigating further.
There is really no HA event and no VM is ever restarted.
I am getting the same events as you. I have just been disabling and re-enabling HA for a good 20 or so months, as far as I can remember lol. It would just be nice to find the root cause and fix it.
I have engaged VMware support and that dragged out for a few months and we finally just had them close the SR in Aug.
Thanks, I just gave up clearing the alarm and let it sit there, and the "VMs waiting for a retry" number just keeps increasing.
Sometimes, though not always, we also see an alarm on a VM: "vSphere HA virtual machine failover failed". It is normally due to the event "Insufficient resources to fail over this virtual machine. vSphere HA will retry the failover..."
Not once has any of these been related to an actual HA event, and no VM has rebooted.
I have cleared my cluster alarms this morning. Going to try a SR with VMware again. Will let you know if I get anywhere.
Same here for our VDI cluster. When VMs get redeployed we see these alarms popping up:
2023-05-11T16:27:44.697231+02:00 <ourvcenter> vpxd 6543 - - Event  [1-1] [2023-05-11T16:27:44.693618+02:00] [vim.event.EventEx] [info]  [DC-xxxxx]  [vSphere HA failover operation in progress in cluster xxxx in datacenter xxxxx: 0 VMs being restarted, 1 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs]
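If you want to track how often this fires, or watch the "waiting for a retry" count climb, the event text is regular enough to parse out of the vpxd log. Below is a small sketch that assumes the message always has the wording shown in this thread; the regex and the `parse_ha_event` name are my own, not anything VMware ships.

```python
import re

# Matches the event text quoted in this thread. The wording is taken from
# the messages above and may differ between vCenter versions.
HA_EVENT = re.compile(
    r"vSphere HA failover operation in progress in cluster (?P<cluster>.+?) "
    r"in datacenter (?P<datacenter>.+?): "
    r"(?P<restarting>\d+) VMs being restarted, "
    r"(?P<retry>\d+) VMs waiting for a retry, "
    r"(?P<resources>\d+) VMs waiting for resources, "
    r"(?P<vsan>\d+) inaccessible vSAN VMs"
)


def parse_ha_event(line):
    """Return the cluster, datacenter and four VM counts, or None if no match."""
    match = HA_EVENT.search(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert only the count fields to integers; names stay as strings.
    for key in ("restarting", "retry", "resources", "vsan"):
        event[key] = int(event[key])
    return event
```

Feeding it the log line above should give a `retry` count of 1 with the other three counts at 0, which is the pattern everyone here describes: retries pending, nothing actually being restarted.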
After a lot of pressing I finally got an update on this issue.
"This is happening due to a rare condition in the VCenter HA service. This is a known issue by engineering, and currently, there is no resolution. The engineering case is 2802103 for your reference. You can clear the alarm by disabling and enabling VCenter HA as a workaround."
The engineering case has been open since 17/06/2021. The issue was first noticed in 6.7 P04 and is still happening in 8.0.
A VM deletion seems to trigger the event.
Unfortunately, we are still stuck with disabling and re-enabling vSphere HA to clear it.