dgingeri
Enthusiast

We keep getting "This virtual machine failed to become vSphere HA protected" on one cluster

I work for a cloud service, and we host many customer private clusters across the country on one 6.5 U2g VCSA. One customer cluster, with 6 hosts and 60 VMs, keeps having VMs come up with the error "This virtual machine failed to become vSphere HA protected...". It's easy to correct, but it has to be done manually by turning HA off and then back on. It's happened 4 times in the last 3 days. It seems to break every time a VM migrates to a new host, and no, it is not always the same source or destination host. At this point I can't tell whether HA is really working, and I don't want a host failure to be how we discover that it isn't, leaving a high-paying customer with 8-10 VMs down until they're manually restarted. The hosts are all physically identical, with the same firmware levels and the same ESXi version, 6.5 U3, with Enterprise+ licensing.

Is there a more permanent fix for this issue? I haven't been able to find anything in VMware's knowledge base other than the fix I'm already doing, which seems to last maybe until the next VM migration, or maybe not at all.

9 Replies
scott28tt
VMware Employee

Moderator: Moved to Availability: HA & FT Discussions

nachogonzalez
Expert

Hey, hope you are doing fine

Can you disable HA and re-enable it? This sometimes solves the issue.
What does fdm.log have to say about this?
Which version of ESXi do you have?

Does the error match this? https://kb.vmware.com/s/article/2020082


dgingeri
Enthusiast

Yes, I have done that, 4 times in the last 3 days. It stays "fixed" only until the next VM migration.

nachogonzalez
Expert

Can you share fdm.log so we can check whether there is an issue there?
Also, can you tell us a little more about the HA configuration? How is admission control configured? Are you using any reservations? Which datastores do you use for heartbeating? Are they selected automatically?

Lalegre
Virtuoso

Hey @dgingeri,

The first thing that looks wrong to me is that your ESXi hosts are on a higher version than vCenter. Always ensure that your vCenter Server is at the same version as your ESXi hosts or higher.
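To make the comparison concrete, here is a minimal Python sketch that parses version strings like the ones in this thread and checks whether vCenter is at the same level as the hosts or higher. The function names (`parse_version`, `vcenter_covers_hosts`) and the accepted string format are assumptions made for illustration only. The version string alone is a rough guide, so confirm actual compatibility against build numbers and VMware's interoperability matrix.

```python
import re

# Assumed format: "<major>.<minor>" optionally followed by "U<update><letter>",
# e.g. "6.5", "6.5 U3", "6.5U3", "6.5 U2g".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\s*U(\d+)([a-z]?))?")


def parse_version(text):
    """Turn a string such as '6.5 U2g' into a comparable tuple (6, 5, 2, 'g')."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, update, letter = match.groups()
    # A missing update level is treated as update 0 with no patch letter.
    return (int(major), int(minor), int(update or 0), letter or "")


def vcenter_covers_hosts(vcsa_version, esxi_version):
    """Return True if the vCenter version is the same as or higher than ESXi."""
    return parse_version(vcsa_version) >= parse_version(esxi_version)


# The versions reported in this thread: VCSA 6.5 U2g, ESXi 6.5 U3.
print(vcenter_covers_hosts("6.5 U2g", "6.5 U3"))  # False
```

Tuples compare element by element, so U2g sorts below U3 because the update number is compared before the patch letter.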

As @nachogonzalez said, the fdm.log will have the details of your issue. When you check it, take a look at this KB as well, because it applies to your version: https://kb.vmware.com/s/article/66928

RajeevVCP4
Expert

Can you provide the fdm.log file (/var/log/fdm.log), along with the time stamp of the error and the VM name?
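If the full log is too large to post, a small script can cut it down to the relevant entries. The sketch below is a rough starting point and not a VMware tool: it assumes each fdm.log entry starts with a UTC timestamp in the form `2020-05-12T10:15:32.123Z`, and that the VM can be found by a plain substring such as its name or .vmx path. The function names and the sample lines are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(line):
    """Return the leading timestamp of a log line as a UTC datetime, or None."""
    token = line.split(" ", 1)[0]
    try:
        parsed = datetime.strptime(token, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None  # continuation line or an unexpected format
    return parsed.replace(tzinfo=timezone.utc)


def filter_log(lines, vm_name, start, end):
    """Yield timestamped lines that mention vm_name within [start, end]."""
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if start <= stamp <= end and vm_name in line:
            yield line.rstrip("\n")


# Hypothetical sample lines; real fdm.log content will differ.
sample = [
    "2020-05-12T10:15:32.123Z info fdm[1000] example entry for web01.vmx\n",
    "2020-05-12T11:00:00.000Z info fdm[1000] example entry for db01.vmx\n",
    "2020-05-13T10:15:32.123Z info fdm[1000] example entry for web01.vmx\n",
]
window_start = datetime(2020, 5, 12, 10, 0, tzinfo=timezone.utc)
window_end = datetime(2020, 5, 12, 12, 0, tzinfo=timezone.utc)
for entry in filter_log(sample, "web01", window_start, window_end):
    print(entry)
```

To run it against a real file, pass an open file object for a copy of the log as `lines`, for example `open("fdm.log")`.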

 

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark as helpful or correct if my answer is useful to you
daphnissov
Immortal

If you work for a cloud provider, then you're probably in the VSPP program. Why don't you open a ticket with GSS instead since this impacts one of your customers?

andvm
Enthusiast

6.5 U2g VCSA

6.5 U3 ESXi

Isn't there a mismatch here? The VCSA version should be the same as or higher than the ESXi version.

Lalegre
Virtuoso

6.5 U2g VCSA

6.5 U3 ESXi

The VCSA version should be the same or higher. You have 6.5 U2g for VCSA and 6.5 U3 for ESXi, so your VCSA is at a lower version than your hosts.

Please also get the fdm.log, as mentioned previously.
