VMware Cloud Community
aj800
Enthusiast

Insufficient vSphere HA failover resources vSphere 7

We're running vCenter 7.0U3j with 5 ESXi hosts in a single cluster, using 4 shared datastores for HA heartbeats. While I was patching to this version using the VAMI, I was monitoring the console on the host the VCSA runs on, and the VCSA suddenly vanished: it was no longer listed under Virtual Machines on that host. Its IP was still answering pings, so I went through each host to find it and, lo and behold, it was on another host. I was able to log back in and watch it complete the update. Once the vSphere Client services came back up, I logged in and found these messages about HA resources at the cluster level:

vSphere HA initiated a failover action on 25 virtual machines in cluster [our-cluster-name] in datacenter [our-datacenter-name]
 

Insufficient resources to satisfy vSphere HA failover level on cluster [our-cluster-name] in [our-datacenter-name]

Something happened, but nothing noticeable to users, and none of the hosts actually went down; there's still a running VM on the original host the VCSA started on. I cleared whatever alerts I could, but these two are shown in blue with no option to reset them to green or acknowledge them, so it looks like a configuration issue or an anomaly.

Anyone have a clue what might've happened here? Which logs can I check to see what happened?

aj800
Enthusiast

Update: After checking the log on the master (/var/log/fdm.log), I see a number of messages that say "4 hosts excluded due to component failures". One VM on each of the 4 other hosts (the ones that are NOT the master) also shows this issue:

failed placement with fault [N3Vim5Fault13RuleViolationE:0x0000001043b59cf0]

What does this mean?
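
For what it's worth, the part before the colon looks like a C++ type name in the Itanium ABI mangled form (N, then length-prefixed identifiers, then E), and the part after the colon looks like a memory address. Below is a minimal sketch that decodes just that simple nested-name form. The function name demangle_nested_name is made up for illustration, and reading the result as a rule-violation fault (for example a VM/host or affinity rule blocking placement) is my assumption, not something the log confirms.

```python
def demangle_nested_name(mangled):
    """Decode a simple Itanium-style nested name: N <len><ident>... E.

    Only handles plain length-prefixed identifiers (no templates,
    substitutions, or qualifiers); anything else raises ValueError.
    """
    if len(mangled) < 3 or not (mangled.startswith("N") and mangled.endswith("E")):
        raise ValueError("not a simple nested name: %r" % mangled)
    body = mangled[1:-1]
    parts = []
    i = 0
    while i < len(body):
        # Read the decimal length prefix.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix at offset %d" % i)
        length = int(body[i:j])
        ident = body[j:j + length]
        if len(ident) != length:
            raise ValueError("identifier is shorter than its length prefix")
        parts.append(ident)
        i = j + length
    return "::".join(parts)


# The fault string from fdm.log, with the ":0x..." suffix split off first.
fault = "N3Vim5Fault13RuleViolationE:0x0000001043b59cf0".split(":", 1)[0]
print(demangle_nested_name(fault))
```

If that reading is right, the fault name is Vim::Fault::RuleViolation, so the next thing I'd look at is whether any cluster rules apply to those four VMs.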

Update 2: This seems to be an issue that follows updating the vCenter Appliance to U3j, per this KB article I've used before:

https://kb.vmware.com/s/article/84137

I've seen a similar issue in the past where I had to reinstall the HA agent on the hosts after an update to the VCSA. This time, the version of the HA agents (vmware-fdm) is 7.0.3-20990078, which seems to match the vCenter version (U3j). I tried reinstalling the agent on a host in maintenance mode and then restarting HA on the cluster once I brought the host back out of maintenance mode, but the HA agents still show as unreachable (insufficient HA failover resources). Only one host (the master) shows up in HA monitoring; the other 4 are still unreachable.
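
To check whether the two fdm.log messages quoted above are still being logged after the agent reinstall, something like the following could tally them from a copy of the log. This is only a sketch: the two regular expressions are built from the message text quoted in this thread, nothing else about the log line layout is assumed, and the names summarize_fdm and summarize_fdm_file are made up for illustration.

```python
import re
from collections import Counter

# Built only from the message text quoted in this thread; the rest of each
# fdm.log line (timestamps, thread ids, and so on) is deliberately ignored.
EXCLUDED_RE = re.compile(r"(\d+) hosts excluded due to component failures")
FAULT_RE = re.compile(r"failed placement with fault \[([^\]:]+)")


def summarize_fdm(lines):
    """Return (excluded, faults) counters for an iterable of log lines.

    excluded maps the reported host count to how often it was logged;
    faults maps the fault name (text before the colon) to its count.
    """
    excluded = Counter()
    faults = Counter()
    for line in lines:
        match = EXCLUDED_RE.search(line)
        if match:
            excluded[int(match.group(1))] += 1
        match = FAULT_RE.search(line)
        if match:
            faults[match.group(1)] += 1
    return excluded, faults


def summarize_fdm_file(path):
    """Same as summarize_fdm, reading from a copy of fdm.log at `path`."""
    with open(path, errors="replace") as handle:
        return summarize_fdm(handle)
```

Running summarize_fdm_file on a copy taken before and after restarting HA would show whether the counts keep growing or the messages have stopped.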
