On one of our host clusters (four hosts), we are getting the following Configuration Issues message: "vSphere HA agent for this host has an error: the vSphere HA agent is not reachable from the vCenter server."
What we have tried so far.
I haven’t tried rebooting a host yet because I cannot migrate the machines off it. That is possibly something I’ll look at after hours.
In vCenter I cannot see any of the host configuration for the affected cluster, but if I connect to a host directly (with the vSphere client or the web client) I can configure it.
Here are the settings on one of the affected hosts. (attachment 1)
Here are the settings we expect (from a working host) (attachment 2)
Now the problem I have is the host doesn’t seem to know about the distributed switch/port group for me to add these back in. It only knows about our main VLAN that it’s currently using. (attachment 3)
Thanks for reading. Any ideas would be much appreciated.
Since the host is marked as not responding, let's first focus on that.
Check the vmkernel.log file on the affected host and search for a "hostd detected to be non-responsive" entry. If it is present, the hostd process is hung and a host restart is worth trying.
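If you can get a copy of the log, the search can be scripted. This is only a minimal sketch: the function name is made up for illustration, the match is a plain case-insensitive substring test, and the log path is passed on the command line because I am assuming you will point it at wherever your copy of vmkernel.log lives (on the host itself that is normally /var/log/vmkernel.log).

```python
import sys
from typing import Iterable, List

# Text to look for, taken from the advice above.
MARKER = "hostd detected to be non-responsive"


def find_hostd_hang_entries(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return every log line containing the marker (case-insensitive)."""
    needle = marker.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_hang.py /path/to/vmkernel.log
    with open(sys.argv[1], errors="replace") as log_file:
        hits = find_hostd_hang_entries(log_file)
    if hits:
        print("hostd hang entries found:")
        for hit in hits:
            print("  " + hit)
    else:
        print("No matching entries found.")
```

No hits does not prove hostd is healthy; it only means this particular message was not logged in the file you checked.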
Once a host has been marked as Not Responding, a strong indicator that hostd is hung is that an esxcli command hangs while the equivalent localcli command still works.
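That comparison can be done by hand, but here is a rough sketch of automating it with a timeout so the test itself cannot hang your session. Assumptions on my part: the helper name and the 15 second timeout are arbitrary, "system version get" is used only as a harmless read-only command, and the script is run in a shell on the host where both esxcli and localcli are on the PATH.

```python
import shutil
import subprocess
from typing import Sequence


def command_hangs(cmd: Sequence[str], timeout: float = 15.0) -> bool:
    """Return True if cmd does not finish within timeout seconds."""
    try:
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return True


if __name__ == "__main__" and shutil.which("esxcli") and shutil.which("localcli"):
    esxcli_hung = command_hangs(["esxcli", "system", "version", "get"])
    localcli_hung = command_hangs(["localcli", "system", "version", "get"])
    if esxcli_hung and not localcli_hung:
        print("esxcli hangs but localcli responds: hostd is probably hung.")
    else:
        print("esxcli hung: %s, localcli hung: %s" % (esxcli_hung, localcli_hung))
```

A slow but working hostd could also trip the timeout, so treat a positive result as a hint rather than proof.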
If the hosts are showing as Not Responding in vCenter Server, please check that you can connect from vCenter to the ESXi hosts, and from the hosts back to vCenter, on port 902.
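A quick way to test the port from either side is a small script like the one below. It is a sketch with a made-up function name, and it only tests TCP. The host-to-vCenter heartbeat on port 902 uses UDP, which a simple connect test like this cannot verify, so a success here does not rule out a firewall dropping the heartbeats.

```python
import socket
import sys


def tcp_port_open(host: str, port: int = 902, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Usage (hypothetical script name): python check902.py <host> [<host> ...]
    for target in sys.argv[1:]:
        state = "open" if tcp_port_open(target) else "NOT reachable"
        print("%s: TCP 902 %s" % (target, state))
```

Run it from the vCenter machine against each host's management address, then from somewhere on the hosts' management network against the vCenter address.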
Also try restarting the management agents and then the management network. This does not affect running virtual machines, so you can do it during office hours.
Unfortunately, I cannot retrieve the vmkernel.log files from any of the hosts in the affected cluster. I also tried restarting the management agents and the management network. I will look to schedule a reboot of the hosts.