We have a strange issue with vCenter HA. We have set up vCenter HA using 3 ESXi hosts. Each ESXi host runs one of the vCenter VMs, i.e. the Active node on one host, the Passive node on another, and the Witness on the third. Every Monday morning between 7:00 and 7:25 AM the nodes randomly leave the cluster and then rejoin. vCenter is unavailable while this is happening but becomes accessible again as soon as the nodes rejoin. We don't have to reboot or restart any of the vCenter VMs, and vCenter recovers without any manual intervention. We are on ESXi 6.7 and vCenter 6.7.48000.
Any idea where I should look? I previously opened a case with VMware, but their answer was unsatisfactory: they suggested keeping the Active and Passive VMs on the same ESXi host to avoid this. In my opinion that defeats the purpose of redundancy and is not a good solution.
I suspect that some process or task scheduled to run at 7 AM every Monday is causing the vCenter VMs or the ESXi hosts to misbehave. I would really appreciate it if someone could point me in the right direction to investigate.
Thanks & Regards,
1) When you say "nodes randomly leave the cluster and join back", do they go into an Isolated state?
2) Is there any backup job running during that time window?
3) What is the service status on each node when the issue happens?
4) Do you see any failover, or any service start or stop, on the Active or the Passive node?
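For question 2, one way to look for a scheduled job on a Linux-based machine (the vCenter appliance, or a backup server) is to scan its cron files for entries that fire on Monday between 7:00 and 7:25. This is only a rough sketch: the paths `/etc/crontab` and `/etc/cron.d/*` are assumptions about where jobs are defined, the helper names are made up for illustration, and it ignores the day-of-month and month fields, so review any hits manually. It also will not see jobs scheduled inside a backup product or on the ESXi hosts themselves.

```python
import glob

def field_matches(field, value):
    """Return True if one numeric cron field ('*', '*/5', '1-5', '0,30') matches value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            low, high = 0, 59  # wide enough for minute, hour and weekday
        elif "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False

def fires_monday_7am(line):
    """True if a cron line can fire on Monday (weekday 1) between 07:00 and 07:25."""
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 6:
        return False
    minute, hour, _dom, _month, dow = fields[:5]
    try:
        return (field_matches(hour, 7)
                and field_matches(dow, 1)
                and any(field_matches(minute, m) for m in range(0, 26)))
    except ValueError:
        # Non-numeric fields (e.g. '@weekly', 'mon') are skipped by this sketch.
        return False

if __name__ == "__main__":
    # Assumed locations; adjust for the machine being checked.
    for path in ["/etc/crontab"] + sorted(glob.glob("/etc/cron.d/*")):
        try:
            with open(path, errors="replace") as handle:
                for line in handle:
                    if fires_monday_7am(line):
                        print(f"{path}: {line.strip()}")
        except OSError:
            pass  # missing or unreadable file
```

Note that cron uses the machine's local time zone, which may differ from the time zone in which you observe the 7:00 AM window.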
Check the vcha logs for any failure events under /var/log/vmware/vcha on the Active and the Passive nodes.
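To narrow the logs down to the problem window, something like the following could help. It is a minimal sketch that assumes each log line starts with an ISO-style timestamp such as `2020-03-09T07:05:12.123Z` and that the files of interest match `/var/log/vmware/vcha/*.log`; both are assumptions, so adjust the pattern and the glob to what you actually see on the node. The function name `lines_in_window` is just for illustration.

```python
import glob
import re
from datetime import datetime, time

# Assumed line prefix: YYYY-MM-DDTHH:MM:SS (fractional seconds and zone are ignored).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})")

def lines_in_window(lines, weekday=0, start=time(7, 0), end=time(7, 25)):
    """Return lines whose timestamp falls on `weekday` (0 = Monday) between start and end."""
    hits = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # continuation line or unknown format
        stamp = datetime.strptime(" ".join(match.groups()), "%Y-%m-%d %H:%M:%S")
        if stamp.weekday() == weekday and start <= stamp.time() <= end:
            hits.append(line.rstrip("\n"))
    return hits

if __name__ == "__main__":
    # Assumed file pattern under the directory mentioned above.
    for path in sorted(glob.glob("/var/log/vmware/vcha/*.log")):
        try:
            with open(path, errors="replace") as handle:
                for hit in lines_in_window(handle):
                    print(f"{path}: {hit}")
        except OSError:
            pass  # unreadable file
```

If the log timestamps are in UTC while the 7:00 AM window is in local time, shift `start` and `end` accordingly. Running it on the Active and Passive nodes and comparing the output side by side should show which node lost contact first.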