Please sync NTP on all the ESXi hosts where the VCSA nodes reside, as well as on the VCSA itself.
If there is a time sync problem, you will see this behaviour.
Check /var/log/vmware/messages.log for time sync errors to confirm.
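If it helps, the log check can be scripted. Below is a minimal Python sketch that filters lines mentioning NTP or time sync. The keyword pattern and the sample lines are my own assumptions for illustration, not real VCSA output, so adjust them to what your log actually contains.

```python
import re
from typing import Iterable, List

# Hypothetical keyword pattern; tune it to the wording used in your log.
TIMESYNC_PATTERN = re.compile(r"ntp|time ?sync|clock", re.IGNORECASE)


def find_timesync_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention NTP or time synchronisation."""
    return [line.rstrip("\n") for line in lines if TIMESYNC_PATTERN.search(line)]


# Made-up sample lines, only to show the filter working.
sample = [
    "service started\n",
    "NTP offset too large\n",
    "timesync: adjusting clock\n",
]
matches = find_timesync_lines(sample)

# On the appliance you would pass the open file instead, for example:
#   with open("/var/log/vmware/messages.log") as fh:
#       for line in find_timesync_lines(fh):
#           print(line)
```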
Many thanks for your reply
I did initially have NTP issues, but these have since been resolved and the problem still persists. I will take a look at messages.log as you suggested.
I've been monitoring this quite closely and noticed the occasional alert for the "PSC Health Service" pop up. It only appeared a couple of times within a 2-3 hour period and disappeared within a few minutes each time. When I check the service in the VAMI, it is always showing as healthy. I'm not sure if this is what was causing the VCSA to initiate a failover, but these alerts occurred only 2-3 times within three hours, whereas the VCSA was failing over about every 15 minutes (when I had VCHA enabled).
Which log file should I be looking at to get more details on the PSC health alert (changing from green to red and then back to green) and what is causing it?
Is there a config file that details which service failures will initiate a VCHA failover? Perhaps I could edit it to remove services one at a time until I find the service failure responsible for the failover.
PSC health service alarms can also appear due to NTP sync issues. You can check /var/core, which might contain core dumps.
Check the ESXi hosts as well as the VCSA.
Edit VCSA -> options -> sync with hosts (enable/disable): if the VCSA is not joined to a domain, please validate this setting, because if the VCSA migrates to another host where time is not synced properly, VCHA will fail over.
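For the /var/core check, a small Python sketch like the one below lists whatever files are in a directory, newest first. The helper name `list_core_dumps` is hypothetical, and it makes no assumption about how the dump files are named.

```python
from pathlib import Path
from typing import List


def list_core_dumps(directory: str) -> List[str]:
    """Return the file names in `directory`, newest first.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    files = [p for p in root.iterdir() if p.is_file()]
    # Sort by modification time so the most recent dump comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files]


# Example call (run on the appliance itself):
#   print(list_core_dumps("/var/core"))
```

Comparing the timestamps of any dumps against the times of the failovers should show whether a crashing service lines up with them.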
Thanks for posting in VMware communities.
Can you check whether any backup software is configured to back up the VCHA nodes?
If so, you may need to stop it: taking snapshots of VCHA nodes is not supported and will cause sync issues between the nodes.
ok - I will check that too.
How far out does the time need to be to cause an issue?
Everything currently syncs to the same NTP source, so there is no drift. The VCSA is set to use an NTP server rather than sync from the host, and all the hosts are set to the same NTP source as the VCSA.
There may be a small drift (less than 1 second) - could that small amount still cause this?
No. I think 2-3 minutes should be fine; anything beyond that is an issue. Please check NTP on the ESXi hosts as well.
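To make that rule of thumb concrete, here is a rough Python sketch that flags any node whose clock offset exceeds a threshold. The 120-second default is just the low end of the "2-3 minutes" estimate above, not a documented VMware limit, and the node names and offsets are made-up values.

```python
from typing import Dict, List

# Assumption: low end of the "2-3 minutes" estimate, not an official limit.
DEFAULT_THRESHOLD_SECONDS = 120.0


def nodes_exceeding_drift(
    offsets: Dict[str, float],
    threshold: float = DEFAULT_THRESHOLD_SECONDS,
) -> List[str]:
    """Return the names of nodes whose absolute offset (seconds) exceeds the threshold."""
    return sorted(name for name, offset in offsets.items() if abs(offset) > threshold)


# Hypothetical offsets in seconds relative to the NTP source.
offsets = {"esxi01": 0.4, "esxi02": 200.0, "vcsa": -150.0}
flagged = nodes_exceeding_drift(offsets)
print(flagged)
```

A sub-second drift like the one described in the question would not be flagged by this check.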
There is currently no backup software in place yet - but thanks for the heads up.
ok - I'll double check this when I can and post an update.
Moderator: Moved to vCenter Server Appliance
I checked all the NTP settings on the VCSA and the hosts, and they all point to the same source NTP server. There is no difference in time between any of the hosts and the VCSA.
I've raised a ticket with VMware, who are checking through the logs, so hopefully I will get a little more clarity soon as to what is causing the VCSA to decide a failover is necessary.
I could be wrong, and this may not be related, but there has been an issue with Dynamic DNS registrations since U3 (fixed in the current U3b build), where not only the management IP was registered but also the HA addresses, which caused issues. See https://kb.vmware.com/s/article/76406