Our 3 node vRealize Automation 8.1 cluster has completely fallen over when we were away during covid lockdowns ;o(
On coming back in I was strangely asked on all three nodes to reset the root password which it said had expired. This I don't think was likely possible as I disabled root password expiration. Nevertheless I changed the passwords and logged in as root. The NSX-V Load Balancer showed all nodes as down and I was unable to hit the login page.
On logging into the systems I was unable to perform any tasks. For example...
vracli status deploy
[ERROR] (401)
vracli status
[ERROR] (401).
/opt/script/deploy.sh
Running check eth0-ip
Running check non-default-hostname
Running check non-default-hostname
Running check node-name
Running check single-aptr
Running check single-aptr
Running check nodes-ready
Running check nodes-ready
Running check nodes-count
make: *** [/opt/health/Makeful:48: nodes-count] Error 1
make: Target 'deploy not remade because of errors.
The above loops forever.
Restarted all the nodes and vIDM multiple times. Verified network and system seem ok. I
systemctl shows
systemd-networkd-wait-online.service loaded failed failed.
systemd-tmpfiles-clean.service loaded failed failed.
system-tmpfiles-setup.service loaded failed failed.
The network on the system appears entirely functional however.
This appears this has fallen over completely. Any ideas out there?