One of my vCenters, VCSA 6.7U3b, has an issue where it "randomly" stops responding. When it happens:
* HTML5 client just hangs there after entering username/password
* SSH to the appliance hangs after entering the root password; it does not return an authentication error
* the appliance's console has no error messages
Sometimes, if we wait long enough in the HTML5 client, it errors out with: "503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007f0ac0075b50] _serverNamespace = /websso action = Allow _port = 7080)".
/var/log/vmware/messages has frequent occurrences of:
2020-04-01T18:13:06.145359+00:00 vcenter lsassd: 0x7f3876565700:Failed to find user, group, or domain by name (name = 'user1@DOMAIN.COM', searched host = 'DC.domain.com') -> error = 40098, symbol = LW_ERROR_RPC_OPENPOLICY_FAILED
The same log file sometimes also has:
2020-04-01T18:13:06.145809+00:00 vcenter lsassd: 0x7f3876565700:Domain 'domain.com' is now offline
2020-04-01T18:13:06.148560+00:00 vcenter lsassd: 0x7f383ffff700:Detected domain 'domain.com' offline. Some group information from this domain might be missing.
2020-04-01T18:13:06.235264+00:00 vcenter lsassd: 0x7f383ffff700:Domain 'domain.com' is now online
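To see whether the hangs line up with the domain going offline, it may help to pull the offline/online transitions out of the log and measure how long each one lasted. The following is only a rough sketch: the regular expression is based solely on the lsassd lines quoted above, the function names are made up for illustration, and it assumes Python 3.7 or later is available wherever the log is analyzed.

```python
import re
import sys
from datetime import datetime

# Pattern derived from the lsassd lines quoted in this thread, e.g.
# 2020-04-01T18:13:06.145809+00:00 vcenter lsassd: 0x7f3876565700:Domain 'domain.com' is now offline
TRANSITION = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+lsassd:\s+0x[0-9a-fA-F]+:"
    r"Domain '(?P<domain>[^']+)' is now (?P<state>offline|online)"
)

# Error symbol taken from the "Failed to find user, group, or domain" entries.
FAILED_LOOKUP = "LW_ERROR_RPC_OPENPOLICY_FAILED"


def offline_windows(lines):
    """Pair each 'is now offline' with the next 'is now online' for the same domain.

    Returns a list of (domain, offline_time, online_time, seconds_offline).
    """
    went_offline = {}
    windows = []
    for line in lines:
        match = TRANSITION.match(line)
        if not match:
            continue
        ts = datetime.fromisoformat(match.group("ts"))
        domain = match.group("domain")
        if match.group("state") == "offline":
            # Keep the earliest offline time if several are logged in a row.
            went_offline.setdefault(domain, ts)
        elif domain in went_offline:
            start = went_offline.pop(domain)
            windows.append((domain, start, ts, (ts - start).total_seconds()))
    return windows


def count_failed_lookups(lines):
    """Count lines that carry the failed-lookup error symbol."""
    return sum(1 for line in lines if FAILED_LOOKUP in line)


if __name__ == "__main__":
    # Usage (the script name is hypothetical): python3 lsassd_windows.py /var/log/vmware/messages
    with open(sys.argv[1], errors="replace") as handle:
        log_lines = handle.readlines()
    print("failed lookups:", count_failed_lookups(log_lines))
    for domain, start, end, seconds in offline_windows(log_lines):
        print(f"{domain}: offline {start.isoformat()} -> {end.isoformat()} ({seconds:.3f}s)")
```

In the excerpt above the domain is back online about 0.09 seconds after going offline. If the windows around an actual hang turn out to be much longer, that would point more strongly at the domain lookups.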
The vCenter is on the same network as the three domain controllers it uses, so latency should not cause this kind of issue. I've bumped up the resources to 16 vCPUs and 48GB RAM but it still hangs. When it happened, the resources did not look stressed at the VCSA VM level.
Has anyone had similar issues or any suggestions? Thanks.
Thanks for the link. I already tried that but it still generates the same type of log entries constantly. I am not 100% sure whether that is the root cause of the VCSA hanging, but it seems to indicate the problem is authentication related. One thing that's strange to me is that when it happens, it also hangs after I've entered the root password in the SSH session to the VCSA. I can't imagine root authentication involves SSO too.
Basically I am in the dark as far as when it will hang again. Thanks for your help.
I would suggest following the steps below if you have not already tried them.
1. Reset VCSA root password
2. Reduce the vCPU to 12 and memory to 32 GB
3. Check that the domain controller is reachable from the VCSA
4. If the VCSA is already in the domain, disjoin it, delete the computer account from the domain, and then re-join the domain.
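For step 3, a quick way to test reachability beyond a ping is to try opening TCP connections to the ports Active Directory normally listens on. This is just a sketch under some assumptions: the port list (DNS 53, Kerberos 88, LDAP 389, SMB 445) is the usual AD set and may not match every environment, the function names are illustrative, and it needs a Python 3 interpreter on the machine you run it from.

```python
import socket
import sys

# Commonly used Active Directory TCP ports (an assumption; adjust for your environment).
AD_PORTS = {"DNS": 53, "Kerberos": 88, "LDAP": 389, "SMB": 445}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_dc(host, ports=AD_PORTS, timeout=3.0):
    """Check each named port on one domain controller; returns {service: bool}."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # Usage (the script name is hypothetical): python3 check_dc.py DC.domain.com
    for dc in sys.argv[1:]:
        for service, ok in check_dc(dc).items():
            print(f"{dc} {service}: {'open' if ok else 'unreachable'}")
```

Running it against each of the three domain controllers while the appliance is healthy and again during a hang (from another machine on the same network if SSH is stuck) would show whether one DC drops out intermittently.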
Is your VCSA a hardware or virtual solution? One issue I've been having is that after the posted account expiration date (60 days in our case) the system locks out the root admin accounts and you have to do password recoveries.
This first happened to me after the first 60-day limit. You would think they would leave one last login available so you can change the password. Our Cyber team came in and reconfigured our cluster to use Windows Credentials. Once that was completed I was locked out of my root accounts (the default root account and the Admin account I created for myself).
The Windows Credentials are not using the root accounts. There are four ways to access vCenter/VCSA and the hosts for different reasons and now I'm locked down to just using the vSphere Client. I need root access to be able to turn SSH on for certain tasks, but I no longer have that capability.
I've made my issues known and hope to resolve them.