VMware Cloud Community
vadm168
Enthusiast

Random hung VCSA 6.7U3b

Hi,

One of my vCenters, VCSA 6.7U3b, has been having an issue where it "randomly" stops responding. When it happens:

* HTML5 client just hangs there after entering username/password

* SSH to the appliance hangs after entering the root password; it does not return an authentication error

* the appliance's console has no error messages

Sometimes, if we wait long enough in HTML5 client, it will error out with: "503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x00007f0ac0075b50] _serverNamespace = /websso action = Allow _port = 7080)".

In /var/log/vmware/messages, it has frequent occurrences of:

2020-04-01T18:13:06.145359+00:00 vcenter lsassd[8349]: 0x7f3876565700:Failed to find user, group, or domain by name (name = 'user1@DOMAIN.COM', searched host = 'DC.domain.com') -> error = 40098, symbol = LW_ERROR_RPC_OPENPOLICY_FAILED

In the same log file, it sometimes has:

2020-04-01T18:13:06.145809+00:00 vcenter lsassd[8349]: 0x7f3876565700:Domain 'domain.com' is now offline

2020-04-01T18:13:06.148560+00:00 vcenter lsassd[8349]: 0x7f383ffff700:Detected domain 'domain.com' offline. Some group information from this domain might be missing.

2020-04-01T18:13:06.235264+00:00 vcenter lsassd[8349]: 0x7f383ffff700:Domain 'domain.com' is now online
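To see how often and for how long the domain is flagged offline, the entries above can be paired up with a small script. This is only a rough sketch: it assumes the log lines keep exactly the format shown in this post, and `domain_outages` is a hypothetical helper name, not anything shipped with VCSA.

```python
import re
from datetime import datetime

# Matches only the "is now offline/online" state-change lines quoted above, e.g.
# 2020-04-01T18:13:06.145809+00:00 vcenter lsassd[8349]: 0x7f3876565700:Domain 'domain.com' is now offline
STATE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ lsassd\[\d+\]: 0x[0-9a-f]+:"
    r"Domain '(?P<domain>[^']+)' is now (?P<state>offline|online)"
)


def domain_outages(lines):
    """Pair each 'offline' entry with the next 'online' entry for the same domain.

    Returns a list of (domain, went_offline, came_back, seconds) tuples.
    """
    offline_since = {}
    outages = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue  # ignore every other log line
        ts = datetime.fromisoformat(m.group("ts"))
        domain = m.group("domain")
        if m.group("state") == "offline":
            # Keep the earliest timestamp if "offline" is logged repeatedly.
            offline_since.setdefault(domain, ts)
        else:
            start = offline_since.pop(domain, None)
            if start is not None:
                outages.append((domain, start, ts, (ts - start).total_seconds()))
    return outages
```

Feeding it the lines of the messages file (for example `domain_outages(open(path))`, with `path` pointing at the log) gives one tuple per offline/online pair, which can then be compared against the times the appliance hung.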

The vCenter is on the same network as the three domain controllers it uses, so latency should not cause this kind of issue. I've bumped the resources up to 16 vCPUs and 48 GB RAM, but it still hangs. When it happened, resources did not look stressed at the VCSA VM level.

Has anyone had similar issues or any suggestions? Thanks,

4 Replies
Nawals
Expert

This VMware Knowledge Base article may help you.

NKS. Please mark as helpful/correct if my answer resolves your query.
vadm168
Enthusiast

Hi Nawals,

Thanks for the link. I already tried that, but it still generates the same type of log entries constantly. I am not 100% sure whether this is the root cause of the VCSA not responding, but it seems to indicate the problem is authentication related. One thing that's strange to me is that when it happens, the SSH session to the VCSA also hangs after I've entered the root password. I can't imagine root authentication involves SSO too.

Basically, I am in the dark as to when it will hang again. Thanks for your help.

Nawals
Expert

I would suggest following the steps below, if you have not tried them already.

1. Reset the VCSA root password.

2. Reduce the vCPU count to 12 and the memory to 32 GB.

3. Check that the domain controller is reachable from the VCSA.

4. If the VCSA is already joined to the domain, disjoin it, delete the computer account from the domain, and then rejoin it.
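For step 3, a quick TCP reachability check can be sketched as below. The port list is my assumption of what an AD-joined appliance typically needs on a domain controller (DNS, Kerberos, LDAP, SMB), and `check_ports` is just an illustrative helper name. It only tests TCP connects, so it will not catch UDP or authentication-level problems.

```python
import socket

# Assumption: common TCP ports on an Active Directory domain controller.
AD_PORTS = {53: "DNS", 88: "Kerberos", 389: "LDAP", 445: "SMB"}


def check_ports(host, ports, timeout=3.0):
    """Try a TCP connect to each port; return {port: True/False}."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and connects; the context
            # manager closes the socket again right away.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and DNS lookup failures.
            results[port] = False
    return results
```

For example, `check_ports("DC.domain.com", AD_PORTS)` (using the host name from the log entries in the first post) returns a dictionary keyed by port. Running it repeatedly, ideally while the appliance is hanging, would show whether the "offline" messages line up with real connectivity drops.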

NKS. Please mark as helpful/correct if my answer resolves your query.
Dthompson04
Contributor

Is your VCSA a hardware or virtual solution? One issue I've been having is that after the posted account expiration date (60 days in our case), the system locks out the root admin accounts and you have to do password recoveries.

This first happened to me after the first 60-day limit. You would think they would allow one last login so you can change the password. Our Cyber team came in and reconfigured our cluster to use Windows credentials. Once that was completed, I was locked out of my root accounts (the default root account and the admin account I created for myself).

The Windows credentials do not use the root accounts. There are four ways to access vCenter/VCSA and the hosts, each for different reasons, and now I'm locked down to using only the vSphere Client. I need root access to be able to turn SSH on for certain tasks, but I no longer have that capability.

I've made my issues known and hope to resolve them.
