VMware Cloud Community
jenner43201
Contributor

Unable to start Secure Token Service on vCenter 6.5 U3v for Windows

I have vCenter 6.5 U3v running on Windows 2012.  I updated it to U3v a couple of weeks ago, and it had been operating normally since then.

I found out that the vCenter instance stopped functioning a couple of days ago.  I tried vCenter restarts and Windows reboots, to no avail.  At the command line, I see the following when trying to start the STS service:

2024-03-28T15:27:14.384Z   ERROR Starting service: VMwareSTS, Exception: (1053, 'StartService', 'The service did not respond to the start or control request in a timely fashion.')
Error executing start on service VMwareSTS. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "VMwareSTS"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'VMwareSTS'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
Service-control failed. Error {
    "resolution": null,
    "detail": [
        {
            "args": [
                "VMwareSTS"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'VMwareSTS'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}

First, I understand the appliance may be a better solution, but I want to keep this on Windows.  Second, the "error 1053" comes back immediately rather than after the usual 30 s wait.  I presume this means it is a default error code thrown by STS and that something may be interfering with the STS service.  The vpxd, vmafdd, and vMon log files are not advancing, so I don't see any information about the error.  Which log file should I check for more information?
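The "immediate versus 30 s" observation can be checked with a stopwatch around the start command. The following is a minimal Python sketch, not a VMware tool: `time_command` and `looks_immediate` are hypothetical helper names, and the one-tenth threshold is an arbitrary assumption. A start attempt that fails in well under the timeout is consistent with the service process exiting early rather than hanging, though it does not prove it.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, seconds, output)."""
    start = time.monotonic()
    done = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return done.returncode, elapsed, done.stdout + done.stderr


def looks_immediate(elapsed, timeout_s=30.0):
    """True if the command finished in under a tenth of the expected timeout.

    The one-tenth cutoff is an illustrative assumption, not a documented value.
    """
    return elapsed < timeout_s / 10.0


# Example call on the vCenter host (assumptions: elevated prompt, and the
# service name "VMwareSTS" taken from the error output above):
#   code, elapsed, out = time_command(["sc", "start", "VMwareSTS"])
#   print(code, round(elapsed, 1), looks_immediate(elapsed))
```

Any other start command can be passed in as the argument list; the helper only measures how long it takes and captures what it prints.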

I haven't performed a reinstall as the 6.5U3v Installer wants to uninstall vCenter first.  I don't have any data backups of the vCenter.  I could try to rebuild the data but it would be painful.

What other ideas should I try?


Accepted Solutions
jenner43201
Contributor

I ended up uninstalling and reinstalling.  There may have been certificate issues with the STS service.  At least all my certs are valid for another 2 years, so I'm OK for now.

I ran into several errors, so I had to clean the system with uninstalls and start over.  I used several articles, including https://communities.vmware.com/t5/VMware-vCenter-Discussions/Can-t-Uninstall-vCenter-Server/m-p/3011...  and https://communities.vmware.com/t5/vCenter-Server-Discussions/Vcenter-Server-6-5d-Installation-error-....  I may have referenced several other articles too.

A couple of other references:

https://blogs.vmware.com/professional-services/2023/02/how-to-renew-an-expired-vmware-vcenter-servic...

- A comment on this blog was useful because it mentions that leaf certs can expire and lead to problems:   https://luchodelorenzi.com/2020/05/28/proactively-checking-and-replacing-sts-certificate-on-vsphere-...

 

4 Replies
Brisk
Enthusiast

Let me first state that you're running an ancient version of vSphere, on top of a Windows Server version that is out of support. I highly recommend you upgrade to a version of vSphere that is still supported and switch to the appliance. You're exposing yourself to unneeded security risks here.

Now that we have that out of the way, given the age of your environment, could you check if your certificates are still valid? It could be that one of the vCenter certs has expired and that's causing the service to fail.

jenner43201
Contributor

I replaced the certificates on Oct 11, 2023.  They are self-signed rather than issued by a certificate authority.  AFAIK, they are valid until 2025.  I ran a PowerShell script to check certificate expirations and I don't see any that have expired.
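Whatever tool lists the certificates, the expiry check itself comes down to comparing each certificate's notAfter date with the current time. Here is a minimal Python sketch of that comparison using only the standard library. The helper names and the 30-day warning window are assumptions made for illustration, and the input is expected to be alias/date pairs that were already extracted by some other means, in the OpenSSL text form (for example `Oct 11 00:00:00 2025 GMT`).

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time (negative if expired).

    not_after uses the OpenSSL text form, e.g. "Oct 11 00:00:00 2025 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400.0


def expiry_report(certs, warn_days=30, now=None):
    """Return (alias, whole_days_left, status) tuples, soonest expiry first.

    certs maps an alias to a notAfter string; warn_days is an assumed
    warning window, not a VMware-defined value.
    """
    rows = []
    for alias, not_after in certs.items():
        days = days_until_expiry(not_after, now)
        if days < 0:
            status = "EXPIRED"
        elif days < warn_days:
            status = "EXPIRING"
        else:
            status = "ok"
        rows.append((alias, round(days), status))
    return sorted(rows, key=lambda row: row[1])
```

A check like this is only as complete as the list it is given: any certificate that the listing step does not enumerate is never evaluated at all.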

jenner43201
Contributor

Some background info and what I've done so far:

Background:

vCenter 6.5 is stand-alone on a Windows machine, managing 10 nodes and 50 VMs.  The VM hosts run either ESXi 5.5 or 6.5.  Others manage those hosts, which are on obsolete hardware, so I don't have the authority to upgrade ESXi on them.  With that out of the way, here's what I've done recently.  I updated the root CA on Oct 11, 2023 using the "fixsts.py" script as described here: https://kb.vmware.com/s/article/76719.   On March 13, I upgraded vCenter from 6.5 U3p to 6.5 U3v to fix a log4j issue.  vCenter was operational after a server reboot.

Investigation and Debugging

vCenter only went down recently (March 23-ish?).  I looked through multiple articles but haven't resolved my issue.

  • Log files aren't indicating certificate issues.  In fact, I can't find a log file that points to my problem.  I have an excerpt of vmafdd.log below.
  • With the STS service down, the rest of vCenter won't operate.
  • I used the procedure to check VMCA certificates.  According to the PowerShell script, all certificates are valid.
  • I tried to use "checksts" from https://kb.vmware.com/s/article/79248.   Since the STS service isn't active, "checksts" wasn't able to perform the URL lookup and get results. ☠️
  • I tried to perform a Postgres DB backup.  Since the vCenter and DB services aren't running, the backup didn't work; I got a 0-byte file. ☠️
  • A comment on this blog described a case where PowerShell would report valid certs but checksts.py (if it worked) could reveal leaf certs that had expired (https://luchodelorenzi.com/2020/05/28/proactively-checking-and-replacing-sts-certificate-on-vsphere-...).  I followed the procedure to generate a new STS signing certificate (https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.psc.doc/GUID-F9A0CA06-8875-4A66-BBBA-DB0C01...), but I couldn't install it because vCenter is down.
  • This VMware blog post discusses renewing certificates and using the "lsdoctor.py" tool (https://blogs.vmware.com/professional-services/2023/02/how-to-renew-an-expired-vmware-vcenter-servic...).   I got Python exceptions and couldn't get this tool to work either.
  • Here is an excerpt from vmafdd.log.  The contents show errors.  I had these errors back in Feb 2024 even while vCenter was operational.
    2024-04-02T00:46:28.246Z:t@24484:ERROR: [Error - 3, ..\vecsserviceapi.c:1507]
    2024-04-02T00:57:08.057Z:t@22824:INFO: vmafdd: stop
    2024-04-02T00:57:14.423Z:t@14260:ERROR: [Error - 183, ..\vecsserviceapi.c:189]
    2024-04-02T00:57:14.424Z:t@14260:ERROR: [Error - 183, ..\authservice.c:36]
    2024-04-02T00:57:14.429Z:t@14260:ERROR: [Error - 183, ..\vecsserviceapi.c:189]
    2024-04-02T00:57:14.430Z:t@14260:ERROR: [Error - 183, ..\authservice.c:36]
    2024-04-02T00:57:14.435Z:t@14260:ERROR: [Error - 183, ..\vecsserviceapi.c:189]
    2024-04-02T00:57:14.437Z:t@14260:ERROR: [Error - 183, ..\authservice.c:36]
    2024-04-02T00:57:15.625Z:t@21892:INFO: VmAfdRpcServerCheckAccess: request from ncalrpc:[62400]
    2024-04-02T00:57:15.627Z:t@14260:INFO: RPC service status (listening)
    2024-04-02T00:57:15.629Z:t@14260:INFO: Registry key value for Super Logging: 0
    2024-04-02T00:57:15.630Z:t@14260:INFO: Super Logger object is created.
    2024-04-02T00:57:15.632Z:t@14260:INFO: Starting Roots Fetch Thread, VmAfdInitCertificateThread
    2024-04-02T00:57:15.692Z:t@14260:INFO: Started Roots Fetch Thread successfully, VmAfdInitCertificateThread
    2024-04-02T00:57:15.695Z:t@14260:INFO: Starting Pass Refresh Thread, VmAfdInitPassRefreshThread
    2024-04-02T00:57:15.697Z:t@14260:INFO: Started Pass Refresh Thread successfully, VmAfdInitPassRefreshThread
    2024-04-02T00:57:15.699Z:t@14260:INFO: Starting the CDC State machine, CdcInitStateMachine
    2024-04-02T00:57:15.701Z:t@14260:INFO: Started CDC State Machine Thread successfully, CdcInitStateMachine
    2024-04-02T00:57:15.703Z:t@14260:INFO: Starting CDC Caching Thread, CdcInitCdcCacheUpdate
    2024-04-02T00:57:15.705Z:t@14260:INFO: Started CDC Cache Thread successfully, CdcInitCdcCacheUpdate
    2024-04-02T00:57:15.707Z:t@14260:INFO: vmafdd: started!
    2024-04-02T00:57:18.754Z:t@21648:ERROR: [Error - 9127, ..\ldap.c:170]
    2024-04-02T00:57:18.755Z:t@21648:ERROR: [Error - 9127, ..\rootfetch.c:256]
    2024-04-02T00:57:18.757Z:t@21648:INFO: Failed to update trusted roots. Error [9127]
    2024-04-02T00:58:18.057Z:t@21648:ERROR: [Error - 4312, ..\rootfetch.c:684]
    2024-04-02T00:58:18.119Z:t@21648:INFO: Added cert to VECS DB: 460340545a790dc8d822dbf6ba54623612dbe644
    2024-04-02T00:58:18.200Z:t@21648:INFO: VecsSrvDeleteCertificate: Deleted cert (alias 5c4bb762bcc068cd13ce0f9e8e8c37b68f7f717f) from store 3
    2024-04-02T00:58:18.204Z:t@21648:INFO: VecsDeleteFileWithRetry: successfully deleted cert file: D:\ProgramData\VMware\vCenterServer\cfg\certs\85db7385.r0
    2024-04-02T00:58:18.212Z:t@21648:INFO: VecsFillVacantFileSlot: copied D:\ProgramData\VMware\vCenterServer\cfg\certs\85db7385.r1 to D:\ProgramData\VMware\vCenterServer\cfg\certs\85db7385.r0
    2024-04-02T00:58:18.216Z:t@21648:INFO: VecsDeleteFileWithRetry: successfully deleted cert file: D:\ProgramData\VMware\vCenterServer\cfg\certs\85db7385.r1
    2024-04-02T00:58:18.222Z:t@21648:INFO: VecsDeleteFileWithRetry: successfully deleted cert file: D:\ProgramData\VMware\vCenterServer\cfg\certs\899b2435.r0
    2024-04-02T00:58:18.227Z:t@21648:INFO: VecsFillVacantFileSlot: copied D:\ProgramData\VMware\vCenterServer\cfg\certs\899b2435.r1 to D:\ProgramData\VMware\vCenterServer\cfg\certs\899b2435.r0
    2024-04-02T00:58:18.231Z:t@21648:INFO: VecsDeleteFileWithRetry: successfully deleted cert file: D:\ProgramData\VMware\vCenterServer\cfg\certs\899b2435.r1
    2024-04-02T00:58:18.237Z:t@21648:INFO: VecsDeleteFileWithRetry: successfully deleted cert file: D:\ProgramData\VMware\vCenterServer\cfg\vmware-vpx\docRoot\certs\899b2435.r1
    2024-04-02T00:58:18.241Z:t@21648:ERROR: [Error - 3, ..\vecsserviceapi.c:1507]
  • I don't know whether I can create and install new certificates by following the VMware articles the way I did in Oct 2023.  The STS service is down, so the VMCA service is not active to act as a certificate authority.
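Since the same vmafdd.log errors also appeared while vCenter was healthy, it can help to tally them by code and source location and compare a "good" log with a "bad" one. The following is a minimal Python sketch that assumes every line follows the `timestamp:t@thread:LEVEL: message` layout seen in the excerpt above; `tally_errors` and the regular expressions are illustrative names, not part of any VMware tooling.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: <timestamp>Z:t@<thread id>:<LEVEL>: <message>
LINE_RE = re.compile(r"^(?P<ts>\S+?Z):t@(?P<tid>\d+):(?P<level>[A-Z]+): (?P<msg>.*)$")
# ERROR messages in the excerpt look like: [Error - 183, ..\vecsserviceapi.c:189]
ERR_RE = re.compile(r"\[Error - (?P<code>\d+), (?P<src>[^\]]+)\]")


def tally_errors(lines):
    """Count ERROR lines by (error code, source file:line)."""
    counts = Counter()
    for raw in lines:
        line = raw.strip().lstrip("•").strip()  # tolerate pasted bullets and indents
        parsed = LINE_RE.match(line)
        if not parsed or parsed.group("level") != "ERROR":
            continue
        err = ERR_RE.search(parsed.group("msg"))
        if err:
            counts[(int(err.group("code")), err.group("src"))] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE = r"""
2024-04-02T00:57:14.423Z:t@14260:ERROR: [Error - 183, ..\vecsserviceapi.c:189]
2024-04-02T00:57:14.424Z:t@14260:ERROR: [Error - 183, ..\authservice.c:36]
2024-04-02T00:57:14.429Z:t@14260:ERROR: [Error - 183, ..\vecsserviceapi.c:189]
2024-04-02T00:57:18.754Z:t@21648:ERROR: [Error - 9127, ..\ldap.c:170]
2024-04-02T00:57:18.757Z:t@21648:INFO: Failed to update trusted roots. Error [9127]
"""

for (code, src), n in tally_errors(SAMPLE.splitlines()).most_common():
    print(n, code, src)
```

Running the same tally over a log from February and one from after the outage would show whether any code or source location is new, which is more telling than errors that were present all along.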

 

Anyone have any ideas?
