We have a single on-premises VCSA 6.5 instance that recently ran into the certificate expiration detailed in this KB:
https://kb.vmware.com/s/article/76719
All the certificates have been regenerated using the certificate-manager tool via the CLI, and they now show as current when checked with the one-liner in the above KB (they had all expired about a week ago):
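(The one-liner in question, assuming the standard vecs-cli location on a default VCSA 6.5 install, is along these lines:)

```shell
# Iterate over every VECS store and print each entry's alias and expiry.
# Adjust the path if vecs-cli lives elsewhere on your appliance.
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do
    echo "STORE $store"
    /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$store" --text 2>/dev/null \
        | grep -iE "Alias|Not After"
done
```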
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Aug 18 19:56:50 2022 GMT
STORE TRUSTED_ROOTS
Alias : 9bd7b30bcb1dcecfe2491a3e91fcd3dd756f347f
Not After : Aug 1 13:58:01 2028 GMT
Alias : c0af9d76ae9fab214298c6b11d4efb72f64b6c13
Not After : Aug 13 18:18:55 2030 GMT
Alias : ac50bb369ff7dce7e8c372b9b3e50f6e3aaaa528
Not After : Aug 13 18:20:03 2030 GMT
Alias : 3e816060d6322a45114eac30798edbf1a4a1397d
Not After : Aug 13 18:28:26 2030 GMT
Alias : 074ddc83baeea4c6588f3f11837ed4fc77b25220
Not After : Aug 13 19:21:38 2030 GMT
Alias : 4bbaf83d23a818f2e8122b60ca0edc6dabf76d7d
Not After : Aug 13 19:33:49 2030 GMT
STORE TRUSTED_ROOT_CRLS
Alias : a45f284d7b9325005381b1b14d3ac3c823e104c9
Alias : 4b3b32cf9bb0d212aa6551bdd97dd3aaf029dde5
Alias : 02c60981250d68d94e1fcd31c93d0c50ae26d531
Alias : c4df908ec94dc3b1b774ca4a8768acfdbee90e59
Alias : f65b7ab274c5d949e8e914101797260d9e40fd70
Alias : 84d8635a51db3a011bab257873555c6776381d37
STORE machine
Alias : machine
Not After : Aug 18 19:12:42 2022 GMT
STORE vsphere-webclient
Alias : vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd
Alias : vpxd
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd-extension
Alias : vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
STORE SMS
Alias : sms_self_signed
Not After : Aug 7 14:06:21 2028 GMT
STORE BACKUP_STORE
Alias : bkp___MACHINE_CERT
Not After : Aug 18 19:11:39 2022 GMT
Alias : bkp_machine
Not After : Aug 18 19:12:42 2022 GMT
Alias : bkp_vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
When I try to start all services now, it returns the following after ~5 minutes:
Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out
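For reference, the commands I'm running are the stock service-control utility on the appliance:

```shell
# Attempt to start everything managed by vMon:
service-control --start --all

# Check which services actually came up:
service-control --status
```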
When using service-control to start just the vpxd-svcs service by itself, it returns the following error:
Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False
2020-08-18T21:10:50.484Z Service vpxd-svcs state STOPPED
Error executing start on service vpxd-svcs. Details {
"resolution": null,
"detail": [
{
"args": [
"vpxd-svcs"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vpxd-svcs'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
Service-control failed. Error {
"resolution": null,
"detail": [
{
"args": [
"vpxd-svcs"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vpxd-svcs'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
The web UI returns the following 503 error (which it has been returning since the certs expired):
503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000056033c080640] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)
Can anyone point me to which log files specifically I should be looking at to diagnose what keeps the service from starting? I've already covered the following:
Our last resort is to simply wipe and reinstall VCSA, but I'd like to avoid it if this is possible to fix.
Just to be sure, you are using self-signed certificates, right?
Try this:
Use option 4 and follow the steps.
If that fails and rolls back, navigate to the VMware Endpoint certificate store /usr/lib/vmware-vmafd/bin/dir-cli,
create a new folder (mkdir Certs_backup),
move all the certificates into the new folder, and try again with the KB I mentioned. That should solve the issue.
Also keep in mind that you should reboot the VCSA at the end.
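The menu I mean comes from the certificate-manager utility; on a default VCSA 6.5 it is launched like this:

```shell
# Launch the interactive certificate manager (default VCSA 6.5 path).
# Option 4 regenerates a new VMCA root certificate and replaces all certificates.
/usr/lib/vmware-vmca/bin/certificate-manager
```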
Hey, hope you are doing fine.
In my experience this is a certificate error; VMware products will always fail if certificates aren't replaced correctly.
Can you please attach the following log: vpxd.log (located in /var/log/vmware/)?
In addition to that, are you running an embedded PSC or an external PSC?
Have you tried this:
Regenerate a New VMCA Root Certificate and Replace All Certificates
Regenerate Self-Signed Certificate in vSphere 6.5 - VMWare Insight --> Try this first
Hope this works
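A quick way to pull recent errors out of that log (assuming the default location under /var/log/vmware/vpxd/ on the appliance; the exact path can vary) is something like:

```shell
# Show the most recent error-level lines from the vpxd log.
grep -i "error" /var/log/vmware/vpxd/vpxd.log | tail -n 50
```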
Thank you for your reply! We run an embedded PSC. I've attached a segment of the vpxd.log; it looks like it still thinks the certs are expired even after regenerating them:
2020-08-18T17:59:12.882Z error vpxd[7FB8023E7700] [Originator@6876 sub=LSClient] Caught exception while creating LS client adapter: N7Vmacore3Ssl18SSLVerifyExceptionE(SSL Exception: Verification parameters:
--> PeerThumbprint: 08:0A:82:91:0D:F4:CC:62:82:27:66:45:69:BD:78:A7:9A:EB:5B:B5
--> ExpectedThumbprint:
--> ExpectedPeerName: 10.83.1.20
--> The remote host certificate has these problems:
-->
--> * certificate has expired)
I'll try to regenerate the self-signed certificates once more using the articles you linked.
After following the steps to regenerate the VMCA Root Certificate in this post:
http://vmwareinsight.com/Articles/2020/1/5802978/Regenerate-Self-Signed-Certificate-in-vSphere-6-5
... it gets stuck at 85% upon restarting the affected services and then rolls back, which sounds very similar to what the poster here is describing:
https://communities.vmware.com/thread/565418
The above post includes the following workaround, but in our case the .buildInfo file permissions are already set to 444, so changing them has no effect:
Custom certificate replacement fails on upgraded vCenter Server Appliance 6.5 Update 1
After you upgrade from vCenter Server Appliance 6.5 to 6.5 Update 1 and try to replace the Machine SSL certificate of vCenter Server Appliance, the operation fails because the vSphere Update Manager service cannot access the /etc/vmware/.buildinfo file as the file permission changed from 444 to 640.
Workaround:
We use a single host without any organizational custom certificate requirements. I'm kind of at a loss since this should be a straightforward procedure.
Found out I can still get into the PSC web portal (not the main VCSA one, which returns the aforementioned 503 error).
It shows all certificates as being valid and current, so replacing the certs did work to some extent.
The error log in the certificate manager has me thinking this is related to the Update Manager:
Service-control failed. Error Failed to start vmon services.vmon-cli RC=2, stderr=Failed to start updatemgr services. Error: Service crashed while starting
2020-08-19T16:28:19.396Z ERROR certificate-manager None
2020-08-19T16:28:19.397Z ERROR certificate-manager Error while starting services, please see log for more details
This matches the Update Manager issue in this KB, but I can't stop the service as I can't log into the VCSA web interface to turn it off:
I have no clue what changed between the two cert resets, but the VCSA web portal is now working again and the service is starting, even though the last Machine cert reset failed and attempted to roll back the changes. I'll chalk this one up to ghosts. If anyone else runs across the same issue, let me know! I'll mark this as solved.
Glad this worked
Hi Nacho,
we have the same problem here, but I don't understand your solution.
What does
navigate to VMware Endpoint certificate store /usr/lib/vmware-vmafd/bin/dir-cli
mean?
"/usr/lib/vmware-vmafd/bin/dir-cli" isn't a directory in which I can create a new directory. Or did you mean that I have to use this tool to move the certificates? Can you explain it in a little more detail?
Big thanks and kind regards
Thomas
Hey, hope you are doing fine.
Are you using a Windows based vCenter or VCSA (linux based)?
Hi, yes, hope you are doing fine too.
We are using the Linux-based VCSA, version 6.5.0.22000.
If you log in as the root user via SSH, are you able to do this?
cd /usr/lib/vmware-vmafd/bin/dir-cli
No, because it isn't a directory on our side:
root@esxvcenter01 [ ~ ]# cd /usr/lib/vmware-vmafd/bin/dir-cli
bash: cd: /usr/lib/vmware-vmafd/bin/dir-cli: Not a directory
Hey, I looked at some notes.
dir-cli is a certificate management tool that will help you regenerate the solution users' certificates.
This will help you: https://www.settlersoman.com/how-to-publish-root-ca-into-the-trusted-store-in-vmware-endpoint-certif...
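It's a binary, not a directory. For example, you can list what's currently in the trusted store like this (you'll be prompted for the administrator@vsphere.local password):

```shell
# List the certificates published in the VMware Endpoint Certificate Store.
/usr/lib/vmware-vmafd/bin/dir-cli trustedcert list
```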
I tried this KB first.
https://kb.vmware.com/s/article/76719
Use this help to copy the file to vcenter.
https://techbrainblog.com/2015/03/30/how-to-scp-files-to-vmware-vcenter-appliance-6-0-vcsa/
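(The gist of that guide, if I remember right, is enabling the Bash shell on the appliance first so that scp can connect:)

```shell
# From the VCSA appliance shell, after SSHing in as root:
shell.set --enabled true   # enable the Bash shell
shell                      # drop into Bash
chsh -s /bin/bash root     # make Bash root's default shell so scp works
```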
But it got stuck with the error "Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out".
Then I tried your suggestion, and now vCenter can be accessed.
I'm pretty sure this issue came from the STS certificate being expired.
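(For anyone following along: if I recall correctly, the KB checks the STS certificate by having you copy its checksts.py script to the appliance and run it, roughly:)

```shell
# Copy the KB's checksts.py script to the appliance first, then:
cd /tmp
python checksts.py   # reports the expiry of the STS signing certificate
```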
Hi bnahuy,
What suggestion specifically did you try?
I have regenerated the cert as well, but it is still failing to start services, same as the original poster indicated.
Did you get anywhere with this? I'm getting the same error message.