Had a lab vCenter crash and am trying to figure out why.
* Starting vCenter from the shell with "service-control --start --all" fails with an error that vPostgres couldn't start.
* Starting vPostgres manually ("service-control --start vmware-vpostgres") and then starting vCenter ("service-control --start --all") fails because vpxd-svcs doesn't start.
* I logged into the vCenter database (VCDB) and verified the administrator account.
* I reset the vCenter certificates and validated them with lsdoctor.
* vpxd.log shows "Failed to connect to Authz service" and "Failed to initialize authorizeManager".
Anyone seen something like this?
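The vpxd.log symptom from the last bullet can be re-checked quickly after each start attempt. Below is a minimal grep sketch. scan_vpxd_log is a hypothetical helper name, not a VMware tool. It assumes the default VCSA log location /var/log/vmware/vpxd/vpxd.log, and you can pass a different path as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count lines containing the two vpxd errors quoted above.
# Assumes the default VCSA log path; pass another path as $1 to override it.
scan_vpxd_log() {
  local log="${1:-/var/log/vmware/vpxd/vpxd.log}"
  local pattern count
  [[ -r "$log" ]] || { echo "cannot read $log" >&2; return 1; }
  for pattern in "Failed to connect to Authz service" \
                 "Failed to initialize authorizeManager"; do
    # -c counts matching lines, -F treats the pattern as a fixed string
    count=$(grep -cF -- "$pattern" "$log")
    echo "$count  $pattern"
  done
}
```

Run it on the appliance shell right after a failed "service-control --start --all". Nonzero counts confirm the Authz symptom, but they don't identify the root cause.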
Well, I was able to recover. VMware sent a certificate tool (vCert), which identified some trust issues and registrations that the standard tools didn't address.
Then I found an issue with the logging setup in the Tomcat instance, so I commented out the "isAccessLogCreated" and "accessLogCleaner" beans in the Tomcat config.
I also had to manually rebuild the vPostgresql certificate store.
I restarted the services, got the core up and running, and took a good snapshot of the VCSA. I attempted a VCSA backup, but it failed repeatedly, so I decided to attempt an upgrade to repair the VCSA. It took about 2 hours, but the upgrade from 7.0U3f to 7.0U3g completed. I continued along the update path all the way to the latest 7.0U3 release.
I tested the VCSA backup and it ran successfully.
I then tested Tomcat by uncommenting the previously commented-out beans, and it ran successfully.
In summary, there was corruption at multiple points within the VCSA. With the help from here and from VMware, I was able to recover it. Thank you all.
Looks like a certificate issue.
Have you checked all certificates with:
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do echo "[*] Store :" $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After";done;
Does lsdoctor report all good, including after the trustfix?
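The loop above only prints the dates. A sketch like the one below could read that loop's output on stdin and flag entries that expire within a given number of days. check_expiry is a hypothetical helper, and the 30-day default is an arbitrary assumption. It also relies on GNU date to parse the OpenSSL-style "Not After" dates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the Store / Alias / "Not After" lines printed by the
# vecs-cli loop above and flag certificates expiring within N days (default 30).
# Assumes GNU date, which parses OpenSSL-style dates like "Dec 20 12:00:00 2099 GMT".
check_expiry() {
  local warn_days="${1:-30}"
  local now line store="" entry="" expiry exp_epoch days_left
  now=$(date +%s)
  while IFS= read -r line; do
    case "$line" in
      *"Store :"*)
        # "[*] Store : MACHINE_SSL_CERT" -> "MACHINE_SSL_CERT"
        store=$(printf '%s\n' "$line" | sed 's/^[^:]*:[[:space:]]*//') ;;
      *Alias*)
        entry=$(printf '%s\n' "$line" | sed 's/^[^:]*:[[:space:]]*//') ;;
      *"Not After"*)
        # Strip only up to the first colon; the date itself contains colons
        expiry=$(printf '%s\n' "$line" | sed 's/^[^:]*:[[:space:]]*//')
        exp_epoch=$(date -d "$expiry" +%s) || continue
        days_left=$(( (exp_epoch - now) / 86400 ))
        if (( days_left < warn_days )); then
          echo "WARN  $store/$entry expires in $days_left days ($expiry)"
        else
          echo "OK    $store/$entry ($expiry)"
        fi ;;
    esac
  done
}
```

For example, pipe the loop into it with "for store in ...; done | check_expiry 60". Expired backup entries are flagged too, so judge them in context.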
After running the suggested command, all certs show "Not After" dates of 2025 and beyond. There is a BACKUP_STORE entry for __MACHINE_CERT dated December 2022, but my understanding is that those entries are inactive.
lsdoctor shows all good.
Which build of vCenter are you running?
How did you reset the certificates? With option 8?
If not, please do it with option 8 (Reset all Certificates) in certificate-manager.
Something is wrong between vPostgres and the certificates. When I start vPostgres on its own, it logs a long list of messages about trying to build the root_crl.pem file. It makes many requests to the auth service but eventually fails.
Does anyone know if we can just deploy a new vCenter and have it discover or re-register the existing cluster (vSAN, NSX, etc.)?
If not, I will have to plan a big "new deployment and migration".
1) Remove host from existing cluster
2) Clean host
3) Deploy vCenter to single host
4) Enable vSAN
5) Enable NSX
6) Begin migration of workloads (how without a working vCenter?)
7) Roll hosts between clusters
First, is this a stretched cluster with a witness host or a standard cluster? OSA or ESA?
vSAN can keep running without vCenter, so in my opinion it's not necessary to destroy everything.
The important thing is to install a new vCenter. Do you have a local datastore on one of your ESXi hosts, for example a boot device with about 200 GB of space? You can temporarily deploy a vCenter there.
Then follow this
Which NSX version are you using? The nodes must be redeployed.