It gets pretty far, doesn't complain about certificates, but has trouble getting started up after the new certificates are applied.
Option[1 to 8]: 2
Do you wish to generate all certificates using configuration file : Option[Y/N] ? : y
Please provide valid SSO and VC privileged user credential to perform certificate operations.
Enter username [Administrator@vsphere.local]:
Enter password:
certool.cfg file exists, Do you wish to reconfigure : Option[Y/N] ? : n
1. Generate Certificate Signing Request(s) and Key(s) for VMCA Root Signing certificate
2. Import custom certificate(s) and key(s) to replace existing VMCA Root Signing certificate
Option [1 or 2]: 2
Please provide valid custom certificate for Root.
File : vcenter.ca.cer
Please provide valid custom key for Root.
File : vmca_issued_key.key
You are going to replace Root Certificate with custom certificate and regenerate all other certificates
Continue operation : Option[Y/N] ? : y
Status : 60% Completed [Replace vpxd-extension Cert...]
2023-07-03T20:45:55.464Z Updating certificate for "com.vmware.vim.eam" extension
2023-07-03T20:45:55.659Z Successfully updated certificate for "com.vmware.vim.eam" extension
2023-07-03T20:45:56.415Z Updating certificate for "com.vmware.rbd" extension
2023-07-03T20:45:56.564Z Successfully updated certificate for "com.vmware.rbd" extension
2023-07-03T20:45:57.479Z Updating certificate for "com.vmware.imagebuilder" extension
Status : 85% Completed [starting services...]
Error while starting services, please see service-control log for more details
In reviewing log files, there are a few things that stand out
2023-07-03T20:53:48.484Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:53:48.485Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 9
2023-07-03T20:54:08.499Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:08.499Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 10
2023-07-03T20:54:28.523Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:28.524Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 11
2023-07-03T20:54:48.553Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:48.554Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 12
2023-07-03T20:55:08.583Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:08.583Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 13
2023-07-03T20:55:28.610Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:28.610Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 14
2023-07-03T20:55:48.628Z [Thread-15 [] WARN com.vmware.cis.server.util.VpxdClient opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:48.629Z [Thread-15 [] WARN com.vmware.cis.server.util.impl.InitPoolTask opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 15
Also seeing
com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): no healthy upstream
at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpUtil.validateHttpResponse(ApacheHttpUtil.java:100) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.invoke(HttpClient.java:160) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.send(HttpClient.java:172) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider.sendRequest(JsonApiProvider.java:186) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider.invoke(JsonApiProvider.java:539) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.bindings.Stub.invoke(Stub.java:241) ~[vapi-runtime.jar:?]
at com.vmware.vapi.internal.bindings.Stub.invokeMethodAsync(Stub.java:191) ~[vapi-runtime.jar:?]
...
...
...
I've attached the entire vpxd-svcs.log for reference.
The service control log doesn't seem to have anything interesting. Are there some other logs I could review?
So close to getting this to work, but something is preventing vCenter from restarting after the new CA certs are applied, and it isn't clear what it is.
Thanks
Kevin
@flyingrobots_69 hi, did you check /var/log/vmware/vmon/vmon-syslog.log on the vCenter?
Also check if there is a DNS record mismatch and DNS connectivity.
Ensure that the vCenter certificates have not expired by running the following command on the vCenter VM's command line:
root@vcenter [ ~ ]# for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; sudo /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
On the PSC, compare the local hostname with the name that is stored in MACHINE_SS
/usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
Output should be similar to following:
psc.xxx.eg
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SS
Output should be similar to following:
X509v3 Subject Alternative Name: email:email@acme.com, DNS:psc.xxx.eg
Compare the output above. If there is a mismatch, for example DNS:psc.xxx.eg.xxx.eg that was cached on the DNS Server before editing the DNS records, then proceed with the next steps.
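The comparison described above can also be scripted. The following is only a minimal sketch, assuming the PNID and the Subject Alternative Name line have already been copied out of the command output as plain strings in the single-line form shown in this thread. The function names `san_dns_names` and `pnid_matches` are made up for illustration and are not part of any VMware tool.

```python
def san_dns_names(san_line):
    """Extract the DNS entries from a 'Subject Alternative Name' line."""
    _, sep, entries = san_line.partition("Subject Alternative Name:")
    if not sep:
        # No heading present: treat the whole line as the list of entries.
        entries = san_line
    names = []
    for entry in entries.split(","):
        kind, _, value = entry.strip().partition(":")
        if kind == "DNS":
            names.append(value.strip().lower())
    return names


def pnid_matches(pnid, san_line):
    """Return True if the PNID appears among the SAN DNS names (case-insensitive)."""
    return pnid.strip().lower() in san_dns_names(san_line)


good = "X509v3 Subject Alternative Name: email:email@acme.com, DNS:psc.xxx.eg"
stale = "X509v3 Subject Alternative Name: email:email@acme.com, DNS:psc.xxx.eg.xxx.eg"

print(pnid_matches("psc.xxx.eg", good))
print(pnid_matches("psc.xxx.eg", stale))
```

The second call uses the mismatch example from the post (DNS:psc.xxx.eg.xxx.eg), so it reports no match for the PNID psc.xxx.eg.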
Ensure that vCenter critical services are up and running:
root@vcenter [ ~ ]# service-control --status --all
Running:
 applmgmt lwsmd vmafdd vmonapi vmware-analytics vmware-certificatemanagement vmware-cm vmware-content-library vmware-eam vmware-perfcharts vmware-postgres-archiver vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-topologysvc vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Stopped:
 vmcam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-pod vmware-rbd-watchdog vmware-updatemgr vmware-vcha vsan-dps
--
Please don't forget to accept this as an accepted solution or give me a KUDO if you find this post useful! Thanks! 🙂
I should also point out that this is a brand-new installation, using the latest vCenter 7.
I instrumented the certificate-manager code to log the verbose output of service-control, and it seems vmon-cli was the last thing to get started. Does vmon-cli produce a log? Anyone know where it lives?
2023-07-04T00:10:03.994Z Done running command
2023-07-04T00:10:03.994Z Running command: ['/sbin/service', 'vmware-vmon', 'start']
2023-07-04T00:10:05.809Z Done running command
2023-07-04T00:10:05.809Z Successfully started service vmware-vmon
2023-07-04T00:10:05.809Z Running command: ['/usr/bin/systemctl', 'unset-environment', 'VMON_PROFILE']
2023-07-04T00:10:05.822Z Done running command
Successfully started service vmware-vmon
2023-07-04T00:10:05.824Z Running command: ['/usr/lib/vmware-vmon/vmon-cli', '--batchstart', 'ALL']
2023-07-04T00:16:50.914Z Done running command
Service-control failed. Error: Failed to start services in profile ALL. RC=2, stderr=Failed to start vpxd services. Error: Service crashed while starting
2023-07-04T00:16:50.970Z ERROR certificate-manager None
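To see at a glance which step is eating the time, the "Running command" / "Done running command" pairs in a log like the one above can be turned into durations. This is a rough sketch, assuming the timestamp format shown in this excerpt and that commands run strictly one after another. The helper name `command_durations` is hypothetical, and the sample text is copied from the lines above.

```python
from datetime import datetime

LOG = """\
2023-07-04T00:10:03.994Z Running command: ['/sbin/service', 'vmware-vmon', 'start']
2023-07-04T00:10:05.809Z Done running command
2023-07-04T00:10:05.809Z Successfully started service vmware-vmon
2023-07-04T00:10:05.809Z Running command: ['/usr/bin/systemctl', 'unset-environment', 'VMON_PROFILE']
2023-07-04T00:10:05.822Z Done running command
Successfully started service vmware-vmon
2023-07-04T00:10:05.824Z Running command: ['/usr/lib/vmware-vmon/vmon-cli', '--batchstart', 'ALL']
2023-07-04T00:16:50.914Z Done running command
"""

PREFIX = "Running command: "


def command_durations(text):
    """Pair each 'Running command' line with the next 'Done running command' line."""
    results = []
    pending = None  # (start time, command text) of the command still running
    for line in text.splitlines():
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if rest.startswith(PREFIX):
            pending = (when, rest[len(PREFIX):])
        elif rest.startswith("Done running command") and pending is not None:
            started, command = pending
            results.append((command, (when - started).total_seconds()))
            pending = None
    return results


durations = command_durations(LOG)
for command, seconds in durations:
    print(f"{seconds:9.3f}s  {command}")
```

On this excerpt, the vmon-cli --batchstart ALL step stands out at roughly 405 seconds, while the other two commands finish in under two seconds.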
I ended up modifying /usr/lib/vmware/site-packages/cis/certificateManagerOps.py so I could see which service was dying. I commented out the exception that gets raised when the command fails, and changed the logging so that the log file would contain the individual steps taken by service-control. That let me see exactly which service was biting the dust.
I had assumed the relevant log files were in /var/log/vmware/vpxd-svcs, but I was able to see that the failing service was vpxd instead.
Then I found the following messages:
SSL Exception: Verification parameters:
--> PeerThumbprint: 46:57:EA:13:AD:6E:F3:CF:7F:1F:98:8A:C4:87:7A:2D:15:85:DD:2D
--> ExpectedThumbprint:
--> ExpectedPeerName: vcenter.arilabs.net
--> The remote host certificate has these problems:
-->
--> * path length constraint exceeded)
I realized that the intermediate CA certificate we were using had pathLength set to 0, which leaves no room for another CA certificate (the vCenter root) beneath it. I increased that value, re-signed the vCenter root certificate, and now it works great.
So, it was indeed a certificate problem.
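For anyone who hits the same error, here is a small model of why a path length of 0 breaks this setup. It is a simplified sketch of the basicConstraints path length rule (the number of CA certificates allowed between a given CA and the end-entity certificate), not a real certificate validator. It ignores self-issued certificates and everything else a verifier checks. The chain and its certificate names are assumptions for illustration, not the exact chain from this environment.

```python
from typing import List, NamedTuple, Optional


class Cert(NamedTuple):
    name: str
    is_ca: bool
    path_len: Optional[int] = None  # None means no path length constraint


def path_len_violations(chain: List[Cert]) -> List[str]:
    """Check a chain ordered from the top-level CA down to the end-entity cert."""
    problems = []
    last = len(chain) - 1
    for index, cert in enumerate(chain[:-1]):
        if not cert.is_ca:
            problems.append(f"{cert.name}: not a CA but issues certificates")
            continue
        # CA certificates sitting between this one and the end-entity certificate.
        following = last - index - 1
        if cert.path_len is not None and following > cert.path_len:
            problems.append(
                f"{cert.name}: path length constraint exceeded "
                f"({following} > {cert.path_len})"
            )
    return problems


# Hypothetical chain: the VMCA root signing certificate is itself a CA,
# so it counts against the constraint on the intermediate above it.
broken = [
    Cert("corporate root", True),
    Cert("intermediate CA", True, 0),
    Cert("VMCA root signing cert", True),
    Cert("machine SSL cert", False),
]
fixed = [c._replace(path_len=1) if c.name == "intermediate CA" else c for c in broken]

print(path_len_violations(broken))
print(path_len_violations(fixed))
```

With the constraint at 0 the model flags the intermediate CA. Raising it to 1 leaves room for the VMCA certificate, and the chain passes.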
Thank you for your help @virtualinca
Kevin