VMware Cloud Community
flyingrobots_69
Contributor

Rollback after replacing VMCA Root Certificate (option 2 of certificate-manager)

The operation gets pretty far and doesn't complain about the certificates, but vCenter has trouble starting up after the new certificates are applied.

 

Option[1 to 8]: 2
Do you wish to generate all certificates using configuration file : Option[Y/N] ? : y

Please provide valid SSO and VC privileged user credential to perform certificate operations.
Enter username [Administrator@vsphere.local]:
Enter password:
certool.cfg file exists, Do you wish to reconfigure : Option[Y/N] ? : n
1. Generate Certificate Signing Request(s) and Key(s) for VMCA Root Signing certificate

2. Import custom certificate(s) and key(s) to replace existing VMCA Root Signing certificate

Option [1 or 2]: 2

Please provide valid custom certificate for Root.
File : vcenter.ca.cer

Please provide valid custom key for Root.
File : vmca_issued_key.key

You are going to replace Root Certificate with custom certificate and regenerate all other certificates
Continue operation : Option[Y/N] ? : y
Status : 60% Completed [Replace vpxd-extension Cert...]
2023-07-03T20:45:55.464Z Updating certificate for "com.vmware.vim.eam" extension
2023-07-03T20:45:55.659Z Successfully updated certificate for "com.vmware.vim.eam" extension


2023-07-03T20:45:56.415Z Updating certificate for "com.vmware.rbd" extension
2023-07-03T20:45:56.564Z Successfully updated certificate for "com.vmware.rbd" extension


2023-07-03T20:45:57.479Z Updating certificate for "com.vmware.imagebuilder" extension

Status : 85% Completed [starting services...]
Error while starting services, please see service-control log for more details

 

In reviewing the log files, a few things stand out:

2023-07-03T20:53:48.484Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:53:48.485Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 9
2023-07-03T20:54:08.499Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:08.499Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 10
2023-07-03T20:54:28.523Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:28.524Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 11
2023-07-03T20:54:48.553Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:54:48.554Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 12
2023-07-03T20:55:08.583Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:08.583Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 13
2023-07-03T20:55:28.610Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:28.610Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 14
2023-07-03T20:55:48.628Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"
2023-07-03T20:55:48.629Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 15
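When the same warning repeats every 20 seconds, it can help to collapse the log into "which endpoint is refusing connections, and how many retries so far". A minimal sketch in Python, assuming the line format shown above; the `summarize` helper is hypothetical and not part of any VMware tooling:

```python
import re
from collections import Counter

# Endpoint in: Connect to host:port [host/ip] failed: Connection refused
REFUSED = re.compile(r"Connect to (\S+:\d+) \[[^\]]*\] failed: Connection refused")
# Retry counter in: ... VpxdClientException at attempt N
ATTEMPT = re.compile(r"at attempt (\d+)")


def summarize(lines):
    """Return (Counter of refused endpoints, highest retry attempt seen)."""
    refused = Counter()
    last_attempt = 0
    for line in lines:
        hit = REFUSED.search(line)
        if hit:
            refused[hit.group(1)] += 1
        attempt = ATTEMPT.search(line)
        if attempt:
            last_attempt = max(last_attempt, int(attempt.group(1)))
    return refused, last_attempt


# Two lines copied from the log excerpt above.
sample = [
    '2023-07-03T20:55:48.628Z [Thread-15 [] WARN  com.vmware.cis.server.util.VpxdClient  opId=] Cannot handle exception during retry: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:8085 invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to localhost:8085 [localhost/127.0.0.1] failed: Connection refused (Connection refused)"',
    '2023-07-03T20:55:48.629Z [Thread-15 [] WARN  com.vmware.cis.server.util.impl.InitPoolTask  opId=] Init pool encountered exception: com.vmware.cis.server.util.exception.VpxdClientException at attempt 15',
]

refused, last_attempt = summarize(sample)
for endpoint, count in refused.items():
    print(f"{endpoint}: refused {count} time(s), last retry attempt {last_attempt}")
```

Every refusal here points at localhost:8085, so the thing to chase is whatever should be listening on that port rather than vpxd-svcs itself.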

 

I'm also seeing:

com.vmware.vapi.client.exception.TransportProtocolException: HTTP response with status code 503 (enable debug logging for details): no healthy upstream
	at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpUtil.validateHttpResponse(ApacheHttpUtil.java:100) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.invoke(HttpClient.java:160) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.protocol.client.rpc.http.HttpClient.send(HttpClient.java:172) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider.sendRequest(JsonApiProvider.java:186) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider.invoke(JsonApiProvider.java:539) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.bindings.Stub.invoke(Stub.java:241) ~[vapi-runtime.jar:?]
	at com.vmware.vapi.internal.bindings.Stub.invokeMethodAsync(Stub.java:191) ~[vapi-runtime.jar:?]
...
...
...

I've attached the entire vpxd-svcs.log for reference.

The service-control log doesn't seem to have anything interesting.  Are there some other logs I could review?

So close to getting this to work... but something is preventing vCenter from restarting after the new CA certs are applied, and it isn't clear what it is.


Thanks

Kevin

 


Accepted Solutions
virtualinca
Enthusiast

@flyingrobots_69 Hi, did you check /var/log/vmware/vmon/vmon-syslog.log on the vCenter?

Also check if there is a DNS record mismatch and DNS connectivity. 

Ensure that the vCenter certificates have not expired by running the following command on the vCenter VM command line:

root@vcenter [ ~ ]# for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; sudo /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
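If the list is long, the filtered output of that loop can be checked mechanically. A rough sketch in Python, assuming the loop prints `STORE <name>` lines followed by `Alias : ...` and OpenSSL-style `Not After : <date> GMT` lines; the `find_expired` helper and the sample text are made up for illustration and are not captured from a real appliance:

```python
from datetime import datetime, timezone


def find_expired(lines, now):
    """Return [(store, alias, not_after)] for entries whose Not After is before `now`."""
    expired = []
    store = alias = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("STORE "):
            store = line[len("STORE "):]
        elif line.startswith("Alias"):
            alias = line.partition(":")[2].strip()
        elif line.startswith("Not After"):
            # Collapse the padding OpenSSL uses for single-digit days ("Jul  3").
            stamp = " ".join(line.partition(":")[2].split())
            not_after = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            not_after = not_after.replace(tzinfo=timezone.utc)
            if not_after < now:
                expired.append((store, alias, not_after))
    return expired


# Illustrative input only; the dates are invented.
sample = """\
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
            Not After : Jul  3 20:45:55 2025 GMT
STORE vpxd-extension
Alias : vpxd-extension
            Not After : Jan  1 00:00:00 2020 GMT
""".splitlines()

for store, alias, not_after in find_expired(sample, datetime(2023, 7, 4, tzinfo=timezone.utc)):
    print(f"EXPIRED {store}/{alias}: {not_after:%Y-%m-%d}")
```

In practice you would pass `datetime.now(timezone.utc)` as `now` and feed the function the saved output of the loop.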

On the PSC, compare the local hostname (PNID) with the name stored in the MACHINE_SSL_CERT store:

 /usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
Output should be similar to the following:
psc.xxx.eg

 /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text
Output should include lines similar to the following:
X509v3 Subject Alternative Name:
                email:email@acme.com, DNS:psc.xxx.eg

Compare the output above. If there is a mismatch, for example DNS:psc.xxx.eg.xxx.eg that was cached on the DNS Server before editing the DNS records, then proceed with the next steps.
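Before moving on, the comparison itself can be sketched in a few lines of Python. This assumes you have pasted the PNID and the Subject Alternative Name line from the two commands into strings; the helper names are hypothetical:

```python
import re


def san_dns_names(san_line):
    """Extract the DNS entries from a Subject Alternative Name value line."""
    return re.findall(r"DNS:([^,\s]+)", san_line)


def pnid_matches_san(pnid, san_line):
    """Case-insensitive check that the PNID is one of the DNS SAN entries."""
    wanted = pnid.strip().lower().rstrip(".")
    return any(name.lower().rstrip(".") == wanted for name in san_dns_names(san_line))


# Values taken from the example output above.
good = "email:email@acme.com, DNS:psc.xxx.eg"
bad = "email:email@acme.com, DNS:psc.xxx.eg.xxx.eg"

print(pnid_matches_san("psc.xxx.eg", good))
print(pnid_matches_san("psc.xxx.eg", bad))
```

The first call reports a match; the second does not, which is the doubled-suffix mismatch case described above.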

  • SSH to the VCSA VM and start certificate-manager by running the following command:
    • root@psc [ ~ ]# /usr/lib/vmware-vmca/bin/certificate-manager
  • Use option 8 -> 8. Reset all Certificates.
  • Follow this procedure:
    • Confirm "Do you wish to generate all certificates using configuration file: Option[Y/N] ?"
    • Enter credentials
    • Enter values
    • Leave "IPAddress" field empty
    • Enter FQDN of PSC into "Hostname"
    • VMCA "Name" field is name of new Root CA being created (e.g. "VxRail CA")
    • Confirm "Continue operation: Option[Y/N] ?"
    • Confirm "Continue operation : Option[Y/N] ?"
  • Restart all services on both PSC and vCenter
    • service-control --stop --all
    • service-control --start --all


Ensure that vCenter critical services are up and running:
 

root@vcenter [ ~ ]# service-control --status --all
Running:
 applmgmt lwsmd vmafdd vmonapi vmware-analytics vmware-certificatemanagement vmware-cm vmware-content-library vmware-eam vmware-perfcharts vmware-postgres-archiver vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-topologysvc vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Stopped:
 vmcam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-pod vmware-rbd-watchdog vmware-updatemgr vmware-vcha vsan-dps
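To check that status output without reading it by eye, something like the following Python sketch could work. It assumes the `Running:` / `Stopped:` layout shown above; the `CRITICAL` set is my own guess at which services matter most, not an official list:

```python
def parse_status(text):
    """Map each section header (e.g. 'Running') to the set of services listed under it."""
    groups = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            current = groups.setdefault(stripped[:-1], set())
        elif stripped and current is not None:
            current.update(stripped.split())
    return groups


# Assumption: a hand-picked subset treated as critical for this check.
CRITICAL = {"vmware-vpxd", "vmware-vpostgres", "vmware-rhttpproxy", "vmafdd"}

# Copied from the example output above.
sample = """\
Running:
 applmgmt lwsmd vmafdd vmonapi vmware-analytics vmware-certificatemanagement vmware-cm vmware-content-library vmware-eam vmware-perfcharts vmware-postgres-archiver vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-topologysvc vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Stopped:
 vmcam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-pod vmware-rbd-watchdog vmware-updatemgr vmware-vcha vsan-dps
"""

status = parse_status(sample)
missing = CRITICAL - status.get("Running", set())
print("all critical services running" if not missing else f"not running: {sorted(missing)}")
```

Any section header other than `Running` or `Stopped` ends up in its own group, so services listed under it are not mistaken for running ones.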

4 Replies
flyingrobots_69
Contributor

I should also point out that this is a brand new installation, using the latest vCenter 7.

 

flyingrobots_69
Contributor

I instrumented the certificate-manager code to log the verbose output of service-control, and it seems vmon-cli was the last command to be run.  Does vmon-cli produce a log?  Does anyone know where it lives?

 

2023-07-04T00:10:03.994Z  Done running command

2023-07-04T00:10:03.994Z  Running command: ['/sbin/service', 'vmware-vmon', 'start']

2023-07-04T00:10:05.809Z  Done running command

2023-07-04T00:10:05.809Z  Successfully started service vmware-vmon

2023-07-04T00:10:05.809Z  Running command: ['/usr/bin/systemctl', 'unset-environment', 'VMON_PROFILE']

2023-07-04T00:10:05.822Z  Done running command

Successfully started service vmware-vmon

2023-07-04T00:10:05.824Z  Running command: ['/usr/lib/vmware-vmon/vmon-cli', '--batchstart', 'ALL']

2023-07-04T00:16:50.914Z  Done running command

Service-control failed. Error: Failed to start services in profile ALL. RC=2, stderr=Failed to start vpxd services. Error: Service crashed while starting



2023-07-04T00:16:50.970Z ERROR certificate-manager None

 

flyingrobots_69
Contributor

I ended up modifying /usr/lib/vmware/site-packages/cis/certificateManagerOps.py so I could see which service was dying.  I commented out the exception that is raised when the command fails, and made changes so that the log file would contain the individual steps taken by service-control.  That allowed me to see exactly which service was biting the dust.

I had thought the relevant log files were in /var/log/vmware/vpxd-svcs, but I was able to see that the failing service was vpxd instead.

Then I found the following messages:

 

SSL Exception: Verification parameters:
--> PeerThumbprint: 46:57:EA:13:AD:6E:F3:CF:7F:1F:98:8A:C4:87:7A:2D:15:85:DD:2D
--> ExpectedThumbprint: 
--> ExpectedPeerName: vcenter.arilabs.net
--> The remote host certificate has these problems:
--> 
--> * path length constraint exceeded)

 

I realized that the intermediate CA certificate we were using had pathLength set to 0.  I increased that value, re-signed the vCenter root certificate, and now it works great.

So, it was indeed a certificate problem.
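For anyone hitting the same error: a path length constraint of 0 on a CA certificate means it may only issue end-entity certificates, so putting the VMCA (itself a CA) underneath it breaks validation of everything the VMCA issues. Below is a simplified Python model of that rule, in the spirit of the basic constraints check in RFC 5280. The `Cert` class, the chain layout, and the certificate names are illustrative assumptions, and the model ignores the RFC's special handling of self-issued certificates:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cert:
    name: str
    is_ca: bool
    path_len: Optional[int] = None  # None means no path length constraint


def path_len_violations(chain: List[Cert]) -> List[str]:
    """Check a chain ordered from trust anchor to leaf.

    For each CA with a constraint, count the CA certificates that sit
    between it and the leaf, and report any CA whose limit is exceeded.
    """
    problems = []
    for i, cert in enumerate(chain):
        if not cert.is_ca or cert.path_len is None:
            continue
        cas_below = sum(1 for c in chain[i + 1:-1] if c.is_ca)
        if cas_below > cert.path_len:
            problems.append(
                f"{cert.name}: path length constraint exceeded "
                f"({cas_below} CA(s) below, limit {cert.path_len})"
            )
    return problems


def build_chain(intermediate_path_len: int) -> List[Cert]:
    # Hypothetical layout: enterprise root -> intermediate -> VMCA -> machine cert.
    return [
        Cert("enterprise root", is_ca=True),
        Cert("intermediate CA", is_ca=True, path_len=intermediate_path_len),
        Cert("VMCA root (vcenter.ca.cer)", is_ca=True),
        Cert("machine SSL cert", is_ca=False),
    ]


print(path_len_violations(build_chain(0)))  # one violation reported
print(path_len_violations(build_chain(1)))  # no violations
```

With the constraint at 0 the intermediate is flagged because the VMCA sits below it as a CA; raising it to 1 (or removing it) lets the chain pass, which matches what fixed it here.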

 

Thank you for your help @virtualinca 

 

Kevin

 