VMware Cloud Community
kjsdtts
Contributor

PSC 6.5 SSL custom cert replacement rolls back at last stage

I am replacing the certificates on an external PSC 6.5 U1 with certificates issued by a Microsoft CA. The chain comprises an intermediate CA and a root CA.

All goes well until the very last stage, where the services fail to start.

The console shows:

Status : 85% Completed [starting services...]

Error while starting services, please see log for more details

Status : 0% Completed [Operation failed, performing automatic rollback]

Error while replacing Machine SSL Cert, please see /var/log/vmware/vmcad/certificate-manager.log for more information.

Performing rollback of Machine SSL Cert...

Checking the certificate-manager.log, I find that there are services that fail to start due to a timeout:

2017-10-05T05:33:28.195Z INFO certificate-manager Running command :- service-control --start  --all

2017-10-05T05:33:28.196Z INFO certificate-manager please see service-control.log for service status

Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start sca, cm, vapi-endpoint services. Error: Operation timed out

2017-10-05T05:41:26.324Z ERROR certificate-manager None

2017-10-05T05:41:26.325Z ERROR certificate-manager Error while starting services, please see log for more details

2017-10-05T05:41:26.325Z ERROR certificate-manager Error while replacing Machine SSL Cert, please see /var/log/vmware/vmcad/certificate-manager.log for more information.

2017-10-05T05:41:26.325Z ERROR certificate-manager {
    "resolution": null,
    "detail": [
        {
            "args": [
                "None"
            ],
            "id": "install.ciscommon.command.errinvoke",
            "localized": "An error occurred while invoking external command : 'None'",
            "translatable": "An error occurred while invoking external command : '%(0)s'"
        },
        "Error while starting services, please see log for more details"
    ],
    "componentKey": null,
    "problemId": null
}

2017-10-05T05:41:26.326Z INFO certificate-manager Performing rollback of Machine SSL Cert...

There is a KB article describing something similar, but in my case the error does not occur while publishing the certificate with dir-cli.

Has anyone seen this before?

16 Replies
mhampto
VMware Employee

Can you provide more detail on the environment? Is this a multi-node PSC? If so, is there an issue replacing the certificate on the other nodes? I found a similar issue, though it was related to VUM, which does not seem to be in play in what you are working on.

kjsdtts
Contributor

Appreciate your reply mhampto.

It is a multi PSC environment (all externals) and I can't even get past the first one.

I did see one thread somewhere expressing similar error messages except it was a fault with the issuer's signature algorithm. Mine's already sha256RSA.

There is another thread where someone deliberately halted the script at the final stage to prevent the rollback, and after rebooting both the PSC and the VCSA, the new cert did stick.

If this is the case then could it be a potential bug in the final install of the cert?

Wonder if anyone else has seen this.

camko14
Contributor

I am having the same issue, but only with 2 of 4 external PSCs. I updated the cert on the 2 PSCs in one site without issue. Both PSCs in the other site have the issue. I tried killing the rollback on 1 of the problem PSCs, and after rebooting I can see the cert is installed, but when trying to log into the PSC I get the error below.

[Image: pastedImage_0.png, screenshot of the PSC login error]

kjsdtts
Contributor

Sorry to hear that didn't work out for you. That is exactly why I have decided not to risk it (yet).

We are getting support soon, once it has been organised. We'll see how that goes.

Historically, SSL changes on vCenter have generally been fraught with risk :(

camko14
Contributor

No worries, the systems are not in production yet, and even though I cannot access the web configuration for the PSC, they still authenticate the connected services. Totally agree with the historical statement ;)

I am going to throw the issue over to a VMware TAM this week. I will let you know if it gets resolved.

AlexNG_
Enthusiast

Hey camko, any update on this? We are having the same issue!

If you find this information useful, please award points for "correct" / "helpful".
AlexNG_
Enthusiast

Hey guys, just fixed my PSC! The problem was that our CA chain was not properly added to the trusted store. Running the following command and then importing the Machine SSL cert again worked:

# /usr/lib/vmware-vmafd/bin/dir-cli trustedcert publish --chain --cert /root/ssl/chain.crt

Right after running that command, we imported the Machine SSL cert again with:

# /usr/lib/vmware-vmca/bin/certificate-manager

We chose option 1, then option 2.
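
Before publishing, it may help to confirm that chain.crt really contains every CA certificate you expect (in this thread, an intermediate plus a root). The following is a minimal Python sketch, assuming the chain file is PEM-encoded text. The function name count_pem_certificates is made up for illustration, and the check only counts certificate blocks; it does not verify signatures or ordering.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certificates(pem_text: str) -> int:
    """Return the number of PEM certificate blocks found in pem_text.

    Hypothetical helper: it counts blocks only and does not validate them.
    """
    return len(PEM_CERT.findall(pem_text))
```

For an intermediate-plus-root chain you would expect 2 when passing in the contents of the file given to dir-cli (for example, the text read from /root/ssl/chain.crt).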

msripada
Virtuoso

From the logs, it is clear that the sca, cm, and vapi-endpoint services failed to start. Can you check the component manager and vAPI endpoint logs for more detail on why they fail to start? That should give an idea of what the certificate issue is.

Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start sca, cm, vapi-endpoint services. Error: Operation timed out
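
To see which service logs to look at first, the failed service names can be pulled out of that message. This is a rough Python sketch that assumes the message format is exactly the one quoted in this thread; the pattern and the function name failed_services are illustrative, not part of any VMware tooling.

```python
import re

# Pattern for the vmon-cli failure message as quoted in this thread.
# The character class excludes '.' and '=' so the earlier
# "Failed to start vmon services.vmon-cli" fragment is skipped.
FAILED_SERVICES = re.compile(
    r"Failed to start ([\w, -]+) services\. Error: (.+)"
)


def failed_services(log_text: str):
    """Return a list of (service_names, error) tuples found in log_text."""
    results = []
    for match in FAILED_SERVICES.finditer(log_text):
        names = [name.strip() for name in match.group(1).split(",")]
        results.append((names, match.group(2).strip()))
    return results
```

Fed the line above, this should yield the three services sca, cm, and vapi-endpoint together with the "Operation timed out" error.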

Thanks,

MS

dbnett
Contributor

I know this is almost a year old now, but I've run into the exact same problem on an embedded deployment. After 3 days of research and trying different approaches I haven't found a solution, and there is no workable one in this thread either :(

I get exactly the same error as kjsdtts.

kjsdtts
Contributor

I know this is nowhere near an answer but ours is that we have since moved on....

PALMBEACHSTATE
Contributor

Same issue here with an integrated PSC. We are attempting to replace the integrated SSL certs with MS AD certs, and we also have a root CA plus sub CA environment. It gets to 85% (during which I'm eventually able to log into the web GUI, and the browser shows correct cert chain validation), but it sits there for maybe 10 minutes and then rolls it all back. If anyone has gotten this to work, please share!

VMware: Please help!

dbnett
Contributor

Well, the good news is that I do have a solution for you :) It has been validated and tested several times in the last week. When using an embedded PSC, you have to replace the "PSC" certificate with a certificate chain and not just a signed machine cert. The chain has to include the signed machine cert and every CA in the certification chain, however many there are. This is just for replacing the so-called "machine" cert of the embedded PSC/VCSA and does not affect the VMCA services.

If you need the info on creating the cert chain or have any other problems, let me know. I've just completed this deployment and it works great now.

rahul_dondeti
Contributor

I have a similar issue while importing custom certificates to the PSC. Please help me create a certificate chain to add to the endpoint certificate store. I have three certificates (root, intermediate, and server), which I got created by a third party by issuing VMCA.

dbnett
Contributor

Rahul,

See step 7 of: Generate CSR with vSphere Certificate Manager and Prepare Root Certificate (Intermediate CA)

The gist of it is that, using a text editor, you have to join all your certificates, starting with your machine SSL cert and working your way up to the top-level cert. If you have more than 4 certs to join, just add them into the sequence, then save the new file as a .CER file, e.g. MachineSSL-Chain.cer:

-----BEGIN CERTIFICATE-----
Machine Cert
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
CA intermediate certificates
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Root certificate of enterprise or external CA
-----END CERTIFICATE-----
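
If you would rather not paste by hand, the same join can be scripted. This is a minimal Python sketch under the assumption that each input is one PEM-encoded certificate; the function name build_chain is hypothetical. It only concatenates in the order given (machine cert first, root last) and does not check that each certificate was actually issued by the next one.

```python
def build_chain(machine_cert: str, *ca_certs: str) -> str:
    """Join PEM certificates leaf-first: machine cert, intermediates, root.

    Hypothetical helper: the caller is responsible for passing the CA
    certificates in order from the issuing intermediate up to the root.
    """
    blocks = []
    for pem in (machine_cert, *ca_certs):
        block = pem.strip()
        if not (block.startswith("-----BEGIN CERTIFICATE-----")
                and block.endswith("-----END CERTIFICATE-----")):
            raise ValueError("input does not look like a PEM certificate")
        blocks.append(block)
    # One block after another, no blank lines between, trailing newline.
    return "\n".join(blocks) + "\n"
```

You would read the machine, intermediate, and root files into strings, pass them in that order, and write the result to something like MachineSSL-Chain.cer.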

joselin79
Contributor

I used a fullchain.cer generated via the acme.sh script and it is still failing.

Other ideas?

ParKon
Contributor

Here's the reason it was failing for me: I used the IP address for the PNID during the vCenter install instead of the FQDN. I was told the customer didn't have DNS available when I did the install, so I used the IP. When you generate certs and try to replace the self-signed Machine SSL cert, the replacement will fail at 85% if you used an IP for the PNID during install. You'll have to reinstall vCenter using the FQDN for the PNID, or you won't be able to replace the self-signed certificates with custom ones.
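
A quick way to tell whether you are in this situation is to check whether the PNID is an IP literal. The sketch below is a small Python check using the standard ipaddress module; the function name pnid_is_ip is made up for illustration. As far as I recall (this is not from the thread), the PNID itself can be read on the appliance with `/usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost`.

```python
import ipaddress


def pnid_is_ip(pnid: str) -> bool:
    """Return True if pnid is an IPv4/IPv6 literal rather than a hostname."""
    try:
        ipaddress.ip_address(pnid.strip())
        return True
    except ValueError:
        # Not parseable as an IP address, so treat it as a hostname/FQDN.
        return False
```

If this returns True for your PNID, the scenario described above may apply to you.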
