VMware Cloud Community
rszymczak
Hot Shot

vRealize Automation 7 - NSX deployments fail due to certificate issues with vRealize Orchestrator

Hi community,

After installing the latest versions of vRA, vRO and NSX, I'm running into issues when requesting blueprints that use NSX components. First, the version details:

- vRA: 7.0.0 (build 3292778)

- vRO: 7.0.0.16989 (build 331003)

- NSX: 6.2.1 (build 3300239)

The vRO plug-ins are the ones bundled with the vRO version listed above, except for the NSX plug-in, which was updated to the latest release (1.0.3, released on 17.12.15).

Within the configured tenant, vRO is configured as an endpoint. I can verify that data collection is running and working. I can see the NSX plug-in for vRO running the "create NSX endpoint" workflow from time to time, using the vRO user configured in vRA.

Within the configured tenant, vRO is also configured as the default vRO server for ASD. The connection test is successful. When saving the config I'm asked to trust the vRO certificate, which I confirm. Note that the thumbprint shown does match the vRO certificate thumbprint that I get when visiting the vRO appliance at https://vro:8281. I'm able to browse the vRO workflows from within vRA's designer, so the connection seems to be established.

Within vRO, the vRA CAFE and IAAS plug-ins have been successfully registered. I'm able to browse the plug-in inventory for both.

To troubleshoot the issue, I created a new unified blueprint within the vRA design section with the following configuration:

- Transport zone: my configured NSX transport zone (verified: manual creation on this zone using NSX works just fine)

- Routed gateway reservation policy: my reservation policy for the edge cluster to use

- The only component dragged to the canvas is a "Network & Security" --> "On-Demand NAT Network", which uses a pre-defined 1-to-many network profile as its "Parent network profile" with no manual changes.

- Note that while this is a very basic example blueprint to illustrate the issue, it happens with any blueprint I configure if it contains a component that requires the NSX plug-in for vRO.

Every time I request that blueprint, the request fails with the error message: "Request [fa1e0689-0d06-4308-a914-e498c0d1fd99]: 404 Not Found"

Looking into vCenter, NSX and vRO, I can verify that nothing is actually triggered when requesting the blueprint.

Looking into /storage/log/vmware/vcac/catalina.log on the vRA appliance, the issue becomes very visible:

com.vmware.vcac.iaas.vco.network.helper.VcoEndpointSelector.isEndpointAlive:88 - vRealize Orchestrator endpoint with url [https://s00-vro.my.domain:8281/vco] is not alive. Exception message:> [Host name 's00-vro.my.domain' does not match the certificate subject provided by the peer (CN=s00-vro.my.domain, OU=VMware, O=My Company, C=DE)]

com.vmware.vcac.iaas.vco.network.helper.VcoEndpointSelector.getFirstAliveEndpointByPriority:200 - vRealize Orchestrator endpoint [https://s00-vro.my.domain:8281/vco] with priority 1 is not alive. Skipping.

org.springframework.web.servlet.mvc.method.annotation.ExceptionHandlerExceptionResolver.logException:189 - Handler execution resulted in exception: Endpoint not found. There are no vRealize Orchestrator endpoints that are alive.

com.vmware.vcac.platform.service.rest.resolver.ApplicationExceptionHandler.handleHttpStatusCodeException:673 - 404 Not Found

org.springframework.web.client.HttpClientErrorException: 404 Not Found

...

...

...

Please note that I double-checked the certificate. It's a self-signed certificate created using vRO 7.0's new control panel, and it's the one I get when accessing https://vro:8281. It's valid and the subject (issued to CN) DOES perfectly match the host name entered in the ASD and endpoint configuration in vRA. The name is resolvable and the server time on all components is in sync with the NTP server used.
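For anyone who wants to repeat that check from a shell rather than a browser, here is a minimal sketch using the openssl command line tool. The function names and the output path in the usage note are my own choices for illustration; the host and port are the ones from the log above.

```shell
# Save the certificate that a TLS endpoint presents as a PEM file.
# $1 = host, $2 = port, $3 = output file
fetch_cert() {
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -outform PEM -out "$3"
}

# Print the subject and the SHA-1 thumbprint of a PEM certificate file.
# $1 = certificate file
show_subject_and_thumbprint() {
    openssl x509 -in "$1" -noout -subject -fingerprint -sha1
}
```

Usage would be something like `fetch_cert s00-vro.my.domain 8281 /tmp/vro.pem && show_subject_and_thumbprint /tmp/vro.pem`, then comparing the CN with the host name configured in vRA and the thumbprint with the one vRA asks you to trust.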

By now I have even re-generated the certificate and re-registered and restarted all components, but while I can see that the certificate was updated in all components, I'm still getting the same issue.

I never had this issue with previous versions of NSX / vRA / vRO. I checked the documentation to see if anything changed here, but didn't find anything that I've been doing wrong. Is there anything I'm missing here? A bug, anyone?

Accepted Solutions
rszymczak
Hot Shot

OK, that seems to be the issue. At least when upgraded from earlier versions of vRO (I can't verify whether it's true for fresh vRO 7 installs as well, but it's likely), the vRO "control center" will generate SHA1-based certificates, which vRA doesn't like for actions that use the vRO ENDPOINT in vRA. ASD seems to work without such issues.

Side note: upgraded vRO installs will also come with SHA1-based certs if they're using a self-signed cert created by vRO. One would think that re-creating the cert using the control center is enough, but it turns out it's not, since it will generate a (new) SHA1-based cert.

What I did to resolve the issue:

1. Create a SHA2-based vRO certificate without any cert extensions, just like the one that ships with the vRA-integrated vRO. I tend to use XCA for those jobs, but OpenSSL will do as well. The exact format required for the vRO certificate is not documented, but I can verify that you need it like this: a PEM certificate in PKCS#1 format including private and public key, formatted like this:

-----BEGIN RSA PRIVATE KEY-----
(Your private key: your_vro_server.key)
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
(Your primary certificate: your_vro_server.crt)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your intermediate certificate: intermed.crt)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your root certificate: root.crt)
-----END CERTIFICATE-----

I had issues when I used key extensions so I'd suggest you don't use them and only create a very basic cert without any V3 extensions, as shown on the right in the image from my last post (ideally you want to have a cert with the same properties as the cert used by the vRA integrated vRO appliance except of course the different CN and so forth).

2. Open the vRO control center at https://your-external-vro:8283/vco-controlcenter/#/ and switch to Certificates --> Orchestrator Server SSL certificate. Use the import action to import your PEM cert. It should tell you that you need to reboot your vRO appliance. So REBOOT the appliance (do not just restart the service; that does not seem to be sufficient).

3. In vRA, remove the vRO endpoint everywhere it was configured. I also removed the vRO from the ASD config, just to make sure nothing's left.

4. Reboot the vRA appliance (IAAS can be left untouched). I needed to do so because I could observe that the keystore kept being overwritten by vRA (???): certs I had deleted from it (AND I verified they were deleted) re-appeared in the keystore after some time. After a reboot that issue was gone and the keystore was clean.

5. Re-add the vRO endpoint and ASD config. Accept the certificate.

6. Works.

So, while I have no more time to troubleshoot further, I'd guess that the issue is the SHA1-based certificate that is generated by the vRO appliance. The internal appliance comes with a SHA2-based cert, which works, and after changing the external appliance's SHA1 cert into a SHA2-based cert it all works.
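For those who prefer OpenSSL over XCA for step 1, the following is a rough sketch of how such a bundle could be produced for a self-signed cert. It is not the exact procedure I used: the function name, file names, key size and validity period are assumptions, and only the CN is set in the subject. Depending on the OpenSSL version and configuration, `req -x509` may still add V3 extensions, so inspect the result with `openssl x509 -noout -text` before importing it.

```shell
# Build a PEM bundle: PKCS#1 private key followed by a SHA-256 self-signed cert.
# $1 = FQDN to use as the certificate CN
# $2 = existing output directory
make_vro_pem_bundle() {
    fqdn=$1
    outdir=$2

    # New 2048-bit RSA key plus a self-signed certificate signed with SHA-256.
    openssl req -x509 -newkey rsa:2048 -sha256 -nodes -days 730 \
        -subj "/CN=$fqdn" \
        -keyout "$outdir/vro_pkcs8.key" -out "$outdir/vro.crt" || return 1

    # Rewrite the key in traditional (PKCS#1) form so that it starts with
    # "-----BEGIN RSA PRIVATE KEY-----". Needs OpenSSL 1.1.0 or later.
    openssl pkey -in "$outdir/vro_pkcs8.key" -traditional \
        -out "$outdir/vro.key" || return 1

    # Key first, then the certificate. With a CA-signed certificate, the
    # intermediate and root certificates would be appended after it.
    cat "$outdir/vro.key" "$outdir/vro.crt" > "$outdir/vro_bundle.pem"
}
```

Calling `make_vro_pem_bundle your-external-vro.my.domain /tmp/vro-cert` (hypothetical values, the directory must exist) would leave the file to import in /tmp/vro-cert/vro_bundle.pem.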


9 Replies
GrantOrchardVMw
Commander

Can you confirm that this is an external instance? I'll attempt to replicate the problem since I've only used the internal instance for NSX integration on 7.

Grant http://grantorchard.com
rszymczak
Hot Shot

Hi Grant,

Yes, I can confirm it's an external vRO instance. Good hint with the internal one. I totally forgot it's still around. I'll try to reproduce the issue on the internal instance and report back.

rszymczak
Hot Shot

I can confirm that it's working fine using the integrated vRO. Within vRA nothing was changed besides the vRO Endpoint and the default vRO for ASD.

As a side note: the (external) vRO 7 that is not working was upgraded from the latest vRO 6.x version available. It was upgraded to 7 using the management website (port 5480), as stated in the upgrade documentation.

GrantOrchardVMw
Commander

Ok cool. That's a pointer... but to what I'm not sure :) The good news is that it shows it can functionally work, but that something funky is going on with the connection to the external orchestrator instance.

With the new blueprint model, the orchestrator workflows for NSX are launched by the composition-service on the vRA appliance, and not through the DEMs. If I had to guess, you might need to manually import the certificate into the keystore on the appliance. I wouldn't have thought that this would be necessary, but <shrug>.

Grant http://grantorchard.com
rszymczak
Hot Shot

That was my first thought as well, so I dug into the shell to find the correct keystore.

Seems like /var/lib/vco/keystore.password and /etc/vco/app-server/keystore.password both store the password to the keystore.

Looking at the firstboot and postupdate scripts, I found:

postupdate.d/25-vco-keystore-password-reencrypt

postupdate.d/18-vco-keystore-password

firstboot.d/18-vco-keystore-password

Dunno when the first one is used, but it seems to sha1sum the keyfile. The firstboot and postupdate scripts have the same content.

They both use the keystore located at /etc/vco/app-server/security/jssecacerts, which I guess is the one that matters.

I guess they are run after an upgrade or at first boot and change the default password ("dunesdunes") to a urandom value.

That password is saved to /etc/vco/app-server/keystore.password and used by the scripts.

So, that reencrypt script confuses me because it makes no sense to hash the password file that will be used for all further access to the keystore.

Whatever.

Outcome: I guess /etc/vco/app-server/security/jssecacerts is the keystore and /var/lib/vco/keystore.password the keystore password.

Using keytool -list -keystore /etc/vco/app-server/security/jssecacerts -storepass my_pass_from_the_password_file reveals that the vRO certificate that is used for https://myexternalvro:8281 is not inside the keystore.

So I added it using:

keytool -importcert -file /vco-cert.cer -keystore /etc/vco/app-server/security/jssecacerts -alias "myexternalvro" -storepass my_pass_from_the_password_file

I can verify that the certificate is now in the keystore. However, when I requested the blueprint, I was presented with the same error as before (404, resulting from a cert. issue).

So, what am I missing?

  • Any special alias that should be used for the vRO cert?
  • Wrong keystore perhaps? (if so: what's the storepass for the correct store and where is the keystore located at?)

I'm not sure whether the keystore is loaded into memory when tc Server starts. So I'll try rebooting vRA now that the cert is included. If that fixes the issue, I'll let you know in the next few minutes.

telinwis
Contributor

We're having the same issue in our lab environment, but also with the internal vRO of vRA.

It turned out that the vRA hostname in the CN part of the certificate wasn't an FQDN, but vRA is trying to connect to the FQDN, so we're seeing the same errors in catalina.out on the vRA appliance.

We tried to modify it, but the SSO part of vRA then complains about a wrong certificate, so we're restoring our snapshot and restarting the install.

rszymczak
Hot Shot

Yes, I had the same issue with the vRO cert created in the upgrade process. The CN was not the FQDN of the vRO server. So when I first looked at the logs and saw the error, I thought the CN mismatch was causing it.

So I used the re-create vRO certificate function of the control center to fix that. The new vRO cert has the correct CN and matches the FQDN as well as the config in vRA (the endpoint was provided as FQDN and the ASD config was provided as FQDN as well). Of course I then re-configured all vRA settings to make sure the new cert is downloaded (the "do you trust the cert" box appeared as it should and I approved it).

BUT after changing this I still had the exact same error message (cert issue, as shown in the initial posting), just as you said.

I guess re-installing vRO would be the easy solution, but I want to know what funky stuff is happening here.

//FYI: rebooting didn't help. I also tried adding the vRO cert using the alias that was used by the internal vRO cert, "vco.sso.ssl.certificate". Same issue.

rszymczak
Hot Shot

I just checked the cert differences. It seems the cert of the non-working, upgraded vRO instance is using SHA1. Remembering that SHA1 is untrusted by default in the latest vSphere releases (VMware KB: After upgrading an ESXi host to 5.5 Update 3b and later, the host is no longer manageable...), I guess this could be a hint in the right direction. I'd like to point out that this cert was not "only" upgraded but also RE-created using the control panel in vRO. So the certs created by the control panel seem to be SHA1-based.

The integrated vRO is using SHA2.
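If you want to compare the two certs yourself, a small sketch like the one below prints the signature algorithm of a PEM certificate file with the openssl command line tool. The function name and the file names in the usage note are made up for illustration.

```shell
# Print the signature algorithm of a PEM certificate file.
# $1 = certificate file
cert_signature_algorithm() {
    openssl x509 -in "$1" -noout -text \
        | grep -m 1 'Signature Algorithm' \
        | sed 's/.*Signature Algorithm: *//'
}
```

Running `cert_signature_algorithm external-vro.pem` and `cert_signature_algorithm internal-vro.pem` should print a name such as sha1WithRSAEncryption for a SHA1-based cert and sha256WithRSAEncryption for a SHA2-based one.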
