VMware Networking Community
daphnissov
Immortal

NSX-T 2.4: Node cert replacement via API fails with "com.vmware.nsx.management.container.exceptions.InvalidOwnerException" in nsxapi.log

I've gone through this about half a dozen times with different types of certificates, each failing with the same message in /var/log/proton/nsxapi.log. This same process worked fine with NSX-T 2.3. Let me explain my topology (screenshots for sauce).

  • NSX-T 2.4
  • 3 nodes
    • cz-nest-nsxtm01 (10.2.21.16)
    • cz-nest-nsxtm02 (10.2.21.17)
    • cz-nest-nsxtm03 (10.2.21.18)
  • 1 HA VIP address (ILB)
    • cz-nest-nsxtm (10.2.21.19)


I referenced the steps in VVD 5.0.1 and used the CertGen utility to create certificates signed by my internal enterprise CA. When replacing the cert on cz-nest-nsxtm01 with the node cert didn't work, I tried the same steps with only a self-signed cert and got the same failure. I upload the CA cert and then the node certs, including the one for the cluster IP. The SAN contains both the FQDN and the IP, so I don't think this is an issue with the cert contents.

I retrieve the ID of the cert in question. For more detail on the cert, a curl GET against /api/v1/trust-management/certificates returns the following:

{
  "pem_encoded": <REDACTED>,
  "used_by": [],
  "resource_type": "certificate_signed",
  "id": "67eb0a1c-e06c-476e-980a-08519b90d16f",
  "display_name": "cz-nest-nsxtm01",
  "tags": [
    {
      "scope": "policyPath",
      "tag": "/infra/certificates/cz-nest-nsxtm01"
    }
  ],
  "_create_user": "nsx_policy",
  "_create_time": 1556977318572,
  "_last_modified_user": "nsx_policy",
  "_last_modified_time": 1556977318572,
  "_system_owned": false,
  "_protection": "REQUIRE_OVERRIDE",
  "_revision": 0
}

I then POST to the apply_certificate URI as follows:

curl -k -u 'admin:VMware1!' -X POST "https://cz-nest-nsxtm01.domain.com/api/v1/node/services/http?action=apply_certificate&certificate_id..."

In response I receive:

{
  "error_code": 36235,
  "error_message": "Error updating certificate usage.",
  "module_name": "node-services"
}
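The same call can be scripted with Python's standard library. This is a minimal sketch, assuming the manager FQDN, credentials and certificate ID are supplied by the caller (the values in the usage comment are placeholders). It mirrors curl's -k by disabling certificate verification, which is only reasonable in a lab while the manager still presents a cert you don't trust, and it decodes an error body like the one above. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def build_apply_cert_request(manager: str, user: str, password: str,
                             cert_id: str) -> urllib.request.Request:
    """Build the same POST as the curl command: apply a cert to the node http service."""
    query = urllib.parse.urlencode(
        {"action": "apply_certificate", "certificate_id": cert_id})
    url = f"https://{manager}/api/v1/node/services/http?{query}"
    # HTTP basic auth, equivalent to curl -u user:password
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"})


def insecure_context() -> ssl.SSLContext:
    """Equivalent of curl -k: skip hostname and chain verification (lab use only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def parse_nsx_error(body: str) -> tuple:
    """Return (error_code, error_message, module_name) from an NSX error body."""
    err = json.loads(body)
    return err.get("error_code"), err.get("error_message"), err.get("module_name")


# Usage (placeholders, not executed here):
# req = build_apply_cert_request("cz-nest-nsxtm01.domain.com", "admin", "<password>", "<cert-id>")
# urllib.request.urlopen(req, context=insecure_context(), timeout=30)
```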

Upon examining /var/log/proton/nsxapi.log, I find the following messages logged after the operation returns a failure:

2019-05-04T13:19:18.849Z INFO http-nio-127.0.0.1-7440-exec-1 PreAuthenticatedAuthenticationProvider - - [nsx@6876 comp="nsx-manager" subcomp="manager"] User node-mgmt. Granted authorities: ''

2019-05-04T13:19:18.849Z INFO http-nio-127.0.0.1-7440-exec-1 PreAuthenticatedAuthenticationProvider - - [nsx@6876 comp="nsx-manager" subcomp="manager"] User node-mgmt. Granted authorities: ''

2019-05-04T13:19:18.876Z INFO http-nio-127.0.0.1-7440-exec-1 AuditingServiceImpl - SYSTEM [nsx@6876 audit="true" comp="nsx-manager" reqId="bbf540a7-e46c-4590-811f-b078753c526e" subcomp="manager"] UserName="node-mgmt", ModuleName="CertificateManager", Operation="GetPrivateCertificate", Operation status="success", New value=["5c4f0ee9-00cb-4acd-8431-07903767204a"]

2019-05-04T13:19:18.924Z INFO http-nio-127.0.0.1-7440-exec-2 PreAuthenticatedAuthenticationProvider - - [nsx@6876 comp="nsx-manager" subcomp="manager"] User node-mgmt. Granted authorities: ''

2019-05-04T13:19:18.925Z INFO http-nio-127.0.0.1-7440-exec-2 PreAuthenticatedAuthenticationProvider - - [nsx@6876 comp="nsx-manager" subcomp="manager"] User node-mgmt. Granted authorities: ''

2019-05-04T13:19:18.936Z INFO http-nio-127.0.0.1-7440-exec-2 TrustStoreFacadeImpl - SYSTEM [nsx@6876 comp="nsx-manager" subcomp="manager"] Reserve certificate 5c4f0ee9-00cb-4acd-8431-07903767204a

2019-05-04T13:19:18.944Z INFO http-nio-127.0.0.1-7440-exec-2 TrustStoreServiceImpl - SYSTEM [nsx@6876 comp="nsx-manager" subcomp="manager"] Reserve service type API for node 4c9f2c42-57fd-88d4-24bb-3917f5e69a12 for certificate node-cz-nest-nsxtm01

2019-05-04T13:19:18.950Z ERROR http-nio-127.0.0.1-7440-exec-2 PrincipalOwnerValidator - - [nsx@6876 comp="nsx-manager" errorCode="MP289" subcomp="manager"] XXX Principal 'node-mgmt' with role '[]' attempts to delete or modify an object of type ImmutableCertificateEntity it doesn't own. (createUser=nsx_policy, allowOverwrite=null)

2019-05-04T13:19:18.951Z INFO http-nio-127.0.0.1-7440-exec-2 AuditingServiceImpl - SYSTEM [nsx@6876 audit="true" comp="nsx-manager" reqId="5ce8722f-1c1e-4681-a181-db21e86aa72e" subcomp="manager"] UserName="node-mgmt", ModuleName="CertificateManager", Operation="CertificateReserve", Operation status="failure", New value=["5c4f0ee9-00cb-4acd-8431-07903767204a" {"service_type":"API","node_id":"4c9f2c42-57fd-88d4-24bb-3917f5e69a12"}]

2019-05-04T13:19:18.952Z INFO http-nio-127.0.0.1-7440-exec-2 NsxBaseRestController - - [nsx@6876 comp="nsx-manager" subcomp="manager"] Error in API /nsxapi/api/v1/trust-management/certificates/5c4f0ee9-00cb-4acd-8431-07903767204a?action=reserve caused by exception com.vmware.nsx.management.container.exceptions.InvalidOwnerException: {"moduleName":"common-services","errorCode":289,"errorMessage":"Principal 'node-mgmt' with role '[]' attempts to delete or modify an object of type ImmutableCertificateEntity it doesn't own. (createUser=nsx_policy, allowOverwrite=null)"}
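When comparing attempts, it helps to pull the exception class, error code and owning user out of the NsxBaseRestController line programmatically. A minimal Python sketch, assuming the line format shown above (a trailing JSON payload, with "createUser=..." text inside errorMessage); the regular expressions and function name are hypothetical and may need adjusting for other NSX versions.

```python
import json
import re

# Matches "... caused by exception <class>: {json payload}" at the end of a line
_EXC_RE = re.compile(r"caused by exception (?P<exc>[\w.]+):\s*(?P<payload>\{.*\})\s*$")
# The owner the validator compared against, e.g. "createUser=nsx_policy"
_OWNER_RE = re.compile(r"createUser=(?P<owner>[^,)\s]+)")


def parse_exception_line(line: str):
    """Return (exception class, errorCode, createUser or None), or None if no match."""
    m = _EXC_RE.search(line)
    if not m:
        return None
    payload = json.loads(m.group("payload"))
    owner = _OWNER_RE.search(payload.get("errorMessage", ""))
    return (m.group("exc"), payload.get("errorCode"),
            owner.group("owner") if owner else None)


# Usage (not executed here):
# with open("/var/log/proton/nsxapi.log") as log:
#     hits = [r for r in map(parse_exception_line, log) if r]
```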

As can be seen, it appears to be complaining about the rights of the principal modifying the certificate. The log names that principal 'node-mgmt' with an empty role list, even though I issued the POST as admin, so a permissions problem doesn't make sense to me. My other thought was that it's refusing the operation because the 3 appliances have already been clustered. The VVD procedure makes no special mention of node leadership, but it does have you replace the cert on each node individually before a cluster IP is assigned.

I've also checked the official NSX-T 2.4 documentation (doc rev. 12 April 2019; PDF p536), and again there is no mention of anything in this process having changed since 2.3.

Anyone seen (or tried) this? If I don't hear anything I'll try to break the cluster, delete the other nodes, redeploy, and try again.

EDIT 1:  Even if I break the cluster IP (reset action) but leave all three nodes up and try the replacement, I get the same error in the logs as before.

EDIT 2:  I destroyed all the manager nodes except the first, rebooted, and tried the replacement. It failed yet again with the same messages. So I'm pretty much out of ideas here.

Accepted Solutions
daphnissov
Immortal

FINAL UPDATE:

It turns out that if you're not an idiot (like I've been) and read the UI fully, you'll notice a little switch at the bottom of the certificate import dialog that sets the cert up for one of two uses. It defaults to "Yes", meaning a service certificate. For the API/Manager certificate it needs to be "No".


Lesson learned. Don't be like me. Read things completely and carefully.

(facepalm)


daphnissov
Immortal

Alright, after more head-banging, here's the solution. I don't know how the VVD docs or official NSX-T docs can possibly be correct with the procedure stated.

TL;DR Version:  The certificate you want to assign to the Manager node must be imported by POSTing it via the API, not through the UI.

I combed through the API docs for the 50th time and scrutinized the description of /api/v1/node/services/http?action=apply_certificate. It states:

Update http service certificate

Applies a security certificate to the http service. In the POST request,
the CERTIFICATE_ID references a certificate created with the
/api/v1/trust-management APIs. Issuing this request causes the http service
to restart so that the service can begin using the new certificate. When the
POST request succeeds, it doesn't return a valid response. The request times
out because of the restart.

(emphasis mine here)
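The quoted docs warn that a successful apply_certificate call never returns a proper response, because the http service restarts. A minimal Python sketch of how a script might handle that, assuming req is an already-built POST to /api/v1/node/services/http?action=apply_certificate: a timeout is treated as the expected restart, while an HTTP error status is a genuine failure such as the 36235 error earlier in this thread. A timeout is only a hint (an unreachable manager also times out), so confirm afterwards by checking the certificate. The opener parameter exists only so the function can be exercised without a manager; the function name and outcome labels are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def apply_and_expect_restart(req, context=None, timeout=30.0,
                             opener=urllib.request.urlopen):
    """Send an apply_certificate POST and classify the outcome.

    Returns ("timeout", None) when the call times out (what the API docs say a
    successful apply looks like), ("http_error", status) for a real error
    response, or ("response", status) if the manager does reply.
    """
    try:
        with opener(req, context=context, timeout=timeout) as resp:
            return ("response", resp.status)
    except urllib.error.HTTPError as e:
        # e.g. the 36235 "Error updating certificate usage." failure
        return ("http_error", e.code)
    except (socket.timeout, TimeoutError):
        # Raised while waiting for the response once the service restarts
        return ("timeout", None)
    except urllib.error.URLError as e:
        # Timeouts during connect/send arrive wrapped in URLError
        if isinstance(e.reason, (socket.timeout, TimeoutError)):
            return ("timeout", None)
        raise
```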

When you upload a certificate in the UI as outlined in the official NSX-T docs and VVD docs and then perform a GET on /api/v1/trust-management/certificates, the ownership comes back as follows (pre-assignment):

{
  "pem_encoded" : <REDACTED>,
  "used_by" : [ ],
  "resource_type" : "certificate_signed",
  "id" : "d5079562-93d7-405b-956a-2814fdce862b",
  "display_name" : "cz-nest-nsxtm",
  "tags" : [ {
    "scope" : "policyPath",
    "tag" : "/infra/certificates/cz-nest-nsxtm"
  } ],
  "_create_user" : "nsx_policy",
  "_create_time" : 1556720383527,
  "_last_modified_user" : "nsx_policy",
  "_last_modified_time" : 1556720383527,
  "_system_owned" : false,
  "_protection" : "REQUIRE_OVERRIDE",
  "_revision" : 0
}

Compare this to a cert created by POSTing to /api/v1/trust-management/certificates?action=import (shown after it was successfully assigned, which is why used_by is populated):

{
  "pem_encoded" : <REDACTED>,
  "used_by" : [ {
    "node_id" : "4c9f2c42-57fd-88d4-24bb-3917f5e69a12",
    "service_types" : [ "API" ]
  } ],
  "resource_type" : "certificate_signed",
  "id" : "b0ab7d4f-0ef9-46f7-b159-aa0321176b98",
  "display_name" : "b0ab7d4f-0ef9-46f7-b159-aa0321176b98",
  "tags" : [ ],
  "_create_user" : "admin",
  "_create_time" : 1556984471564,
  "_last_modified_user" : "node-mgmt",
  "_last_modified_time" : 1556984595855,
  "_system_owned" : false,
  "_protection" : "NOT_PROTECTED",
  "_revision" : 1
}

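As you can see, _create_user is "admin" rather than "nsx_policy", and _protection is NOT_PROTECTED rather than REQUIRE_OVERRIDE. That matches the nsxapi.log error, where 'node-mgmt' was refused because the cert belonged to nsx_policy (createUser=nsx_policy), and it appears to be the source of the failures.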
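The same difference can be checked across every certificate at once. A minimal Python sketch, assuming the parsed JSON from GET /api/v1/trust-management/certificates puts the records under "results"; it flags records that look like the UI-imported one above. The heuristic (nsx_policy as creator plus REQUIRE_OVERRIDE protection) is drawn only from these two records, and the function names are hypothetical.

```python
def ownership_summary(cert: dict) -> dict:
    """Extract the fields that differed between the UI-imported and API-imported certs."""
    return {
        "id": cert.get("id"),
        "create_user": cert.get("_create_user"),
        "protection": cert.get("_protection"),
        # Policy-created certs carried a policyPath tag such as /infra/certificates/...
        "policy_path": next((t.get("tag") for t in cert.get("tags", [])
                             if t.get("scope") == "policyPath"), None),
    }


def likely_policy_owned(cert: dict) -> bool:
    """Heuristic from this thread: owned by nsx_policy and protected by REQUIRE_OVERRIDE."""
    return (cert.get("_create_user") == "nsx_policy"
            and cert.get("_protection") == "REQUIRE_OVERRIDE")


def find_policy_owned(certs: list) -> list:
    """Return the ids of records in a certificate list that match the heuristic."""
    return [c.get("id") for c in certs if likely_policy_owned(c)]


# Usage (not executed here):
# flagged = find_policy_owned(json.loads(response_text)["results"])
```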

Also, it took lots of trial and error to find the correct format to send the PEM-encoded cert and private key. The sample in the API documentation is incorrect. It lists the following example:

Example Request:

POST https://<nsx-mgr>/api/v1/trust-management/certificates?action=import
{
  "pem_encoded": "-----BEGIN CERTIFICATE----------END CERTIFICATE-----\n-----BEGIN CERTIFICATE----------END CERTIFICATE-----\n",
  "private_key": "-----BEGIN RSA PRIVATE KEY----------END RSA PRIVATE KEY-----\n",
  "passphrase": "1234"
}

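The newline (\n) characters are incorrectly placed. They need to appear right after the BEGIN header and right before the END footer of both the cert and the key, so the values look like:

"pem_encoded" : "-----BEGIN CERTIFICATE-----\n<ACTUAL_BASE64_CERT_DATA_HERE>==\n-----END CERTIFICATE-----\n"
"private_key" : "-----BEGIN RSA PRIVATE KEY-----\n<ACTUAL_BASE64_KEY_DATA_HERE>\n-----END RSA PRIVATE KEY-----\n"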

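Rather than placing the \n escapes by hand, one option is to let a JSON serializer produce them from the PEM files. A minimal Python sketch, assuming the cert (optionally followed by its CA chain, as in the documented example) and the private key are available as PEM text; the function and file names are hypothetical. Real PEM files also wrap the base64 body every 64 characters, and this approach keeps those breaks as \n too; the corrected example above only shows where the header and footer breaks go, so treat NSX accepting the internal breaks as an assumption.

```python
import json
from typing import Optional


def build_import_payload(cert_pem: str, key_pem: str,
                         passphrase: Optional[str] = None) -> str:
    """Serialize PEM text for POST /api/v1/trust-management/certificates?action=import.

    json.dumps turns every real line break in the PEM text into the two-character
    escape \\n, so the BEGIN/END lines come out separated from the base64 body.
    """
    body = {"pem_encoded": cert_pem, "private_key": key_pem}
    if passphrase is not None:
        body["passphrase"] = passphrase
    return json.dumps(body)


# Usage (file names are placeholders):
# with open("node.crt") as c, open("node.key") as k:
#     payload = build_import_payload(c.read(), k.read())
```

The resulting string can be saved to a file and sent with curl's -d @file option together with a Content-Type: application/json header.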
MakeItWork
Enthusiast

Were you able to assign a new certificate to the cluster?

I keep getting this error in nsxapi.log:

INFO http-nio-127.0.0.1-7440-exec-6 NsxBaseRestController - - [nsx@6876 comp="nsx-manager" subcomp="manager"] Error in API /nsxapi/api/v1/trust-management/certificates/cbfb4699-96bc-464a-b2cb-cec8c2740eb0?action=set_cluster_api_certificate caused by exception com.vmware.nsx.management.truststore.exceptions.ClusterCertificateReserveException:  {"moduleName":"internal-framework","errorCode":2036,"errorMessage":"Cluster Certificate not reserved."}

I've tried importing it as a service cert and as a non-service cert, and I've also tried a cert generated from a CSR created on the Manager.

This is a brand new install, using version 2.4.0.0.0.12456646.

daphnissov
Immortal

It's funny you should mention this: I'm also unable to assign a certificate to the cluster, and it fails with the exact same message. I actually have an SR open on it right now (no response from GSS yet). I've even tried clearing the existing cert ID from the cluster and then assigning the new cert, without luck.

daphnissov
Immortal

Heard back from GSS: this issue is a known bug, which is being fixed in 2.4.1.

daphnissov
Immortal

This issue is confirmed to be fixed in 2.4.1. I was able to successfully apply the new certificate to the cluster after the upgrade.
