VMware Cloud Community
pierrevm123
Contributor

Authorization errors while enabling Workload Management in VMware 7.0 and NSX-T 3.0

I am trying to enable Workload Management in vSphere 7.0 with NSX-T 3.0, so far without success.
The one constant is a large number of Unauthorized errors from NSX in wcpsvc.log.
I am logged into vCenter with the local Administrator@vsphere.local account. vCenter has been added to NSX as a Compute Manager and Trust has been enabled. vCenter, NSX, Edge and ESXi hosts are all configured in the same VLAN.
Does anyone have an idea? See the log excerpt below.

2020-11-15T12:55:04.203Z error wcp [opID=5fb24d4d] Error occurred sending Principal Identity request to NSX: principal identity already created
2020-11-15T12:55:04.203Z error wcp [opID=5fb24d4d] Failed to create PI in NSX managers. Err: principal identity already created
2020-11-15T12:55:04.203Z debug wcp [opID=5fb24d4d] WCP cluster principal identity (for cluster domain-c1006, service account wcp-cluster-user-domain-c1006-07eef2dd-e023-4707-a61e-f393cdd86090) already created

2020-11-15T12:55:10.658Z debug wcp [opID=5fb24d4d-domain-c1006] Cluster Network Provider is NSXT Container Plugin. Performing additional NCP-specific configuration.
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error checking if NSX resources exist. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error checking if NSX resources exist for VMs: [vm-4041]. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error creating NSX resources. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-4041. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error configuring cluster NIC on master VM vm-4041: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error configuring API server on cluster domain-c1006 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.

 

0 Kudos
15 Replies
stephankuehne
Contributor

Did you enable trust on the Compute Manager in NSX-T?

In NSX-T --> System --> Fabric --> Compute Managers --> <Your vCenter config> --> Enable Trust

 

0 Kudos
pierrevm123
Contributor

pierrevm123_0-1605548050034.png

 

Yes, Trust has been enabled.

It looks like there are authorization problems from vCenter to NSX but not the other way around.

0 Kudos
pierrevm123
Contributor

pierrevm123_0-1605641021339.png

Configure operation for the Master node VM with identifier vm-7008 failed.

2020-11-15T19:09:15.671Z debug wcp [opID=EAMAgent] Ignore non WCP agency vCLS
2020-11-15T19:09:15.671Z debug wcp informer.processLoop() lister.List() returned
2020-11-15T19:09:18.02Z error wcp [opID=vapi] Security Context missing in the request
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] SecurityContext not passed in the request. Creating an empty security context
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] opId was not present for the request
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Handling new request with input {"STRUCTURE":{"operation-input":{}}}
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Service specific authorization scheme for com.vmware.vapi.std.introspection.service not found.
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Service specific authorization scheme for com.vmware.vapi.std.introspection.service not found.
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Could not find package specific auth scheme for com.vmware.vapi.std.introspection.service
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Authn scheme Id is not provided but NO AUTH is allowed hence invoking the operation
2020-11-15T19:09:18.02Z error wcp [opID=vapi] SecurityCtx doesn't have property AUTHN_IDENTITY
2020-11-15T19:09:18.02Z error wcp [opID=vapi] Invalid authentication result
2020-11-15T19:09:18.021Z debug wcp [opID=vapi] Skipping authorization checks, because there is no authentication data for: com.vmware.vapi.std.introspection.service.list

 

0 Kudos
engyak
Enthusiast

I'm having this issue as well - I do think that the NSX account check is a red herring, as the principal identity doesn't appear to be created on the manager side. I'm going to reset root on vCenter (unrelated), and rebuild trust with NSX-T, then retry:

 

2020-12-12T19:46:29.839Z debug wcp [opID=5fd5201f] NSX HTTP Request is: &{Method:POST URL:https://10.66.0.204:443/api/v1/trust-management/token-principal-identities/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[] Body:{Reader:{"description":"Principal Identity for WCP service","display_name":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","id":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","name":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","node_id":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7"}} GetBody:0x843110 ContentLength:261 TransferEncoding:[] Close:false Host:10.66.0.204:443 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc0000ce000}
2020-12-12T19:46:30Z debug wcp [opID=5fd5201f] NSX HTTP Response is: &{409  409 HTTP/1.1 1 1 map[Cache-Control:[no-cache, no-store, max-age=0, must-revalidate] Content-Type:[application/json] Date:[Sat, 12 Dec 2020 19:46:29 GMT] Expires:[0] Keep-Alive:[timeout=60] Pragma:[no-cache] Server:[NSX] Set-Cookie:[JSESSIONID=00B54AAEC7982E12A1DDEEF2B92170F5; Path=/; Secure; HttpOnly] Strict-Transport-Security:[max-age=31536000 ; includeSubDomains] Vary:[accept-encoding] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Nsx-Requestid:[9c2965e2-9ec8-4094-823b-bcdedc8579e3] X-Xss-Protection:[1; mode=block]] 0xc001f795c0 -1 [chunked] false true map[] 0xc001da6d00 0xc00145b970}, {
2020-12-12T19:46:30Z error wcp [opID=5fd5201f] Error occurred sending Principal Identity request to NSX: principal identity already created
2020-12-12T19:46:30Z error wcp [opID=5fd5201f] Failed to create PI in NSX managers. Err: principal identity already created

 

0 Kudos
engyak
Enthusiast

Hey guys,

Are you all trying this with fewer than 3 compute nodes?

engyak_0-1607831001258.png

 

It seems that these nodes are reporting NCP00010 TN ID not found. At first, I thought this meant EDGE Transport node, but it could also mean HOST transport node. It'd take a bit for me to scare up some more hosts, and I'd like to validate my hypothesis a little more before going on Craigslist.

0 Kudos
stephankuehne
Contributor

Are the following components all in the same subnet?

  • vCenter
  • ESXi hosts (mgmt vmk)
  • NSX-T Manager
  • Kubernetes Control Plane

Other common issues are DNS (names must be both reachable and resolvable) and NTP. The errors all look very similar, if not identical.
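For the DNS part, here is a minimal Python sketch of a forward and reverse lookup check. It is only an illustration: the host names in the usage note are hypothetical placeholders, and it only tests name resolution from the machine you run it on, not reachability or NTP.

```python
import socket


def check_dns(hostnames, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve each name forward, then reverse, and report whether they agree.

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the function can be exercised without a real DNS server.
    """
    report = {}
    for name in hostnames:
        try:
            ip = forward(name)
            reverse_name = reverse(ip)[0]
        except OSError as exc:  # gaierror and herror are both OSError subclasses
            report[name] = {"ip": None, "reverse": None, "ok": False, "error": str(exc)}
            continue
        # Compare case-insensitively and ignore a trailing dot on the FQDN.
        ok = reverse_name.lower().rstrip(".") == name.lower().rstrip(".")
        report[name] = {"ip": ip, "reverse": reverse_name, "ok": ok, "error": None}
    return report
```

Calling something like `check_dns(["vcenter.lab.local", "nsx.lab.local"])` (made-up names) returns one entry per host; any entry with `ok` set to `False` is worth a closer look.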

Regards
Stephan

0 Kudos
engyak
Enthusiast

I have noticed that as well when re-reviewing the logs.

Most people seem to be filtering based on severity because good docs are not yet available for this.

A LEVEL SET

Readers, if you're wondering which log we're talking about:

 

root@vcenter [ /var/log/vmware/wcp ]# ls
stdstream.log-0.stderr  stdstream.log-1.stdout  stdstream.log-3.stderr  stdstream.log-4.stdout  tkg-telemetry                          wcpsvc-2020-12-12T19-45-55.657.log.gz  wcpsvc.log
stdstream.log-0.stdout  stdstream.log-2.stderr  stdstream.log-3.stdout  stdstream.log.stderr    wcpsvc-2020-12-12T15-38-27.008.log.gz  wcpsvc-2020-12-13T03-39-49.311.log.gz  wcp-telemetry
stdstream.log-1.stderr  stdstream.log-2.stdout  stdstream.log-4.stderr  stdstream.log.stdout    wcpsvc-2020-12-12T17-36-11.039.log.gz  wcpsvc-2020-12-13T19-50-23.995.log.gz

 

 The log wcpsvc.log is the one in question.

Furthermore, we're able to come to the following conclusions:

The following log lines can be ignored in most cases, as they're just spam.

 

wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:20.927Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist for VMs: [vm-4011]. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:21.978Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:21.978Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist for VMs: [vm-4013]. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:35:11.472Z error wcp [opID=5fc87e99] Error occurred sending Principal Identity request to NSX: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:35:11.472Z error wcp [opID=5fc87e99] Failed to create PI in NSX managers. Err: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:32:22.364Z error wcp [opID=5fc87e99] Error occurred sending Principal Identity request to NSX: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:32:22.364Z error wcp [opID=5fc87e99] Failed to create PI in NSX managers. Err: principal identity already created

 

 

A starter filter:

 

grep error wcpsvc* | grep -Ev 'Error checking if NSX resources exist|principal identity already created'
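If you'd rather see which of the remaining errors are most frequent, here is a rough Python sketch of the same filter. It assumes the line layout seen in this thread (`<timestamp> <level> wcp [opID=...] <message>`), and unlike the grep it only keeps lines whose level is `error`, so debug lines that merely contain the word are dropped. The function name is my own.

```python
import re
from collections import Counter

# The two messages treated as noise in the grep filter above.
NOISE = re.compile(
    r"Error checking if NSX resources exist|principal identity already created"
)

# Timestamp, level, optional [opID=...], then the message. re.search is used so
# a "filename:" prefix added by grep across several files is skipped.
LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z) (\w+) wcp "
    r"(?:\[opID=([^\]]+)\] )?(.*)$"
)


def significant_errors(lines):
    """Count error-level messages that are not in the noise list."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # continuation lines, JSON bodies, etc.
        _timestamp, level, _op_id, message = match.groups()
        if level != "error" or NOISE.search(message):
            continue
        counts[message.strip()] += 1
    return counts
```

To use it, open `wcpsvc.log`, pass the file object to `significant_errors`, and print `.most_common(20)` on the result. Messages that embed a VM or host ID will be counted separately per ID.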

 

 

There are definitely more messages here that are generating traffic, but these errors appear to simply say "there is an error" as opposed to "the error is X."

The second step I'd recommend is to go into NSX-T Manager and review what Tanzu built for you.

  • Verify a Tier-1 Gateway is created - if that works, you know you've made it past the first gate (NSX ETN should be L or XL)
  • Verify vn-segments have been created - there should be a whole bunch
  • Go to Inventory -> Containers -> Clusters tab and review what's there. There should be a single entry, with multiple columns to signify each element. For me, the "Nodes" column was the one showing down nodes. If more nodes are up than down, your problem is above the base infrastructure, so troubleshoot NSX/Tanzu. If not, the problem is likely in the base infrastructure, so test reachability.
0 Kudos
pierrevm123
Contributor

Thanks for the reply. I think the problem is this: 

"Error configuring cluster NIC on master VM"

Everything else looks fine in vCenter and NSX-T. I could even download the kubectl software from the IP addresses of the individual Kubernetes control plane members. Only the cluster IP address does not work.

Because I couldn't find the solution, I also tried to configure Tanzu with HAProxy instead of NSX-T. To my surprise I got a similar error there: enabling the VIP address on the HAProxy failed. The HAProxy configuration looked fine, but the VIP address was not configured on the network interface of the HAProxy server. Only the management IP address was configured.

I am even wondering if it might be a hardware issue. I am running everything on one workstation with 128 GB memory and an i9 processor with nested ESXi 7.01.

0 Kudos
engyak
Enthusiast

I think "Error configuring cluster NIC on master VM" is just a blanket statement that means it failed after NSX config stand-up was completed...

Did you have time to check? I am seeing the exact same issue on a single-host build where the cluster vIP isn't there, but one node IP is.

0 Kudos
stephankuehne
Contributor

Are all three Supervisor VMs already up?
If so, how many NICs does each of them have?

 

Regards,
Stephan

0 Kudos
engyak
Enthusiast

Not the original poster, but we both are able to reach each supervisor VM via the overlay network.

If you're up to compare, it has 2 total, and is reachable on both.

0 Kudos
pierrevm123
Contributor

Yes, all 3 supervisor VMs were up.

I have deleted my cluster so I don't know how many NICs there were, but I believe the master node had 5 IP addresses (IPv4/IPv6) and the other nodes fewer. The issue is with the cluster/namespace IP address.

0 Kudos
mmakhija
Contributor

Do we have a solution for this? I am seeing the same behavior in vSphere 7 Update 1.

 

2021-01-07T18:04:54.088Z error wcp [opID=5ff66f83] Error occurred sending Principal Identity request to NSX: principal identity already created
2021-01-07T18:04:54.088Z error wcp [opID=5ff66f83] Failed to create PI in NSX managers. Err: principal identity already created
2021-01-07T18:04:54.088Z debug wcp [opID=5ff66f83] WCP service principal identity already created.
2021-01-07T18:04:54.113Z debug wcp [opID=5ff66f83] NSX HTTP Request is: &{Method:POST URL:https://10.202.xx.xx:443/api/v1/trust-management/token-principal-identities/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[] Body:{Reader:{"description":"Principal Identity for WCP cluster service account: wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","display_name":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","id":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","name":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","node_id":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7"}} GetBody:0x843110 ContentLength:434 TransferEncoding:[] Close:false Host:10.202.xx.xx:443 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc000056058}
2021-01-07T18:04:54.24Z debug wcp [opID=5ff66f83] NSX HTTP Response is: &{409 409 HTTP/1.1 1 1 map[Cache-Control:[no-cache, no-store, max-age=0, must-revalidate] Content-Type:[application/json] Date:[Thu, 07 Jan 2021 18:04:54 GMT] Expires:[0] Keep-Alive:[timeout=60] Pragma:[no-cache] Server:[NSX] Set-Cookie:[JSESSIONID=BC0BF8015F1BA6308A935EF617AEDED6; Path=/; Secure; HttpOnly] Strict-Transport-Security:[max-age=31536000 ; includeSubDomains] Vary:[accept-encoding] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Nsx-Requestid:[8b1465a1-37e1-4fd8-993b-e12ab892fc42] X-Xss-Protection:[1; mode=block]] 0xc000a51380 -1 [chunked] false true map[] 0xc000275800 0xc0010ecb00}, {
"httpStatus" : "CONFLICT",
"error_code" : 2039,
"module_name" : "internal-framework",
"error_message" : "Principal TokenBasedPrincipalIdentityEntity{schemaValue=, identifier=null/null, touched=false, revision=0, displayName=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, description=Principal Identity for WCP cluster service account: wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, createUser=null, lastModifiedUser=null, createTime=null, lastModifiedTime=null, systemResourceFlag=false, tags=[], name=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, nodeId=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, dataProtected=true} already exists."
}
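
For anyone reading along, this is how I read that response: the 409 with an "already exists." message is what wcpsvc then logs as "principal identity already created". Below is a small Python sketch that recognizes this case from the status code and JSON body and pulls out the identity name. The function name is hypothetical, and the `displayName=` parsing is based only on the message format shown in this post.

```python
import json
import re


def describe_conflict(status, body_text):
    """Return the principal's display name if this is the 'already exists'
    conflict seen in the log above, otherwise None."""
    if status != 409:
        return None
    body = json.loads(body_text)
    message = body.get("error_message", "")
    if not message.rstrip().endswith("already exists."):
        return None
    # The name is embedded in the entity dump as displayName=<name>,
    match = re.search(r"displayName=([^,}]+)", message)
    return match.group(1) if match else None
```

The returned name is the one to look for in NSX Manager when checking whether the identity really exists.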

Water6666
Contributor

Do we have a solution for this? I am seeing the same behavior in vSphere 7 Update 2.

root@vc01 [ /var/log/vmware ]# tail -f /var/log/vmware/wcp/wcpsvc.log | grep error
2021-06-10T04:56:42.571Z error wcp [opID=domain-c8-host-14] Failed to get Kubernetes cluster node list: Unauthorized
2021-06-10T04:56:42.571Z error wcp [opID=domain-c8-host-14] Intent nodeReadyIntent, step scanNodeForReadyState for cluster domain-c8 node host-14 returned error Unauthorized
2021-06-10T04:56:42.571Z debug wcp [opID=domain-c8-host-14] For node host-14, setting configStatusMessages from ([]namespace_management.ClustersMessage)[{Severity:(namespace_management.ClustersMessageSeverityEnum)ERROR Details:(*std.LocalizableMessage){Id:(string)vcenter.wcp.systemerror DefaultMessage:(string)A general system error occurred. Args:([]string)[Unauthorized] Params:(map[string]std.LocalizationParam) Localized:(*string)}}] to ([]namespace_management.ClustersMessage)[{Severity:(namespace_management.ClustersMessageSeverityEnum)ERROR Details:(*std.LocalizableMessage){Id:(string)vcenter.wcp.systemerror DefaultMessage:(string)A general system error occurred. Args:([]string)[Unauthorized] Params:(map[string]std.LocalizationParam) Localized:(*string)}}]
2021-06-10T04:56:42.571Z error wcp [opID=domain-c8-host-14] Failed to realize node {nodeID:host-14 clusterID:domain-c8} state. Err Unauthorized. Will retry.
2021-06-10T04:57:20.585Z error wcp Failed to get Kubernetes healthz results on server, 10.10.70.40: the server has asked for the client to provide credentials
2021-06-10T04:58:23.294Z error wcp Failed to get Kubernetes healthz results on server, 10.10.70.40: the server has asked for the client to provide credentials
2021-06-10T04:59:27.858Z error wcp Failed to get Kubernetes healthz results on server, 10.10.70.40: the server has asked for the client to provide credentials
"error_code" : 2039,
"error_message" : "Principal TokenBasedPrincipalIdentityEntity{schemaValue=, identifier=null/null, touched=false, revision=0, displayName=wcp-ff4672a2-5269-49e8-af33-d6ae6042ea15, description=Principal Identity for WCP service, createUser=null, lastModifiedUser=null, createTime=null, lastModifiedTime=null, systemResourceFlag=false, tags=[], name=wcp-ff4672a2-5269-49e8-af33-d6ae6042ea15, nodeId=wcp-ff4672a2-5269-49e8-af33-d6ae6042ea15, dataProtected=true} already exists
0 Kudos
engyak
Enthusiast

I'd recommend verifying that the logged principal identity is created in NSX, and that vSphere and NSX-T trust each other.
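
One way to do the first check from a script, as a rough Python sketch. The path is the one wcpsvc POSTs to in the logs earlier in this thread; that a GET on the same path returns the existing identities in a `results` list is my assumption, so please confirm it against the API reference for your NSX-T version. The function names and parameters are hypothetical.

```python
import base64
import json
import ssl
import urllib.request

# Path copied from the POST in the wcpsvc.log excerpts in this thread.
# Assumption: a GET on it lists existing identities under "results".
PI_PATH = "/api/v1/trust-management/token-principal-identities"


def fetch_identities(manager, user, password, verify_tls=True):
    """GET the identity list from an NSX Manager and return the parsed JSON."""
    request = urllib.request.Request(f"https://{manager}{PI_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context()
    if not verify_tls:  # lab managers often use self-signed certificates
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def find_identity(listing, name):
    """Return the entry whose name or display_name equals `name`, else None."""
    for entry in listing.get("results", []):
        if name in (entry.get("name"), entry.get("display_name")):
            return entry
    return None
```

Pass the `wcp-...` name from your own wcpsvc.log to `find_identity`. If it comes back `None` while the log says "already created", that mismatch is what I'd chase first.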

0 Kudos