I am attempting to set up Workload Management in a greenfield vSphere 7 environment with NSX-T, and it keeps hanging at "Error configurating cluster NIC on master VM. This operation is part of API server configuration and will be retried". I see the following in the wcpsvc.log file:
2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-88. Err: Unauthorized
2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Error configuring cluster NIC on master VM vm-88: Unauthorized
2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Error configuring API server on cluster domain-c8 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.
My vCenter and NSX deployments are on the same Layer 2 segment. NSX-T is currently functioning, with connectivity validated from a logical segment out to the Internet. I have also validated that the MTU is 1600 throughout the environment.
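For anyone who wants to pull the same errors out of a longer wcpsvc.log, here is a minimal Python sketch. The line pattern is only inferred from the three lines quoted above, and the function names are made up for illustration, so other log lines may not match.

```python
import re

# Pattern inferred from the three quoted log lines (an assumption, not a
# documented wcpsvc.log format): timestamp, level, "wcp", [opID=...], message.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+wcp\s+\[opID=(?P<opid>[^\]]+)\]\s+(?P<msg>.*)$"
)


def parse_wcp_line(line):
    """Return a dict with ts, level, opid and msg, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def unauthorized_errors(lines):
    """Collect error-level records whose message mentions 'Unauthorized'."""
    records = []
    for line in lines:
        rec = parse_wcp_line(line)
        if rec and rec["level"] == "error" and "Unauthorized" in rec["msg"]:
            records.append(rec)
    return records


# The three lines quoted in the post above.
sample = [
    "2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-88. Err: Unauthorized",
    "2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Error configuring cluster NIC on master VM vm-88: Unauthorized",
    "2020-09-08T16:16:54.416Z error wcp [opID=5f57bd08-domain-c8] Error configuring API server on cluster domain-c8 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.",
]

for rec in unauthorized_errors(sample):
    print(rec["ts"], rec["opid"], rec["msg"])
```

Grouping by opID makes it easier to see which retries belong to the same operation.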
Are your hosts also running ESXi 7?
Yes, ESXi 7 build 16324942.
Hi elihuj,
Make sure the edge nodes are deployed as at least medium size (I suggest large if you have the resources available), because the load balancer that gets deployed is medium size.
Hello VirtualizingStuff, thank you for the reply. I did deploy a Large Edge, but unfortunately that was not the fix. I tried it again, and for whatever reason it succeeded all the way through.
Hello,
I have the same issue. NSX-T 3.1, VMware ESXi 7.0.1 build 17168206, vCenter build 17004997.
In the NSX-T Manager alarms there is one open issue while Workload Management hangs. I'm using three NSX Manager appliances.
Manager Node has detected the NCP is down or unhealthy.
Entity name: domain-c11:a83fdad6-c5e1-472e-a47b-d670fb2dd1c3
I noticed that this entity does not exist. I'm very new to NSX-T, so I do not know whether this error is relevant.
The transport node and Edge node tunnels are fine, if I'm reading them right.
Please advise where I should look for the root cause. Thank you.
This error seems common as I see lots of people having the same issue. I wonder if anyone at VMware knows how to troubleshoot it?
The two most common reasons are:
1. Trust is not enabled in the Compute Manager for this vCenter in NSX.
2. Time between vCenter and NSX is not in sync.
Can you please get the NCP log:
kubectl -n vmware-system-nsx logs <ncp-pod-name> -p
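On the time-sync point above, a rough way to quantify the drift is to capture a UTC timestamp on each appliance at roughly the same moment and compare them. The sketch below only does the comparison. The function names and the 5-second tolerance are assumptions for illustration, not a documented VMware limit.

```python
from datetime import datetime


def clock_skew_seconds(ts_a, ts_b):
    """Absolute difference in seconds between two ISO 8601 timestamps with UTC offsets."""
    a = datetime.fromisoformat(ts_a)
    b = datetime.fromisoformat(ts_b)
    return abs((a - b).total_seconds())


def in_sync(ts_a, ts_b, tolerance_seconds=5.0):
    """True if the two clocks differ by no more than the (assumed) tolerance."""
    return clock_skew_seconds(ts_a, ts_b) <= tolerance_seconds


# Hypothetical readings taken from vCenter and NSX Manager.
vcenter_time = "2020-09-08T16:16:54+00:00"
nsx_time = "2020-09-08T16:17:24+00:00"
print(clock_skew_seconds(vcenter_time, nsx_time), in_sync(vcenter_time, nsx_time))
```

If the skew is more than a few seconds, check that both appliances point at the same NTP source before retrying.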
When you enabled WCP, did you enter "corp.local" as the master DNS?
This kind of error usually occurs when the master and worker DNS are configured to be the same server.
The master DNS should be reachable from the management network, and the worker DNS should be reachable from the workload network.
If both DNS servers are the same, that server needs to be reachable from both networks (management and workload).
To cross-check network reachability:
- Connect to the Kubernetes API master VM
- Run the commands below:
1) ping -I eth0 <masterDNS>
2) ping -I eth1 <workerDNS>
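If you want to script the two checks above, here is a minimal Python sketch. It assumes it is run on the master VM, that a Linux-style ping with -I and -c is available there, and that eth0 is the management interface and eth1 the workload interface, as in the commands above. The function names are made up for illustration.

```python
import subprocess


def ping_command(interface, target, count=3):
    """Build a ping invocation bound to one interface (-I) with a fixed probe count (-c)."""
    return ["ping", "-I", interface, "-c", str(count), target]


def is_reachable(interface, target, count=3, timeout=15):
    """Return True if ping exits with status 0; False on failure, timeout, or missing binary."""
    try:
        result = subprocess.run(
            ping_command(interface, target, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_dns(master_dns, worker_dns):
    """Check the master DNS over eth0 and the worker DNS over eth1."""
    return {
        "master_dns_via_eth0": is_reachable("eth0", master_dns),
        "worker_dns_via_eth1": is_reachable("eth1", worker_dns),
    }
```

Call check_dns with your own master and worker DNS addresses. A False for either entry points at a routing or firewall problem on that network rather than at WCP itself.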
OK, this may be an issue. I am not well versed in the networking going on here. I am not sure how to assign IP addresses to the Ingress and Egress CIDRs; I assume these are what you mean by "worker". I understand these need to be routable, but I can't figure out what VLAN they are on. I also don't have the capability to do BGP, and I am not sure how to enter a route to these addresses. I can't even figure out what the interfaces to the T0 and T1 routers are. I understand networking, just not NSX-T.
Hi,
Do you know any other way to log in to the Supervisor VM?
I had the same issue, "Error configurating cluster NIC on master VM", so the "Workload Management" -> "Namespaces" web page hangs at "Workload Management is still being configured. Please check back later".
I believe this hang is preventing me from downloading and installing the k8s CLI tool to connect to the control plane VMs.
By the way,
Do the DNS records need to be created for the master & worker before the deployment of workload management cluster?
thanks
To log in to the Supervisor master VM:
- SSH into the vCenter and enable the shell (if required)
- Run "/usr/lib/vmware-wcp/decryptK8Pwd.py" to get the IP address and password for SC Master VM.
For example:
# /usr/lib/vmware-wcp/decryptK8Pwd.py
Read key from file
Connected to PSQL
Cluster: domain-c8:2bcXXXX
IP: 10.xx.xx.xx
PWD: xxxxxxxxxxx
# ssh root@10.xx.xx.xx
Type "yes" and enter the PWD shown above.
Once you are connected to the Supervisor master VM, run the earlier "ping" commands to check master/worker DNS connectivity, then check node status with "kubectl get nodes" and system pod status with "kubectl get pods -A" for troubleshooting.
>> Do the DNS records need to be created for the master & worker before the deployment of workload management cluster?
It depends entirely on your network, but for the master, use the management DNS directly.
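To make the "kubectl get pods -A" output easier to scan, here is a small Python sketch that lists pods whose STATUS is neither Running nor Completed. It assumes the default column layout (NAMESPACE, NAME, READY, STATUS, ...). The sample output uses placeholder namespace and pod names, not real Supervisor pods.

```python
HEALTHY_STATUSES = {"Running", "Completed"}


def unhealthy_pods(kubectl_output):
    """Return (namespace, name, status) for every pod not in a healthy status."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    ns_i = header.index("NAMESPACE")
    name_i = header.index("NAME")
    status_i = header.index("STATUS")
    bad = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) <= status_i:
            continue  # skip malformed or truncated rows
        if cols[status_i] not in HEALTHY_STATUSES:
            bad.append((cols[ns_i], cols[name_i], cols[status_i]))
    return bad


# Placeholder output for illustration only.
sample_output = """\
NAMESPACE    NAME    READY   STATUS             RESTARTS   AGE
example-ns   pod-a   1/1     Running            0          2d
example-ns   pod-b   0/1     CrashLoopBackOff   12         2d
example-ns   pod-c   0/1     Error              0          1h
example-ns   pod-d   0/1     Completed          0          1h
"""

for namespace, name, status in unhealthy_pods(sample_output):
    print(namespace, name, status)
```

On the master VM you could feed it the real output, for example by saving "kubectl get pods -A" to a file and reading that file in.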
For those who are interested, I had to get BGP working on the ToR switch to get Workload Management to install. Maybe you can get by without it, but it didn't work for me. Just Sayin'
@doskiran That's useful!
I discovered that the pod "tmc-agent-installer-1611810900-8n776" is in Error status and another pod, "vsphere-csi-controller-6687dc774f-xnbfq", is in CrashLoopBackOff status on the master.
I didn't have DNS records created for the master/worker yet, so the ping was unsuccessful.
All three masters are in "Ready" status (per "kubectl get nodes"), so I can only assume that the hang I mentioned before has some other, unknown cause...
Thanks!
Hi,
For those who are using BGP to get the Tanzu deployment working, here is the right tutorial
In my case, the problem was that NSX-T was not able to connect to the compute manager.
This was the fix in my case (not an IP connectivity problem) https://www.ibm.com/support/pages/after-vcenter-upgrade-connection-status-compute-manager-shows-down...