Hi,
I have created a small NSX-T and Kubernetes setup.
Running CentOS 7.6.1810
Docker 18.06.3-ce
REPOSITORY | TAG | IMAGE ID | CREATED | SIZE
registry.local/2.4.1.13515827/nsx-ncp-rhel | latest | d80a0f1e9112 | 3 months ago | 714MB
k8s.gcr.io/kube-proxy-amd64 | v1.11.4 | 5071d096cfcd | 9 months ago | 98.2MB
k8s.gcr.io/kube-apiserver-amd64 | v1.11.4 | de6de495c1f4 | 9 months ago | 187MB
k8s.gcr.io/kube-controller-manager-amd64 | v1.11.4 | dc1d57df5ac0 | 9 months ago | 155MB
k8s.gcr.io/kube-scheduler-amd64 | v1.11.4 | 569cb58b9c03 | 9 months ago | 56.8MB
k8s.gcr.io/coredns | 1.1.3 | b3b94275d97c | 14 months ago | 45.6MB
k8s.gcr.io/etcd-amd64 | 3.2.18 | b8df3b177be2 | 16 months ago | 219MB
k8s.gcr.io/pause-amd64 | 3.1 | da86e6ba6ca1 | 19 months ago | 742kB
k8s.gcr.io/pause | 3.1 | da86e6ba6ca1 | 19 months ago | 742kB
NAME | STATUS | ROLES | AGE | VERSION
k8s-master01 | Ready | master | 3h | v1.11.10
k8s-node01 | Ready | <none> | 3h | v1.11.10
k8s-node02 | Ready | <none> | 3h | v1.11.10
NAMESPACE | NAME | READY | STATUS | RESTARTS | AGE
kube-system | nginx-deployment-67594d6bf6-nvnhd | 0/1 | ContainerCreating | 0 | 44m
kube-system | nginx-deployment-67594d6bf6-vtmjp | 0/1 | ContainerCreating | 0 | 44m
nsx-system | nsx-ncp-vp4qq | 1/1 | Running | 1 | 1h
nsx-system | nsx-node-agent-hwtd9 | 2/2 | Running | 15 | 44m
nsx-system | nsx-node-agent-pr472 | 2/2 | Running | 15 | 44m
I keep getting:
Warning  FailedCreatePodSandBox  8m (x5 over 23m)  kubelet, k8s-node02  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to set up pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to receive message header from nsx_node_agent, failed to clean up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to teardown pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to connect to nsx_node_agent: [Errno 111] Connection refused]
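For context, that is the event you see when describing one of the stuck pods, e.g.:
kubectl -n kube-system describe pod nginx-deployment-67594d6bf6-nvnhd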
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ CNI Command in environment: DEL
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ nsx_cni plugin invoked with arguments: DEL
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Reading configuration on standard input
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Unconfiguring networking for container 8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ Network config from input: {u'cniVersion': u'0.3.1', u'type': u'nsx', u'name': u'nsx-cni', u'mtu': 1500}
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="ERROR" errorCode="NCP04002"] __main__ Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641209 8084 cni.go:280] Error deleting network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641878 8084 remote_runtime.go:115] StopPodSandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641907 8084 kuberuntime_gc.go:153] Failed to stop sandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: W0807 15:29:30.643632 8084 cni.go:243] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0421c69b6339249854d7ec1885c8094460e3cc7c4048ea70e473c4fc9796cbff"
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.680Z k8s-master01 NSX 9686 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration
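For reference, the kubelet/nsx_cni messages above can be followed live on a node, assuming kubelet runs under systemd:
journalctl -u kubelet -f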
Any ideas?
What version of the NCP are you using? I noticed you're using a fairly old version of Kubernetes (1.11), so there may be a compatibility issue there. Check the release notes for your version of the NCP to ensure you're using one of the compatible versions.
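One quick way to confirm which NCP version is actually running is to check the image on the NCP pod (pod name taken from your listing above):
kubectl -n nsx-system describe pod nsx-ncp-vp4qq | grep Image: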
Running 1.14.5 now.
NAME | STATUS | ROLES | AGE | VERSION
k8s-master01 | Ready | master | 8m15s | v1.14.5
k8s-node01 | Ready | <none> | 7m14s | v1.14.5
k8s-node02 | Ready | <none> | 7m4s | v1.14.5
Still the same error: "network: Failed to connect to nsx_node_agent: [Errno 111] Connection refused"
NCP 2.4.1.13515827
NSX 2.4.1
The earliest Kubernetes version that it supports is 1.13.
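For anyone digging into the same symptom, the nsx-node-agent container logs on the affected node can be checked with something like this (pod name from the listing above; the nsx-node-agent container is the one kubectl also defaults to further down):
kubectl -n nsx-system logs nsx-node-agent-hwtd9 -c nsx-node-agent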
Found the problem (thanks to VMware support!)
I had set the ncp/node_name (<nodename>) and ncp/cluster tags on the VM itself instead of on the logical switch port of the node's interface.
Because of this the hyperbus was unhealthy.
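For reference, a minimal sketch of how the tags should end up on each node's logical switch port in NSX Manager (scope/tag pairs; 'k8s-cluster' is just an assumed cluster name here and has to match the cluster name configured for NCP):
Scope: ncp/node_name    Tag: k8s-node01
Scope: ncp/cluster      Tag: k8s-cluster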
On the ESXi host, in 'nsxcli', you can type 'get hyperbus connection info';
this showed nothing.
That was exactly the reason why I got the connection refused.
After correcting the tagging the hyperbus was healthy and everything works:
xxxxx.infra.test> get hyperbus connection info
VIFID Connection Status
198c008e-dc61-406e-bf75-688c4dae0a24 169.254.1.12:2345 HEALTHY
4601b498-1ee7-4232-8ead-a70663a221e1 169.254.1.11:2345 HEALTHY
The nsx-node-agent also reports healthy now:
kubectl exec -n nsx-system -it nsx-node-agent-hclhs nsxcli
Defaulting container name to nsx-node-agent.
Use 'kubectl describe pod/nsx-node-agent-hclhs -n nsx-system' to see all of the containers in this pod.
NSX CLI (Node Agent). Press ? for command list or enter: help
k8s-node01> get node-agent-hyperbus status
HyperBus status: Healthy
k8s-node01>