VMware Networking Community
p0wertje
Hot Shot

NSX-T and Kubernetes

Hi,

I have created a small NSX-T and Kubernetes setup.

Running CentOS 7.6.1810

Docker 18.06.3-ce

REPOSITORY                               TAG             IMAGE ID        CREATED         SIZE
registry.local/2.4.1.13515827/nsx-ncp-rhel   latest          d80a0f1e9112    3 months ago    714MB
k8s.gcr.io/kube-proxy-amd64              v1.11.4         5071d096cfcd    9 months ago    98.2MB
k8s.gcr.io/kube-apiserver-amd64          v1.11.4         de6de495c1f4    9 months ago    187MB
k8s.gcr.io/kube-controller-manager-amd64 v1.11.4         dc1d57df5ac0    9 months ago    155MB
k8s.gcr.io/kube-scheduler-amd64          v1.11.4         569cb58b9c03    9 months ago    56.8MB
k8s.gcr.io/coredns                       1.1.3           b3b94275d97c    14 months ago   45.6MB
k8s.gcr.io/etcd-amd64                    3.2.18          b8df3b177be2    16 months ago   219MB
k8s.gcr.io/pause-amd64                   3.1             da86e6ba6ca1    19 months ago   742kB
k8s.gcr.io/pause                         3.1             da86e6ba6ca1    19 months ago   742kB

NAME           STATUS   ROLES    AGE   VERSION
k8s-master01   Ready    master   3h    v1.11.10
k8s-node01     Ready    <none>   3h    v1.11.10
k8s-node02     Ready    <none>   3h    v1.11.10

kube-system   nginx-deployment-67594d6bf6-nvnhd   0/1   ContainerCreating   0    44m
kube-system   nginx-deployment-67594d6bf6-vtmjp   0/1   ContainerCreating   0    44m
nsx-system    nsx-ncp-vp4qq                       1/1   Running             1    1h
nsx-system    nsx-node-agent-hwtd9                2/2   Running             15   44m
nsx-system    nsx-node-agent-pr472                2/2   Running             15   44m

I keep getting:

Warning  FailedCreatePodSandBox  8m (x5 over 23m)  kubelet, k8s-node02  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to set up pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to receive message header from nsx_node_agent, failed to clean up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to teardown pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to connect to nsx_node_agent: [Errno 111] Connection refused]

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ CNI Command in environment: DEL

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ nsx_cni plugin invoked with arguments: DEL

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Reading configuration on standard input

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Unconfiguring networking for container 8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ Network config from input: {u'cniVersion': u'0.3.1', u'type': u'nsx', u'name': u'nsx-cni', u'mtu': 1500}

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="ERROR" errorCode="NCP04002"] __main__ Failed to connect to nsx_node_agent: [Errno 2] No such file or directory

Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641209    8084 cni.go:280] Error deleting network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory

Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641878    8084 remote_runtime.go:115] StopPodSandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory

Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641907    8084 kuberuntime_gc.go:153] Failed to stop sandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory

Aug 07 15:29:30 k8s-master01 kubelet[8084]: W0807 15:29:30.643632    8084 cni.go:243] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0421c69b6339249854d7ec1885c8094460e3cc7c4048ea70e473c4fc9796cbff"

Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.680Z k8s-master01 NSX 9686 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration
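From the logs it looks like the CNI plugin simply cannot reach the nsx_node_agent on the node. For reference, the plugin talks to the agent over a local Unix socket, so a basic sanity check looks like this (the socket path is my assumption based on a default NCP install, adjust if yours differs):

# does the socket directory the CNI plugin connects to exist on the node?
ls -l /var/run/nsx-ujo/
# logs of the node agent container on the affected node
kubectl -n nsx-system logs nsx-node-agent-hwtd9 -c nsx-node-agent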

Any ideas?

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
5 Replies
daphnissov
Immortal

What version of the NCP are you using? I noticed you're using a fairly old version of Kubernetes (1.11), so there may be a compatibility issue there. Check the release notes for your version of the NCP to ensure you're using one of the compatible versions.

p0wertje
Hot Shot

Running Kubernetes 1.14.5 now.

NAME           STATUS   ROLES    AGE     VERSION
k8s-master01   Ready    master   8m15s   v1.14.5
k8s-node01     Ready    <none>   7m14s   v1.14.5
k8s-node02     Ready    <none>   7m4s    v1.14.5

Still the same error: "Failed to connect to nsx_node_agent: [Errno 111] Connection refused".

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
p0wertje
Hot Shot

NCP 2.4.1.13515827

NSX 2.4.1

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
daphnissov
Immortal

The earliest Kubernetes version that NCP 2.4.1 supports is 1.13.

p0wertje
Hot Shot

Found the problem (thanks to VMware support!).

I had set the <nodename> tag (scope ncp/node_name) and the ncp/cluster tag on the VM instead of on the node's interface (logical switch port).
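For anyone else hitting this, the tags NCP expects on each node VM's vNIC logical switch port look roughly like this (scope/tag pairs as shown in the NSX-T UI; the cluster and node names here are examples from my lab, adjust to yours):

Scope           Tag
ncp/cluster     k8s-cluster
ncp/node_name   k8s-node01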

Because of this, the hyperbus was unhealthy.

On the ESXi host, in 'nsxcli', you can run 'get hyperbus connection info'; this showed nothing, which was exactly why I got the connection refused.

After changing the tagging, the hyperbus became healthy and everything works:

xxxxx.infra.test> get hyperbus connection info
                VIFID                            Connection                Status
198c008e-dc61-406e-bf75-688c4dae0a24         169.254.1.12:2345            HEALTHY
4601b498-1ee7-4232-8ead-a70663a221e1         169.254.1.11:2345            HEALTHY

The nsx-node-agent also reports healthy now:

kubectl exec -n nsx-system -it  nsx-node-agent-hclhs  nsxcli
Defaulting container name to nsx-node-agent.
Use 'kubectl describe pod/nsx-node-agent-hclhs -n nsx-system' to see all of the containers in this pod.
NSX CLI (Node Agent). Press ? for command list or enter: help
k8s-node01> get node-agent-hyperbus status
HyperBus status: Healthy

k8s-node01>


Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved