Hi,
I have created a small NSX-T and Kubernetes setup.
Running CentOS 7.6.1810
Docker 18.06.3-ce
REPOSITORY | TAG | IMAGE ID | CREATED | SIZE
registry.local/2.4.1.13515827/nsx-ncp-rhel | latest | d80a0f1e9112 | 3 months ago | 714MB
k8s.gcr.io/kube-proxy-amd64 | v1.11.4 | 5071d096cfcd | 9 months ago | 98.2MB
k8s.gcr.io/kube-apiserver-amd64 | v1.11.4 | de6de495c1f4 | 9 months ago | 187MB
k8s.gcr.io/kube-controller-manager-amd64 | v1.11.4 | dc1d57df5ac0 | 9 months ago | 155MB
k8s.gcr.io/kube-scheduler-amd64 | v1.11.4 | 569cb58b9c03 | 9 months ago | 56.8MB
k8s.gcr.io/coredns | 1.1.3 | b3b94275d97c | 14 months ago | 45.6MB
k8s.gcr.io/etcd-amd64 | 3.2.18 | b8df3b177be2 | 16 months ago | 219MB
k8s.gcr.io/pause-amd64 | 3.1 | da86e6ba6ca1 | 19 months ago | 742kB
k8s.gcr.io/pause | 3.1 | da86e6ba6ca1 | 19 months ago | 742kB
NAME | STATUS | ROLES | AGE | VERSION
k8s-master01 | Ready | master | 3h | v1.11.10
k8s-node01 | Ready | <none> | 3h | v1.11.10
k8s-node02 | Ready | <none> | 3h | v1.11.10
NAMESPACE | NAME | READY | STATUS | RESTARTS | AGE
kube-system | nginx-deployment-67594d6bf6-nvnhd | 0/1 | ContainerCreating | 0 | 44m
kube-system | nginx-deployment-67594d6bf6-vtmjp | 0/1 | ContainerCreating | 0 | 44m
nsx-system | nsx-ncp-vp4qq | 1/1 | Running | 1 | 1h
nsx-system | nsx-node-agent-hwtd9 | 2/2 | Running | 15 | 44m
nsx-system | nsx-node-agent-pr472 | 2/2 | Running | 15 | 44m
I keep getting:
Warning  FailedCreatePodSandBox  8m (x5 over 23m)  kubelet, k8s-node02  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to set up pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to receive message header from nsx_node_agent, failed to clean up sandbox container "f4cfd0266b09b4b43f17696a2f2982533999acb0ad5c0f7406be5ec9fa612b2e" network for pod "nginx-deployment-67594d6bf6-nvnhd": NetworkPlugin cni failed to teardown pod "nginx-deployment-67594d6bf6-nvnhd_kube-system" network: Failed to connect to nsx_node_agent: [Errno 111] Connection refused]
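For context, that is the event you see when describing one of the stuck pods, e.g.:
kubectl -n kube-system describe pod nginx-deployment-67594d6bf6-nvnhd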
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ CNI Command in environment: DEL
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.634Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ nsx_cni plugin invoked with arguments: DEL
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Reading configuration on standard input
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Unconfiguring networking for container 8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="DEBUG"] __main__ Network config from input: {u'cniVersion': u'0.3.1', u'type': u'nsx', u'name': u'nsx-cni', u'mtu': 1500}
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.637Z k8s-master01 NSX 9685 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="ERROR" errorCode="NCP04002"] __main__ Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641209 8084 cni.go:280] Error deleting network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641878 8084 remote_runtime.go:115] StopPodSandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: E0807 15:29:30.641907 8084 kuberuntime_gc.go:153] Failed to stop sandbox "8d9e76c3ef3f9d928fff3a822b75ceb584a12a087daff51c0e61084144189f5c" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-78fcdf6894-n2pwr_kube-system" network: Failed to connect to nsx_node_agent: [Errno 2] No such file or directory
Aug 07 15:29:30 k8s-master01 kubelet[8084]: W0807 15:29:30.643632 8084 cni.go:243] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "0421c69b6339249854d7ec1885c8094460e3cc7c4048ea70e473c4fc9796cbff"
Aug 07 15:29:30 k8s-master01 kubelet[8084]: 1 2019-08-07T15:29:30.680Z k8s-master01 NSX 9686 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_cni" level="INFO"] __main__ Initialized CNI configuration
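For reference, the kubelet/nsx_cni messages above can be followed live on a node, assuming kubelet runs under systemd:
journalctl -u kubelet -f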
Any ideas?
What version of the NCP are you using? I noticed you're using a fairly old version of Kubernetes (1.11), so there may be a compatibility issue there. Check the release notes for your version of the NCP to ensure you're using one of the compatible versions.
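One quick way to confirm which NCP version is actually running is to check the image on the NCP pod (pod name taken from your listing above):
kubectl -n nsx-system describe pod nsx-ncp-vp4qq | grep Image: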
Running 1.14.5 now.
NAME | STATUS | ROLES | AGE | VERSION
k8s-master01 | Ready | master | 8m15s | v1.14.5
k8s-node01 | Ready | <none> | 7m14s | v1.14.5
k8s-node02 | Ready | <none> | 7m4s | v1.14.5
Still the same error: "network: Failed to connect to nsx_node_agent: [Errno 111] Connection refused"
NCP 2.4.1.13515827
NSX 2.4.1
The earliest Kubernetes version that it supports is 1.13.
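For anyone digging into the same symptom, the nsx-node-agent container logs on the affected node can be checked with something like this (pod name from the listing above; the nsx-node-agent container is the one kubectl also defaults to further down):
kubectl -n nsx-system logs nsx-node-agent-hwtd9 -c nsx-node-agent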
Found the problem (thanks to VMware support!)
I had set the ncp/node_name (<nodename>) and ncp/cluster tags on the VM itself instead of on the logical switch port of the node's interface.
Because of this the hyperbus was unhealthy.
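For reference, a minimal sketch of how the tags should end up on each node's logical switch port in NSX Manager (scope/tag pairs; 'k8s-cluster' is just an assumed cluster name here and has to match the cluster name configured for NCP):
Scope: ncp/node_name    Tag: k8s-node01
Scope: ncp/cluster      Tag: k8s-cluster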
On the ESXi host, in 'nsxcli', you can type 'get hyperbus connection info';
this showed nothing.
That was exactly the reason why I got the connection refused.
After correcting the tagging the hyperbus was healthy and everything works:
xxxxx.infra.test> get hyperbus connection info
VIFID Connection Status
198c008e-dc61-406e-bf75-688c4dae0a24 169.254.1.12:2345 HEALTHY
4601b498-1ee7-4232-8ead-a70663a221e1 169.254.1.11:2345 HEALTHY
The nsx-node-agent also reports healthy now:
kubectl exec -n nsx-system -it nsx-node-agent-hclhs nsxcli
Defaulting container name to nsx-node-agent.
Use 'kubectl describe pod/nsx-node-agent-hclhs -n nsx-system' to see all of the containers in this pod.
NSX CLI (Node Agent). Press ? for command list or enter: help
k8s-node01> get node-agent-hyperbus status
HyperBus status: Healthy
k8s-node01>