MilanBednar
Contributor

TKGS cluster - pod to node communication

Hello,
in our Kubernetes cluster (TKGS on vSphere 7 + NSX) we use Prometheus to collect node metrics. Each node runs a node-exporter pod that publishes its metrics on TCP port 9100. The exporters are deployed as a DaemonSet with host networking, so each one listens on its node's IP address. That IP address is not reachable from other pods inside the cluster. We use Calico as the CNI with default settings. Any ideas how to make this work?

Thanks

Milan

node list:
mbednar@fedora35:~$ kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
codenow-application-r5c72-f7b58dfd7-42fnk Ready <none> 18h v1.22.9+vmware.1 10.50.0.53 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-application-r5c72-f7b58dfd7-wq8rn Ready <none> 18h v1.22.9+vmware.1 10.50.0.57 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-control-plane-hc588 Ready control-plane,master 15d v1.22.9+vmware.1 10.50.0.50 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-system-wgcvt-7ff76ff4c9-2vbcf Ready <none> 15d v1.22.9+vmware.1 10.50.0.51 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-system-wgcvt-7ff76ff4c9-8lkrf Ready <none> 14d v1.22.9+vmware.1 10.50.0.55 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-system-wgcvt-7ff76ff4c9-d26tk Ready <none> 18h v1.22.9+vmware.1 10.50.0.58 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
codenow-system-wgcvt-7ff76ff4c9-ntmn6 Ready <none> 15d v1.22.9+vmware.1 10.50.0.52 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
mbednar@fedora35:~$


exporter pods - running on node IP:
mbednar@fedora35:~$ kubectl get pods -n monitoring-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-prometheus-operator-kube-p-alertmanager-0 2/2 Running 0 14d 172.16.200.85 codenow-system-wgcvt-7ff76ff4c9-ntmn6 <none> <none>
prometheus-operator-grafana-78d6b945d4-jqgwv 3/3 Running 0 14d 172.16.200.20 codenow-system-wgcvt-7ff76ff4c9-2vbcf <none> <none>
prometheus-operator-kube-p-operator-dcb86bb7c-nzth6 1/1 Running 0 14d 172.16.200.19 codenow-system-wgcvt-7ff76ff4c9-2vbcf <none> <none>
prometheus-operator-kube-state-metrics-f99d547db-hrhbs 1/1 Running 0 14d 172.16.200.83 codenow-system-wgcvt-7ff76ff4c9-ntmn6 <none> <none>
prometheus-operator-prometheus-node-exporter-58qqv 1/1 Running 0 14d 10.50.0.55 codenow-system-wgcvt-7ff76ff4c9-8lkrf <none> <none>
prometheus-operator-prometheus-node-exporter-c5s44 1/1 Running 0 14d 10.50.0.51 codenow-system-wgcvt-7ff76ff4c9-2vbcf <none> <none>
prometheus-operator-prometheus-node-exporter-d9v8w 1/1 Running 0 18h 10.50.0.58 codenow-system-wgcvt-7ff76ff4c9-d26tk <none> <none>
prometheus-operator-prometheus-node-exporter-fq9t7 1/1 Running 0 14d 10.50.0.52 codenow-system-wgcvt-7ff76ff4c9-ntmn6 <none> <none>
prometheus-operator-prometheus-node-exporter-jvn5h 1/1 Running 0 18h 10.50.0.57 codenow-application-r5c72-f7b58dfd7-wq8rn <none> <none>
prometheus-operator-prometheus-node-exporter-kg9k8 1/1 Running 0 14d 10.50.0.50 codenow-control-plane-hc588 <none> <none>
prometheus-operator-prometheus-node-exporter-qzvn6 1/1 Running 0 18h 10.50.0.53 codenow-application-r5c72-f7b58dfd7-42fnk <none> <none>
prometheus-prometheus-operator-kube-p-prometheus-0 4/4 Running 0 14d 172.16.200.29 codenow-system-wgcvt-7ff76ff4c9-2vbcf <none> <none>
thanos-compact-0 1/1 Running 0 14d 172.16.200.21 codenow-system-wgcvt-7ff76ff4c9-2vbcf <none> <none>
thanos-query-6dfb47699c-rq2pt 1/1 Running 0 14d 172.16.200.84 codenow-system-wgcvt-7ff76ff4c9-ntmn6 <none> <none>
thanos-store-0 1/1 Running 0 14d 172.16.200.86 codenow-system-wgcvt-7ff76ff4c9-ntmn6 <none> <none>
mbednar@fedora35:~$

The Prometheus server can reach only the node-exporter on the node it is running on, not the exporters on the other nodes.

TanzuKubernetesCluster definition:
apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster

...

network:
  cni:
    name: calico
  services:
    cidrBlocks: ["172.16.100.0/24"]
  pods:
    cidrBlocks: ["172.16.200.0/24"]
  serviceDomain: cluster.local

1 Reply
MilanBednar
Contributor

Seems solved - I had to add a rule to the nodes' iptables to allow incoming traffic to the node-exporter port. I just edited the /etc/systemd/scripts/ip4save file on each node.
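For anyone hitting the same issue, a minimal sketch of what the added rule might look like. This assumes the Photon OS ip4save file uses standard iptables-save syntax (the surrounding lines will differ per image; only the -A INPUT line for port 9100 is the actual fix, 9100 being the node-exporter default port):

```
# /etc/systemd/scripts/ip4save (iptables-save format, sketch - existing
# rules in your file will differ)
*filter
:INPUT DROP [0:0]
# ... existing rules ...
# allow Prometheus on other nodes to scrape node-exporter on the node IP
-A INPUT -p tcp --dport 9100 -j ACCEPT
COMMIT
```

To apply it without rebooting, the same rule can presumably be added to the running firewall with `iptables -A INPUT -p tcp --dport 9100 -j ACCEPT`; editing ip4save makes it persist across reboots. Note that TKGS may recreate worker nodes (e.g. on upgrade or scale-out), so the change would need to be reapplied on new nodes.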