kastrosmartis
Contributor

vSphere with Tanzu - Guest cluster deployment stuck at Control Plane VM

I have a master vCenter and 3x nested ESXi hosts with vSAN on which I am testing Tanzu, using a VDS switch (not NSX-T) and HAProxy as the load balancer.

All management interfaces are on the existing network. I deployed 2 new VLANs, one for Frontend and one for Workload.

I managed to get the Supervisor Cluster up and running and created Namespaces. When I try to deploy a guest cluster with a simple .yaml, it gets stuck.

The Control Plane VM gets deployed and powered on, the HAProxy .cfg is populated with frontend/backend IPs, and the VM gets an IP (which is unpingable), but then the deployment stalls before the workers are deployed.

 

kubectl get events -w
LAST SEEN TYPE REASON OBJECT MESSAGE
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Normal CreateVMServiceSuccess virtualmachineservice/simple-control-plane-service CreateVMService success
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Warning ReconcileFailure wcpcluster/simple unexpected error while reconciling control plane endpoint for simple: failed to reconcile loadbalanced endpoint for WCPCluster tanzu-ns-01/simple: failed to get control plane endpoint for Cluster tanzu-ns-01/simple: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal SuccessfulCreate machinedeployment/simple-workers-kszj2 Created MachineSet "simple-workers-kszj2-d4c6b6f49"
0s Normal SuccessfulCreate machineset/simple-workers-kszj2-d4c6b6f49 Created machine "simple-workers-kszj2-d4c6b6f49-m6sg9"
0s Normal SuccessfulCreate machineset/simple-workers-kszj2-d4c6b6f49 Created machine "simple-workers-kszj2-d4c6b6f49-795l5"
0s Normal SuccessfulCreate machineset/simple-workers-kszj2-d4c6b6f49 Created machine "simple-workers-kszj2-d4c6b6f49-5z5zg"
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
1s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm is not yet created: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Warning ReconcileFailure wcpmachine/simple-control-plane-hfrm7-7twzk vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine/infrastructure.cluster.vmware.com/v1alpha3/tanzu-ns-01/simple/simple-control-plane-hfrm7-7twzk
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal Reconcile gateway/simple-control-plane-service Success
1s Normal Reconcile gateway/simple-control-plane-service Success
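
The event stream above repeats the same few messages many times. A minimal Python sketch can collapse such output into distinct events with counts. It assumes the five-column layout shown in the header row (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); `summarize_events` is a hypothetical helper name, not part of kubectl, and the sample lines are copied from the output above.

```python
from collections import Counter

def summarize_events(text):
    """Count repeated `kubectl get events` lines by (type, reason, object, message)."""
    counts = Counter()
    for line in text.splitlines():
        # Split into at most five fields so the message keeps its spaces.
        parts = line.split(None, 4)
        # Skip blank lines, the header row, and lines missing a column.
        if len(parts) < 5 or parts[0] == "LAST":
            continue
        _last_seen, etype, reason, obj, message = parts
        counts[(etype, reason, obj, message)] += 1
    return counts

sample = """LAST SEEN TYPE REASON OBJECT MESSAGE
0s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal Reconcile gateway/simple-control-plane-service Success
1s Normal Reconcile gateway/simple-control-plane-service Success
0s Normal CreateVMServiceSuccess virtualmachineservice/simple-control-plane-service CreateVMService success
"""

for (etype, reason, obj, message), n in summarize_events(sample).most_common():
    print(f"{n}x {etype} {reason} {obj}: {message}")
```

Grouping ignores the LAST SEEN column on purpose, so the same event seen at `0s` and `1s` is counted once per occurrence under a single key.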

 

Cluster API Status:
API Endpoints:
Host: 172.16.97.194
Port: 6443
Phase: Provisioned
Node Status:
simple-control-plane-rb7r6: pending
simple-workers-kszj2-d4c6b6f49-5z5zg: pending
simple-workers-kszj2-d4c6b6f49-795l5: pending
simple-workers-kszj2-d4c6b6f49-m6sg9: pending
Phase: creating
Vm Status:
simple-control-plane-rb7r6: ready
simple-workers-kszj2-d4c6b6f49-5z5zg: pending
simple-workers-kszj2-d4c6b6f49-795l5: pending
simple-workers-kszj2-d4c6b6f49-m6sg9: pending

 

Any ideas?

 

ferdis
Hot Shot

Hi, with vSphere with Tanzu using HAProxy I'm stuck during TKG cluster deployment. kubectl describe kubeadmcontrolplanes.controlplane.cluster.x-k8s.io tkg-show-01-control-plane shows WaitingForKubeadmInit. Anyone? Here is the configuration:

ferdis_2-1606634959238.png

HAProxy configuration

 

ferdis_3-1606634970106.png

Frontend CIDR setting in HAProxy

 

Here is the Workload Management part:

 

ferdis_5-1606635020229.png

 

ferdis_6-1606635029547.png

ferdis_7-1606635049710.png

ferdis_8-1606635060482.png

ferdis_9-1606635068281.png

 

This part describes the state after I initiated TKG guest cluster creation:

 

ferdis_0-1606635449742.png

kubectl describe kubeadmcontrolplanes.controlplane.cluster.x-k8s.io tkg-cluster-1-control-plane

 

ferdis_0-1606636826134.png

ferdis_1-1606637148593.png

kubectl get tanzukubernetescluster,clusters.cluster.x-k8s.io,wcpcluster,machinedeployment.cluster.x-k8s.io,wcpmachinetemplate,machine.cluster.x-k8s.io,wcpmachine,kubeadmconfig,virtualmachinesetresourcepolicies,virtualmachines,virtualmachineservices,configmaps,secrets

ferdis_2-1606635834380.png

From the Supervisor VM I can ping the Workload Network gateway and the HAProxy Workload IP, but I can't ping the TKG cluster IP on the Workload Network.
SSH to the HAProxy IP on the Workload Network is refused.

ferdis_0-1606636024126.png

From the TKG Control Plane VM I can ping the Workload Network gateway, but I can't ping the HAProxy Workload IP or the Supervisor IP on the Workload Network.
SSH to the HAProxy IP on the Workload Network does not respond at all.

ferdis_0-1606636110202.png

From the TKG Control Plane VM on the Workload Network I can ping and SSH to the Supervisor VM IP on the Management Network, and I can ping the Supervisor Cluster IP on the Frontend Network.
I can also ping the HAProxy IP on the Management Network and on the Frontend Network.

ferdis_0-1606636329881.png

 

 

 

gabor_AU
Contributor

Hi Ferdis,

 

Did you find what the underlying issue is? I'm having exactly the same issue.

ferdis
Hot Shot

Hi Gabor, using tcpdump on the Supervisor Control Plane VM we found that the TCP request goes from the HAProxy IP on the Workload Network (192.168.16.3 / 00:50:56:82:aa:51) to the Supervisor control plane VM IP on the Workload Network (192.168.16.64 / 00:50:56:82:ae:af), but the Supervisor control plane VM sends its TCP reply to HAProxy's gateway IP (192.168.10.1 / MAC address 7c:5a:1c:83:27:aa). So the communication is asymmetric.

ferdis_0-1616410249569.png
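
To illustrate why this reply path is unexpected, here is a minimal Python sketch of a plain connected-route check. It assumes the Workload Network uses a /24 prefix (the prefix length is not stated in this thread), and `reply_path` is a hypothetical helper. It only models what an ordinary on-link lookup would decide, not the routing rules actually configured on the Supervisor VM.

```python
import ipaddress

def reply_path(local_ip, peer_ip, prefix_len):
    """Return how a reply would normally reach peer_ip from local_ip's interface."""
    # Derive the subnet the local interface is attached to.
    local_net = ipaddress.ip_interface(f"{local_ip}/{prefix_len}").network
    if ipaddress.ip_address(peer_ip) in local_net:
        return "direct (L2)"
    return "via gateway"

# Addresses taken from the capture described above; the /24 is an assumption.
print(reply_path("192.168.16.64", "192.168.16.3", 24))  # prints "direct (L2)"
```

Under that assumption both addresses are on the same subnet, so a reply would normally be delivered directly on L2. The capture instead shows the reply addressed to a gateway MAC, which matches the hairpinned-traffic behavior described in the HAProxy v0.1.9 release notes quoted in this post.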

 

I also redeployed both HAProxy and the Supervisor Cluster using two networks instead of three: the same management network and a fresh VLAN for the Workload Network. The result was exactly the same as with three networks.

Then VMware HAProxy v0.1.9 was released on Jan 4 with this description:
 
Fixes an issue that causes some routers to avoid routing traffic between VMware Supervisor Control Plane VMs and the HAProxy. On some routers this causes communication issues between the HAProxy and the SV VMs as those routers may not allow for hairpinned traffic. Previously routing rules existed on the SV VMs that required traffic bound for the HAProxy appliance to be routed to the gateway and then back into the subnet. That logic has been changed to route via L2 as of vSphere patch release 7.0.1 P02.
 
 
Then HAProxy v0.1.10 was released, but I still had similar issues.
gabor_AU
Contributor

Hi Ferdis,

 

Thanks for sharing that with me.

I have tried both HAProxy v0.1.9 and HAProxy v0.1.10 but the issue wasn't resolved.

It looks like the asymmetric routing issue still exists with HAProxy 0.1.10, and I suspect it is not necessarily only HAProxy's fault. VMware will probably have to fix the communication from the control plane VM so that it works as required.

 

https://techie.cloud/blog/2020/11/26/issues-encountered-when-setting-up-vmware-tanzu/

I came across this article, which describes how to enable asymmetric routing on pfSense. However, I'm not using pfSense and this is not really a lab, so I suspect I have to wait for further releases from VMware.

Please let me know if you manage to get any further.

In the meantime, I will set up a small nested lab on a spare host to test this in an isolated environment, potentially with pfSense.
