
If you could elaborate on what the issue with the service account was and what resolved it, it might be helpful to other users. Thanks. Aashima
I refreshed it a couple of times during the 2 days that the cluster existed, and it always showed the wrong number of nodes. The cluster is now gone, so I will report back if it happens again on another cluster.
Hi, I am aware it is using Cluster API in the background; this is why I am interested in how to use it directly. Could you please provide a guide on how to execute steps 2 to 4? The goal is to automate everything, ideally with the API or kubectl, and as a second choice with a CLI:
- Create a new cluster, including the control plane and worker pools
- Create new worker node pools
- Edit the control plane (e.g. the number of nodes)
- Edit a worker node pool
I am wondering whether Cluster API could be used in all of these scenarios, or whether it is not possible for the first one (creating a new cluster).
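For the second item (a new worker node pool), one possible Cluster API route is to copy an existing MachineDeployment under a new name and apply it. The sketch below is a hypothetical helper, not a CSE feature: it assumes the cluster's CAPI objects are reachable with kubectl, that the new pool may reuse the source pool's bootstrap and infrastructure templates, and that CSE tolerates pools created outside its UI, none of which is confirmed in this thread. The pool names in the example are made up.

```python
import copy
from typing import Any, Dict

# Label that Cluster API puts on a MachineDeployment's selector and machine
# template; it has to match the MachineDeployment's own name.
DEPLOYMENT_NAME_LABEL = "cluster.x-k8s.io/deployment-name"

# Metadata fields filled in by the API server; drop them before creating a copy.
SERVER_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "generation", "managedFields")


def clone_machine_deployment(md: Dict[str, Any], new_name: str, replicas: int) -> Dict[str, Any]:
    """Return a new MachineDeployment dict modeled on an existing one.

    The copy keeps the same bootstrap and infrastructure template references,
    so the new pool gets the same node sizing as the source pool.
    """
    if md.get("kind") != "MachineDeployment":
        raise ValueError("expected a MachineDeployment object")
    new = copy.deepcopy(md)  # never mutate the caller's object
    new.pop("status", None)
    meta = new.setdefault("metadata", {})
    for field in SERVER_FIELDS:
        meta.pop(field, None)
    meta["name"] = new_name
    spec = new.setdefault("spec", {})
    spec["replicas"] = replicas
    # Re-point the name label, wherever it is present, at the new pool.
    selector_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
    for labels in (selector_labels, template_labels):
        if DEPLOYMENT_NAME_LABEL in labels:
            labels[DEPLOYMENT_NAME_LABEL] = new_name
    return new
```

Possible usage: export the source pool with `kubectl get machinedeployment <name> -o json`, load it with `json.load`, pass it through `clone_machine_deployment`, write the result with `json.dump`, and submit it with `kubectl apply -f <file>`.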
Please try refreshing the UI to see whether it reflects the right value. We will keep an eye on this and double-check.
Deployments are failing with the following error:
    Aug 08 19:44:20 photon bash[2574]: {"level":"info","ts":1659987860.6753228,"caller":"repair/heartbeat.go:102","msg":"updating component [{errorSet {RecoveryError 0001-01-01 00:00:00 +0000 UTC urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386  map[]} true}] in RDE: [test-1(urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386)]"}
    Aug 08 19:44:20 photon bash[2574]: {"level":"error","ts":1659987860.6754072,"caller":"app/main.go:584","msg":"error creating cluster [test-1(urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386)] : [error while bootstrapping the machine [test-1/EPHEMERAL_TEMP_VM]; unable to wait for post customization phase [guestinfo.cloudinit.kind.cluster.creation.status] : [invalid postcustomization phase: [failed] for key [guestinfo.cloudinit.kind.cluster.creation.status] for vm [EPHEMERAL_TEMP_VM]]]","stacktrace":"main.processRDE\n\t/app/main.go:584"}
> failed to write /var/lib/cloud/data/
> failed to write /var/lib/cloud/instance/boot-finished
Thanks for the screenshots. We are looking into this. It is not a blocker, though, and it is not causing issues.
> After successful cluster provisioning (State: Ready) I can not login to any node using SSH public key I provided during setup of cluster
We will verify whether this is a real bug and fix it before GA. Thanks for reporting.
Deployment is failing, with the following shown in the journalctl output:
    Aug 08 19:43:38 photon bash[2574]: {"level":"info","ts":1659987818.9050834,"caller":"utils/rdeUtils.go:210","msg":"Assigning value [Elpased time waiting for postcustomization status [guestinfo.cloudinit.kind.cluster.creation.status] to complete [0.240771] seconds] to key [HeartbeatString] in [types.VCDKEStatus]"}
    Aug 08 19:43:38 photon bash[2574]: {"level":"info","ts":1659987818.939308,"caller":"utils/rdeUtils.go:313","msg":"successfully updated defined entity with ID [urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386]"}
    Aug 08 19:43:49 photon bash[2574]: {"level":"error","ts":1659987829.103915,"caller":"cluster/clusterManager.go:238","msg":"error waiting for creation of cluster [test-1(urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386)]: [error while bootstrapping the machine [test-1/EPHEMERAL_TEMP_VM]; unable to wait for post customization phase [guestinfo.cloudinit.kind.cluster.creation.status] : [invalid postcustomization phase: [failed] for key [guestinfo.cloudinit.kind.cluster.creation.status] for vm [EPHEMERAL_TEMP_VM]]]","stacktrace":"gitlab.eng.vmware.com/core-build/vcd-k8s-provider/src/cluster.CreateCluster\n\t/app/src/cluster/clusterManager.go:238\nmain.processRDE\n\t/app/main.go:566"}
    Aug 08 19:43:49 photon bash[2574]: {"level":"info","ts":1659987829.1040707,"caller":"repair/heartbeat.go:102","msg":"updating component [{errorSet {ScriptExecutionError 2022-08-08 19:43:49.103954454 +0000 UTC m=+6717.591647529 urn:vcloud:vm:4fd80e53-7307-43fc-b74f-274f33aabb5e map[Detailed Error:[error while bootstrapping the machine [test-1/EPHEMERAL_TEMP_VM]; unable to wait for post customization phase [guestinfo.cloudinit.kind.cluster.creation.status] : [invalid postcustomization phase: [failed] for key [guestinfo.cloudinit.kind.cluster.creation.status] for vm [EPHEMERAL_TEMP_VM]]] during cluster creation]} false}] in RDE: [test-1(urn:vcloud:entity:vmware:capvcdCluster:8319ce29-cba5-4f65-8f20-46d0207cb386)]"}
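When journalctl output like this is pasted as one long run, the error entries are easy to miss. Each entry is a "host process[pid]: " prefix followed by one JSON object, so a small helper can pull the objects out and keep only those with "level":"error". This is a hypothetical troubleshooting aid, not part of CSE; it relies only on the JSON shape visible in the logs above.

```python
import json
from typing import Dict, Iterator, List


def parse_entries(text: str) -> Iterator[Dict]:
    """Yield every JSON object that follows a journalctl line prefix."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one complete object and reports where it ended,
            # so braces inside the "msg" strings do not confuse it.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not a JSON object here; keep scanning
            continue
        yield obj
        pos = end


def error_entries(text: str) -> List[Dict]:
    """Return only the entries logged at level "error"."""
    return [entry for entry in parse_entries(text) if entry.get("level") == "error"]
```

For example, `error_entries(open("journal.txt").read())` on a saved copy of the output would return the two clusterManager/main.go errors, with the stack traces already unescaped.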
Yes, troubleshooting is much more difficult without an equivalent of rollbackOnFailure: false.
Internally, the beta uses Cluster API. @ccalvetbeta, can you describe exactly what you want to do? Do you mean the following:
1. Create a cluster using CSE 4.0.
2. Get the CAPI YAML.
3. Change the number of nodes in the CAPI YAML.
4. Apply this YAML using kubectl.
Is (4) what you mean?
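In Cluster API terms, step 3 amounts to changing spec.replicas on a KubeadmControlPlane object (control plane nodes) or on a MachineDeployment object (a worker pool). A minimal sketch, assuming the CAPI YAML from step 2 has already been loaded into a list of dicts (for instance from `kubectl get ... -o json`) and that CSE does not revert manual changes, which this thread has not confirmed. The object names in the example are hypothetical.

```python
from typing import Any, Dict, List

# Cluster API kinds whose spec.replicas sets a node count.
SCALABLE_KINDS = {"KubeadmControlPlane", "MachineDeployment"}


def set_replicas(objects: List[Dict[str, Any]], kind: str, name: str, replicas: int) -> List[Dict[str, Any]]:
    """Set spec.replicas on the object with the given kind and name, in place."""
    if kind not in SCALABLE_KINDS:
        raise ValueError(f"{kind} is not a scalable Cluster API kind")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    for obj in objects:
        if obj.get("kind") == kind and obj.get("metadata", {}).get("name") == name:
            obj.setdefault("spec", {})["replicas"] = replicas
            return objects
    raise KeyError(f"{kind}/{name} not found")
```

The modified objects can then be written back out and applied with `kubectl apply -f` (step 4). For a worker pool alone, `kubectl scale machinedeployment <name> --replicas=N` should reach the same result without editing the file.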
After a successful deployment in the GUI, is it possible to do the same with Cluster API? The goal would be to automate all deployments, as in the "Cluster API Provider for VMware Cloud Director" post on the VMware Cloud Provider Blog. For example, could Cluster API be used to edit the number of nodes, create new worker nodes, etc.?
I have tried CSE.next on a multisite setup, where two VCD sites are paired at the system and organization level. The sites are rs-bg-1-ec and rs-bg-2-ec, and I can see VDCs and VMs from both sites in VCD. Also, in the CSE UI plugin on the rs-bg-2-ec site, I can see k8s clusters from both sites. If I click on the cse4-rs-bg-1-test-1 link, it opens the rs-bg-1-ec site in a new browser tab, but it does not accept my current SSO login, and I need to log in again to manage my cluster. An annoyance, but bearable. The real problem is that if I log in to the rs-bg-2-ec site, go to the other site via SSO, and then open the CSE UI plugin, it cannot show any clusters; it displays the message "Error: Failed to fetch Kubernetes clusters". There is a 401 Unauthorized HTTP request to the rs-bg-1-ec site's /api/session URL, and a "CSE UI: Error fetching sites" message in the browser console. Note: Similar problems were noticed in CSE 3.1 when using a multisite setup.
The new capability to create multiple nodes for the control plane is good. However, how should they be operated? I am thinking of etcd: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/ For example, what would be the process to replace a failed etcd member? Removing a failed member doesn't seem possible if they are managed by Tanzu, and I didn't see an option in the GUI to delete a specific node. Will there be an option to back up etcd directly from the GUI or via a CLI?
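Two numbers matter when operating several control plane nodes, assuming stacked etcd (one etcd member per control plane node, the kubeadm default): the quorum, a strict majority of members that must stay healthy for writes to continue, and the number of members that can fail without losing it. Backups on the Kubernetes page linked above are taken with `etcdctl snapshot save`. A minimal sketch of both; the certificate paths are kubeadm defaults and are an assumption about CSE nodes, and `etcdctl` may only be available inside the etcd container.

```python
from typing import List


def etcd_quorum(members: int) -> int:
    """Members that must agree for etcd (Raft) to keep accepting writes."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Members that can be lost while the cluster still has quorum."""
    return members - etcd_quorum(members)


def snapshot_command(snapshot_path: str, endpoint: str = "https://127.0.0.1:2379",
                     pki_dir: str = "/etc/kubernetes/pki/etcd") -> List[str]:
    """Build an etcdctl v3 'snapshot save' command line.

    Older etcdctl releases also need ETCDCTL_API=3 in the environment.
    The certificate locations are kubeadm defaults (an assumption).
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", snapshot_path,
    ]
```

With 3 control plane nodes one member can fail; a 4th node raises the quorum to 3 and still tolerates only one failure, which is why odd control plane counts are the usual choice.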
I accidentally deleted the load balancer rule for port 6443 and was hoping that CSE would re-create it, because it is needed for the cluster to work correctly, but that did not happen. Is that load balancer supposed to be managed by CSE and re-created or modified as needed?
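Until it is clear whether CSE reconciles that rule, a quick way to tell whether the control plane endpoint is answering at all is a plain TCP connect to port 6443 on the address that the cluster's kubeconfig `server:` field points to (assumed here to be the load balancer's virtual IP). A minimal sketch; it only proves that something accepts connections on the port, not that the Kubernetes API behind it is healthy.

```python
import socket


def api_port_reachable(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Running it against the virtual IP before and after re-creating the rule by hand would show whether the rule is what broke connectivity.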
Sometimes the node count is not correct even after the cluster progresses to the Ready state: the CSE plugin shows different values than the kubectl command does, and it stays like that even after 48h.
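To report such a mismatch precisely, the kubectl side can be reduced to the same figures the plugin shows. A minimal sketch that reads `kubectl get nodes -o json` output and counts Ready control plane and worker nodes; it assumes control plane nodes carry the standard `node-role.kubernetes.io/control-plane` label (or the older `node-role.kubernetes.io/master` one), and the node names in the example are hypothetical.

```python
import json
from typing import Dict

CONTROL_PLANE_LABELS = ("node-role.kubernetes.io/control-plane", "node-role.kubernetes.io/master")


def count_nodes(nodes_json: str) -> Dict[str, int]:
    """Count Ready control plane and worker nodes from `kubectl get nodes -o json`."""
    counts = {"control_plane": 0, "worker": 0, "not_ready": 0}
    for node in json.loads(nodes_json).get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        if not ready:
            counts["not_ready"] += 1
        elif any(label in labels for label in CONTROL_PLANE_LABELS):
            counts["control_plane"] += 1
        else:
            counts["worker"] += 1
    return counts
```

Separating not-ready nodes matters here: a plugin that counts machines and a kubectl listing that includes NotReady nodes can disagree without either being wrong.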
After provisioning a CSE.next cluster, a storage class was created:
    kubectl --kubeconfig cse4-2 get storageclasses.storage.k8s.io
    NAME                                PROVISIONER                                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
    default-storage-class-1 (default)   named-disk.csi.cloud-director.vmware.com   Delete          Immediate           false                  5h1m
But it is not visible in the CSE plugin; it just shows N/A.
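The "(default)" marker in that kubectl output comes from the `storageclass.kubernetes.io/is-default-class: "true"` annotation on the StorageClass (older clusters may use the `storageclass.beta.kubernetes.io/is-default-class` variant). A minimal sketch that extracts the default class names from `kubectl get storageclasses -o json`, which could help confirm what the plugin should be displaying; how the plugin itself looks this up is not known here.

```python
import json
from typing import List

DEFAULT_CLASS_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def default_storage_classes(sc_json: str) -> List[str]:
    """Return the names of StorageClasses annotated as the cluster default."""
    names = []
    for sc in json.loads(sc_json).get("items", []):
        metadata = sc.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_CLASS_ANNOTATIONS):
            names.append(metadata["name"])
    return names
```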
I have tried to access the control plane and worker nodes after provisioning (but I am unable to log in, because the SSH public key I entered during provisioning is not configured on the nodes, as I described in another thread), so the host SSH key is recorded in my client for future use. But after rebooting the node, the host SSH key has changed:
    [me@home ~]$ ssh -p 22 root@172.16.172.16
    root@172.16.172.16's password:
    Permission denied, please try again.
    root@172.16.172.16's password:
    (reboot happens here)
    [me@home ~]$ ssh -p 22 root@172.16.172.16
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that a host key has just been changed.
Is that supposed to happen, or is it because cloud-init is failing to configure the nodes? I don't think it is a good thing to change the host SSH key every time a node is restarted...
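Whether the host key really changes on every boot can be checked by recording its fingerprint before and after a reboot, for example from `ssh-keyscan` output. OpenSSH's SHA256 fingerprint is the unpadded base64 of the SHA-256 digest of the decoded key blob, which is what `ssh-keygen -l -f <file>` prints. A minimal sketch; the accepted key-type prefixes are an assumption that covers the common types.

```python
import base64
import hashlib

# Key type prefixes used to locate the key field on a line (common types only).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def ssh_fingerprint(line: str) -> str:
    """SHA256 fingerprint of a key in '.pub', known_hosts, or ssh-keyscan format."""
    fields = line.split()
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            blob = base64.b64decode(fields[i + 1])  # the field after the key type
            digest = hashlib.sha256(blob).digest()
            return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
    raise ValueError("no '<key-type> <base64-blob>' pair found")
```

If the fingerprints differ across a reboot, the node is regenerating its host keys; the stale client entry can be removed with `ssh-keygen -R 172.16.172.16`.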
After successful cluster provisioning (State: Ready), I cannot log in to any node using the SSH public key I provided during cluster setup. I am also not able to log in to the nodes using the password that VCD autogenerated in "Guest OS Customization" (nor with a manually specified password). Since I cannot log in to the nodes, I can't see the cluster provisioning logs. The only thing I can see is the cloud-init errors after boot (screenshots: control-plane, control-plane-after-reboot, worker, worker-after-reboot).
At one point, in the beta version of the plugin, I got a screen like the one in the attached mockup (not a real screenshot from that time). I was able to proceed through the dialog with the first option, which gave me a CSE.next cluster, but the second option showed no available VDC in which to provision a CSE 3.1 cluster. I only saw that screen once; all later attempts showed only the first option, so I am wondering what was going on at that time. Or is it supposed to work that way, but it doesn't work on my side?
Finally, it worked without any action on my side, so it is just slow to start.