Hi, The cluster creation never completed and the cluster never became ready. The vApp and VM are then removed and recreated later, in a loop. Result of "cat /var/log/cloud-final.err": Note: I am not sure whether we can log in as root over SSH, so the screenshot is from the console. All pods seem OK, and I don't see any relevant error in the journal, apart from a few earlier errors of type "412 Precondition Failed". The latest error in the events seems to be associated with the deletion of the vApp and load balancer. Any suggestion on what the next troubleshooting step should be?
Hi, Would it be possible to add a prerequisite health check in the GUI (similar to the Distributed Switch health check)? It would avoid having to fix a deployment that is not working. There could be multiple levels of health check:
Infrastructure: Confirm that all objects have been properly created in the API. (The logged-in user may not have the right to see such settings, so this test should run under a different account.)
User: Confirm that the logged-in user has all prerequisite permissions.
Then the user selects the organization network in which to simulate a deployment:
- Confirm the network has an IP pool configured with enough free IP addresses.
- Confirm DNS servers are configured.
- Confirm the Edge is properly configured with access to a load balancer.
- Confirm enough external IP addresses are available.
- Confirm enough VIPs are available in the Edge.
- Confirm enough capacity (CPU/memory/storage).
- Confirm sizing policies are created.
Then deploy a test VM in this network (similar to how the ephemeral VM would be created):
- Confirm that the test VM has access to the DNS server.
- Confirm that the VM has access to all URLs needed. (The list should be provided in the documentation; not all environments can provide full internet access.)
- Confirm that the VM has access to Cloud Director (and, eventually, whether certificates are trusted).
This list is non-exhaustive. If all test results (passed/failed) are visible in the GUI, it would be easy to pinpoint wrong settings and fix them before even trying to deploy a cluster. Regards,
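To illustrate the kind of pass/fail report requested above, here is a minimal sketch of a check runner. It is not part of CSE or Cloud Director: the names `dns_check` and `run_checks` are hypothetical, and only the DNS check is implemented as an example. The other checks in the list would each need their own function.

```python
import socket
from typing import Callable, Dict, List, Tuple

# A check is any callable returning (passed, detail). Hypothetical convention.
Check = Callable[[], Tuple[bool, str]]


def dns_check(hostname: str) -> Check:
    """Build a check that passes when `hostname` resolves from this machine."""
    def run() -> Tuple[bool, str]:
        try:
            address = socket.gethostbyname(hostname)
            return True, f"{hostname} resolves to {address}"
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            return False, f"{hostname} does not resolve: {exc}"
    return run


def run_checks(checks: Dict[str, Check]) -> List[Tuple[str, bool, str]]:
    """Run every check and collect (name, passed, detail) without stopping early."""
    results = []
    for name, check in checks.items():
        try:
            ok, detail = check()
        except Exception as exc:  # a crashing check counts as a failure
            ok, detail = False, f"check raised {exc!r}"
        results.append((name, ok, detail))
    return results


def print_report(results: List[Tuple[str, bool, str]]) -> None:
    """Print one PASSED/FAILED line per check."""
    for name, ok, detail in results:
        print(f"[{'PASSED' if ok else 'FAILED'}] {name}: {detail}")
```

For example, `print_report(run_checks({"DNS": dns_check("vcd.example.com")}))` would print one line for the DNS check. The hostname here is a placeholder.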
Hello, Okay, so you're seeing the known issue where some non-TKG OVA vApp templates are being read as TKG OVA vApp templates by the UI plugin. The UI plugin v4.0.102 that I linked to above has that issue fixed, and it will be fixed in GA as well. Thank you!
Hello! I can help with the error you're seeing in the TKG OVA datagrid. Your CSE process seems to be running fine. I tried accessing your VCD testbed at 172.21.19.51, but it seems it's not accessible. There are a few things we can try here. Let me know if you'd like to schedule a Zoom call to go over this:
1. There is a known issue where non-TKG OVA vApp templates that are visible to the current user are being read by the UI plugin as TKG OVAs. This can cause the error you're seeing. To quickly test this, you can either delete all non-TKG OVA vApp templates, or create a user in an org where only the TKG OVA vApp templates are visible (you can verify this with a Postman GET request to "https://{{host}}/api/query?type=vAppTemplate&format=records&page=1&pageSize=20&filterEncoded=true&sortAsc=name&links=true").
2. Alternatively, I have a test UI plugin build v4.0.102 (the beta build is v4.0.101) where this specific known issue is fixed: https://artifactory.eng.vmware.com/ui/native/cloud-director-solutions-generic-local/container-ui-plugin/4479576/ . Can you download the zip file, then go to provider VCD -> customize portal -> upload the zip file -> disable container UI plugin v4.0.101 -> enable and publish container UI plugin v4.0.102 -> refresh the browser -> try the workflow again to see if the error is gone?
Please let me know how it goes, or if you'd like to schedule a Zoom session.
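If you prefer a script to Postman, the query URL from step 1 can be assembled as in the sketch below. The function name `build_template_query_url` and the example host are placeholders. Authentication is not shown: the request still needs whatever session or token headers your VCD setup requires.

```python
from urllib.parse import urlencode


def build_template_query_url(host: str, page: int = 1, page_size: int = 20) -> str:
    """Assemble the vAppTemplate query URL using the parameters quoted in step 1."""
    params = {
        "type": "vAppTemplate",
        "format": "records",
        "page": page,
        "pageSize": page_size,
        "filterEncoded": "true",
        "sortAsc": "name",
        "links": "true",
    }
    # urlencode keeps the dict's insertion order, so the URL matches the one above.
    return f"https://{host}/api/query?{urlencode(params)}"


if __name__ == "__main__":
    # "vcd.example.com" is a placeholder host, not a real endpoint.
    print(build_template_query_url("vcd.example.com"))
```

Run it as the test user and check which vApp templates the response lists. If non-TKG templates show up, the known issue described in step 1 may apply.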
And here is the POST and response for the RDE instance. It got a 200 OK, but the URN of the item looks weird: instead of "urn:xxx:xxx" it shows as "urn%3Axxx%3Axxx".
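For reference, "%3A" is simply the percent-encoded form of ":", so the two strings above denote the same URN once decoded. Whether the encoded form is expected in this response is not something this sketch can tell you. It only shows the conversion using Python's standard library, and `normalize_urn` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def normalize_urn(value: str) -> str:
    """Return the URN with any percent-encoded characters decoded."""
    return unquote(value)


if __name__ == "__main__":
    print(normalize_urn("urn%3Axxx%3Axxx"))  # decodes %3A back to ':'
    print(quote("urn:xxx:xxx", safe=""))      # encodes ':' as %3A
```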
I found that the CSE process is running, but it seems to be waiting for the RDEs. In cse.log, only the following line is output: "querying list of RDEs for processing"
I finished all the steps described in the provider workflow docs. When trying to deploy a TKG cluster via the GUI, I found the error on the TKG OVA template page. The template catalog is published and shared with the user tenant. I also tried rebooting CSE, but it still does not work.
Thanks for the reply. Unfortunately, if I stop the CSE service too early, the task that triggers the issue may not have been executed in the VM yet, making the VM useless for troubleshooting. This future feature is really a must-have for the GA.
Just a quick reminder: if you need to stop the CSE service vApp, please use "Power On, Force Recustomization" to restart the VM inside the CSE service vApp. Otherwise, the vcd-ke server won't restart. (Please feel free to use systemctl status cse to check whether the CSE server is alive.)
> I am not sure why it's trying to find a vApp with the name tkgcl01_ephemeral_vapp.
We have a common function that is used both for the user-initiated Delete Cluster operation and for clean-up on error while creating the cluster. As you mentioned, no tkgcl01_ephemeral_vapp is created in your use case. It is just an info-level message that can be ignored.
At this point, you may want to stop the CSE service so you can get to the ephemeral VM before it is deleted and recreated as part of the retry. If you are able to log in to the ephemeral VM successfully, please do the following:
1. export KUBECONFIG=/.kube/config
2. kubectl get pods -A to look for pods that are stuck or having issues
3. kubectl logs for the logs of an individual pod
In addition, if you can access https://github.com/vmware/cloud-provider-for-cloud-director/blob/main/scripts/generate-k8s-log-bundle.sh, please run that script on the ephemeral VM after setting KUBECONFIG and upload the log bundle. We can take a look for further investigation.
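If the pod list from step 2 is long, a small filter can help spot the problem pods. The sketch below assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...) and that Python is available wherever you paste the output. `find_unhealthy_pods` is a hypothetical helper, not part of any VMware tooling.

```python
from typing import List, Tuple


def find_unhealthy_pods(kubectl_output: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, status) for pods that are not healthy.

    A pod is treated as healthy if it is Completed, or if it is Running
    and its READY column shows all containers ready (e.g. 1/1).
    """
    unhealthy = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # not a pod row
        namespace, name, ready, status = fields[:4]
        if status == "Completed":
            continue
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            unhealthy.append((namespace, name, status))
    return unhealthy
```

Save the output of `kubectl get pods -A` to a file, read it into a string, and pass it to `find_unhealthy_pods`. Each returned pod is a candidate for step 3.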
We have a feature in the upcoming GA release that prevents the deletion of EPHEMERAL_TEMP_VM. As a workaround, you can either stop the CSE service on the VM where the OVA is installed or power off the OVA VM, which also brings down the service. Either one is required to stop EPHEMERAL_TEMP_VM from getting deleted. However, this must be done before the timeout happens, once you notice that it is looping on an existing phase. Timing this (stopping the service) requires a little judgement about how fast the cluster creation is progressing.
Tried with 10.3.3.19780585 and the same thing happened, which led me to check the logs again, where I found "Missing right to use ExtraConfig guestinfo.userdata.encoding." Then I figured out that I needed "Preserve All ExtraConfig Elements During OVF Import and Export" in my "Default rights bundle", not (only) in the "Organization admin" role. EPHEMERAL_TEMP_VM now boots (but the cluster is in state Reconciling, so I'm checking it out).
Issue: The vApp is created, then the VM "EPHEMERAL_TEMP_VM". Later the vApp and VM are removed, and the cycle then repeats. I would like to analyse all the logs in the VM, but by the time it is deleted I may have missed the latest ones. Is there an equivalent to the "rollback:false" setting in the legacy CSE?
This is my first try at provisioning a cse.next k8s cluster, and it fails while creating EPHEMERAL_TEMP_VM with ScriptInitError and later with EphemeralVMError; the VM doesn't even start. VMware Cloud Director version: 10.3.2.19473806. I have attached the whole CSE log from journalctl -lu cse.
Hi, So far I have not managed to create a cluster. The "Creating CAPVCD cluster" task is stuck at 1%. I am trying to troubleshoot, but it is difficult to identify what is wrong from the tasks/logs without knowing what they are supposed to look like. Would it be possible to provide the following information from a working environment? As a first step, deploy new clusters (management and workload) from the new beta plugin, and when they are ready:
- Export all related Cloud Director tasks.
- Export all related Cloud Director events.
- Provide a copy of the logs: cse.log and journalctl -axel -u cse
Such information would be very helpful for troubleshooting. Would it also be possible to provide a step-by-step chart of what happens when a new cluster is created? Addition: Would it be possible to add the list of all URLs that need to be allowed from the VM/nodes?
Hello, I posted a message with logs, but it is not visible; it's getting marked as spam. I checked the logs using the command journalctl -u cse and could see an "[ENF] Entity not found" message. I am not sure why it's trying to find a vApp with the name tkgcl01_ephemeral_vapp. I don't see any vApp with this name. The vApp name is tkgcl01, which is the name I used while creating the TKG cluster.