cchen2
VMware Employee

Process stuck during upgrading

I successfully deployed a cluster via CSE 4.0 and tried to upgrade it from Kubernetes 1.21.8 to 1.22.9.

After I submitted the upgrade request via the GUI, the upgrade didn't kick off. I then restarted the rdeprojector pod, and the upgrade process started.

However, after waiting for over 40 minutes, I found that although the control plane nodes were successfully upgraded to 1.22.9, the worker nodes are stuck on 1.21.8. An additional worker node was also added (2 worker nodes before the upgrade, 3 now), and it is stuck in a processing status (checked via the cluster config API).
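To make the version split easier to see, the nodes can be grouped by kubelet version. This is only a minimal sketch: it assumes the output of `kubectl get nodes -o json` has already been parsed into a dict, and the helper name `versions_by_node` and the node names in the sample are hypothetical.

```python
from collections import defaultdict


def versions_by_node(node_list):
    """Group node names by kubelet version.

    `node_list` is assumed to be the parsed JSON from `kubectl get nodes -o json`.
    """
    groups = defaultdict(list)
    for item in node_list.get("items", []):
        name = item["metadata"]["name"]
        version = item["status"]["nodeInfo"]["kubeletVersion"]
        groups[version].append(name)
    return dict(groups)


# Hypothetical sample mirroring the situation described above:
# one upgraded control plane node, two workers still on the old version.
sample = {
    "items": [
        {"metadata": {"name": "cp-1"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.22.9"}}},
        {"metadata": {"name": "worker-1"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.21.8"}}},
        {"metadata": {"name": "worker-2"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.21.8"}}},
    ]
}

print(versions_by_node(sample))
```

A node that is still being provisioned may not have registered with the cluster yet, so it would not necessarily appear in this grouping.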

I suspect the upgrade process is stuck, and I would like to know what might have caused this and how to fix it.

** Some Advice **

1. It would be better if the GUI showed the progress of the entire upgrade, or at least whether the upgrade has finished or is still in progress. In the current version this is confusing and hard to tell.

2. I noticed that the rolling update is done with maxSurge > 0, which means that Cluster API creates extra temporary nodes during the update. For resource-sensitive tenants, it would be better to offer an option to configure maxSurge manually.
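To illustrate the resource impact in point 2, here is a rough sketch of how many worker nodes can exist at the peak of a rolling update. It assumes maxSurge behaves like the `spec.strategy.rollingUpdate.maxSurge` field of a Cluster API MachineDeployment (an absolute number or a percentage) and that percentages are rounded up, as Kubernetes Deployments do. The function names are hypothetical, and whether CSE exposes this setting is not something I know.

```python
import math


def resolve_surge(replicas, max_surge):
    """Resolve maxSurge (int or percentage string such as "25%") to a node count."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        # Assumption: percentages of the desired replica count are rounded up.
        return math.ceil(replicas * int(max_surge[:-1]) / 100)
    return int(max_surge)


def peak_nodes(replicas, max_surge):
    """Upper bound on nodes that exist at once during the rolling update."""
    return replicas + resolve_surge(replicas, max_surge)


# Matches what I saw: 2 workers with maxSurge = 1 gives 3 workers mid-upgrade.
print(peak_nodes(2, 1))
```

With maxSurge = 0 no extra node would be created, but the rollout would then need to take an existing node out of service first (maxUnavailable > 0), so there is a trade-off between capacity during the upgrade and extra resource usage.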
