VMware Beta Community
ccalvetbeta
Enthusiast

Any plans for operating a multi-node etcd cluster?

The new capability to create multiple nodes for the control plane is good.
However, how should these nodes be operated?

I am thinking specifically of etcd:
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/

For example:

What would be the process to replace a failed etcd member?
Removing a failed member doesn't seem possible if the members are managed by Tanzu.
I didn't see an option in the GUI to delete a specific node.

Will there be an option to back up etcd directly from the GUI or via the CLI?
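For reference, the upstream Kubernetes page linked above does both operations with `etcdctl` (API v3) run against a surviving member: `snapshot save` for backups, and `member list` / `member remove` / `member add` to replace a failed member. The sketch below only builds and prints those commands; it does not run them. It assumes direct shell access to a healthy control-plane node with `etcdctl` installed and kubeadm-style certificate paths. The endpoint, certificate paths, member ID, node name, and peer URL are all hypothetical placeholders. This is not a Tanzu/CSE-supported procedure, and a cluster manager may reconcile the membership on its own.

```python
import shlex

# Hypothetical values: local etcd client endpoint and kubeadm-style cert paths.
# Adjust to wherever the certificates actually live on the control-plane node.
ENDPOINT = "https://127.0.0.1:2379"
CACERT = "/etc/kubernetes/pki/etcd/ca.crt"
CERT = "/etc/kubernetes/pki/etcd/server.crt"
KEY = "/etc/kubernetes/pki/etcd/server.key"


def etcdctl(*args: str) -> list[str]:
    """Build an etcdctl (API v3) command line with TLS flags for the local member."""
    return [
        "etcdctl",
        f"--endpoints={ENDPOINT}",
        f"--cacert={CACERT}",
        f"--cert={CERT}",
        f"--key={KEY}",
        *args,
    ]


def backup_command(snapshot_path: str) -> list[str]:
    # Save a point-in-time snapshot of the etcd keyspace to a local file.
    return etcdctl("snapshot", "save", snapshot_path)


def replace_member_commands(failed_id: str, new_name: str, new_peer_url: str) -> list[list[str]]:
    return [
        # 1. Find the hex ID of the failed member.
        etcdctl("member", "list", "-w", "table"),
        # 2. Remove it so quorum is computed over the remaining members only.
        etcdctl("member", "remove", failed_id),
        # 3. Register the replacement before etcd is started on the new node.
        etcdctl("member", "add", new_name, f"--peer-urls={new_peer_url}"),
    ]


if __name__ == "__main__":
    # Placeholder ID, name, and URL; only prints the commands.
    print(shlex.join(backup_command("/var/backups/etcd-snapshot.db")))
    for cmd in replace_member_commands("8e9e05c52164694d", "cp-node-4", "https://10.0.0.14:2380"):
        print(shlex.join(cmd))
```

Removing the failed member before adding the new one matters: adding first would temporarily raise the member count (and the quorum size) while one member is still unreachable.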

ccalvetbeta

I have done some tests on this topic with a cluster created with 3 master nodes.
If one control plane node is shut down from vCenter, "get pods -A" continues to work (as expected).
If two control plane nodes are shut down, "get pods -A" no longer works (expected).
After restarting one of the control plane nodes, "get pods -A" works again (expected).
So the basic functionality of multiple control plane nodes is working.
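These results match etcd's quorum rule, assuming one etcd member per control plane node: a cluster of n members needs floor(n/2) + 1 of them reachable, so 3 members tolerate exactly one failure. A minimal sketch of that arithmetic:

```python
def quorum(members: int) -> int:
    """Minimum number of etcd members that must be reachable to keep quorum."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while the cluster still has quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

With 3 members the quorum is 2: one node down leaves 2 (still OK), two nodes down leaves 1 (quorum lost), and restarting one brings it back to 2. Note that 4 members still tolerate only one failure, which is why odd member counts are usually recommended.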

One issue is that no errors are reported in the events or in the cluster status shown by the CSE plugin (the status stays "ready").
The only visible signs are at the load balancer level, which shows some endpoints as down, and in the vApp, which notices that some VMs are down.
Would it be possible to add some kind of "health" status to the CSE plugin? (For example: all control plane nodes up and running, worker nodes up and running, load balancer associated with the management IP deployed, etc.)

Second issue: I deliberately deleted one of the control plane VMs.
As mentioned above, no information is reported by the CSE plugin; it still shows "3 nodes".
It doesn't recreate the missing node (there is no "auto-heal", which would be the best option).
Is there a procedure for replacing a failed node in such a case?
