VMware Modern Apps Community
ajgodzilla1
Contributor

TKGI 1.14 Enabling snapshot support

I have deployed TKGI version 1.14.0. According to the documentation, it should include CSI driver v2.5.1, which supports volume snapshots. I am using the automatically installed CSI driver, as recommended, rather than a manually installed one.

I am having issues running the deploy-csi-snapshot-components.sh script, as described at this site: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-s...

The script seems to fail because the snapshot controller it deploys cannot be scheduled: no node matches its nodeSelector, which in this case targets the master node.

I noticed that when I run "kubectl get nodes", only the worker nodes show up and not the master node. Is the master node hidden, or is there a configuration file or setting I have to modify so that it is listed in "kubectl get nodes"? The script fails because it cannot find the master node.

Another thing I noticed is that the script looks for the vsphere-csi-controller pod in the vmware-system-csi namespace, but when I run "kubectl get pods -n vmware-system-csi" no resources are found. Again, I am using the "automatically installed CSI driver". Am I missing something in the deployment of the cluster or of TKGI, so that the pods defined in the vsphere-csi-driver.yaml file are not deployed? I also tried to install the CSI driver manually; however, that install also expects the master node to be listed by "kubectl get nodes".
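The two symptoms described above can be checked with a small sketch like the following. It is illustrative only: the helper names count_master_nodes and count_csi_controller_pods are made up for this example, and it assumes kubectl is already pointed at the TKGI cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are not from the VMware script).

# Count nodes carrying the label that the script's nodeSelector requires
# (node-role.kubernetes.io/master with an empty value).
count_master_nodes() {
  kubectl get nodes -l 'node-role.kubernetes.io/master=' --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Count vsphere-csi-controller pods in the namespace the script inspects.
count_csi_controller_pods() {
  kubectl get pods -n vmware-system-csi --no-headers 2>/dev/null \
    | grep -c '^vsphere-csi-controller' || true
}

# Example usage:
#   echo "master nodes visible: $(count_master_nodes)"
#   echo "csi controller pods:  $(count_csi_controller_pods)"
```

If both counts come back as 0, the snapshot script has nothing to schedule onto and no controller pod to find, which matches the failures described here.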

Any advice would be most helpful.

Thanks,

AJ

 

3 Replies
euclidsunyong
VMware Employee

The TKGI integrated CSI driver does not include the snapshot services yet; snapshots are not listed as a supported feature. We plan to add snapshot capability in the next TKGI minor version.

ajgodzilla1
Contributor

Many thanks for that information. I look forward to that version once it is released. In the meantime, I got it working by doing the following:

  1. Deployed the cluster WITHOUT the automatically installed CSI driver, even though the TKGI 1.14 docs recommend it. I did this by setting NO under Storage in the TKGI tile in Ops Manager.

  2. Manually installed the CSI driver following the TKGI 1.12 instructions at https://docs.pivotal.io/tkgi/1-12/vsphere-cns-manual.html. As described there, I modified the 2.5.2 vsphere-csi-driver.yaml configuration file as follows:

     a. Remove the following nodeSelector from the vsphere-csi-controller Deployment section:

          nodeSelector:
            node-role.kubernetes.io/master: ""

     b. In the vsphere-csi-node DaemonSet section, replace all occurrences of /var/lib/kubelet with /var/vcap/data/kubelet.

  3. For the snapshot deployment script, I also modified the 2.5.2 deploy-csi-snapshot-components.sh to remove the nodeSelector from the patches applied to the snapshot-controller and snapshot-validation-deployment Deployments. I had already done this before; however, as I mentioned to you on the call, the script then ran only until it started looking for the vsphere-csi-controller, which is not present with the automatically installed CSI driver. Since the manually installed 2.5.2 CSI driver has the vsphere-csi-controller pod running, the script now completes. The diff is shown below.

$ diff 2.5.2deploy-csi-snapshot-components.sh.orig 2.5.2deploy-csi-snapshot-components.sh
167c167
<       kubectl patch deployment -n kube-system snapshot-controller --patch '{"spec": {"template": {"spec": {"nodeSelector": {"node-role.kubernetes.io/master": ""}, "tolerations": [{"key":"node-role.kubernetes.io/master","operator":"Exists", "effect":"NoSchedule"}]}}}}'
---
>       kubectl patch deployment -n kube-system snapshot-controller --patch '{"spec": {"template": {"spec": {"tolerations": [{"key":"node-role.kubernetes.io/master","operator":"Exists", "effect":"NoSchedule"}]}}}}'
228c228
<       kubectl patch deployment -n kube-system snapshot-validation-deployment --patch '{"spec": {"template": {"spec": {"nodeSelector": {"node-role.kubernetes.io/master": ""}, "tolerations": [{"key":"node-role.kubernetes.io/master","operator":"Exists", "effect":"NoSchedule"}]}}}}'
---
>       kubectl patch deployment -n kube-system snapshot-validation-deployment --patch '{"spec": {"template": {"spec": {"tolerations": [{"key":"node-role.kubernetes.io/master","operator":"Exists", "effect":"NoSchedule"}]}}}}'
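The two manifest edits from step 2 can also be scripted rather than done by hand. The following is only a sketch: the function names strip_master_node_selector and patch_kubelet_path and the output file name are made up for illustration. It assumes that the master nodeSelector has exactly the two-line form shown in step 2a, and that /var/lib/kubelet appears only in the vsphere-csi-node DaemonSet section of your copy of the manifest (worth confirming with grep first, since the path is replaced file-wide).

```shell
#!/usr/bin/env bash
# Hypothetical filters: each reads a manifest on stdin and writes to stdout.

# Drop a "nodeSelector:" line only when the next line is the master label;
# any other nodeSelector (for example kubernetes.io/os) is left untouched.
strip_master_node_selector() {
  awk '
    held != "" {
      if ($0 ~ /node-role\.kubernetes\.io\/master:/) { held = ""; next }
      print held; held = ""
    }
    /^[[:space:]]*nodeSelector:[[:space:]]*$/ { held = $0; next }
    { print }
    END { if (held != "") print held }
  '
}

# Point kubelet paths at the location used on TKGI worker nodes.
patch_kubelet_path() {
  sed 's#/var/lib/kubelet#/var/vcap/data/kubelet#g'
}

# Example usage (output file name is arbitrary):
#   strip_master_node_selector < vsphere-csi-driver.yaml \
#     | patch_kubelet_path > vsphere-csi-driver-tkgi.yaml
#   diff vsphere-csi-driver.yaml vsphere-csi-driver-tkgi.yaml
```

Writing to a new file and diffing it against the original keeps the change reviewable before running kubectl apply.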

 

The pods deployed after installing the CSI driver and running the snapshot deployment script are listed below. If you have only one control-plane (master) node, the number of replicas can be reduced to 1 in the CSI driver YAML.

 

$ kubectl get pods --all-namespaces
NAMESPACE           NAME                                                              READY   STATUS      RESTARTS      AGE
default             csisnaps-restore-pod                                              1/1     Running     0             27m
default             example-vanilla-block-pod                                         1/1     Running     0             56m
kube-system         antrea-agent-54b9r                                                2/2     Running     0             67m
kube-system         antrea-agent-m7bfl                                                2/2     Running     0             67m
kube-system         antrea-agent-rqfg4                                                2/2     Running     0             71m
kube-system         antrea-controller-5c788594b9-5s689                                1/1     Running     0             73m
kube-system         coredns-787c57488d-4vn2f                                          1/1     Running     0             64m
kube-system         coredns-787c57488d-blfqn                                          1/1     Running     0             64m
kube-system         coredns-787c57488d-nghdq                                          1/1     Running     0             64m
kube-system         konnectivity-agent-3cb6ff69-fb01-4d81-8593-e678b2293266-7c7jdlp   1/1     Running     0             73m
kube-system         konnectivity-agent-3cb6ff69-fb01-4d81-8593-e678b2293266-7cqbjpw   1/1     Running     0             73m
kube-system         metrics-server-66c5bff789-w6pz6                                   1/1     Running     0             64m
kube-system         snapshot-controller-7f5d798964-6g5ql                              1/1     Running     0             54m
kube-system         snapshot-controller-7f5d798964-qs5c2                              1/1     Running     0             54m
kube-system         snapshot-validation-deployment-d448fb598-lhgkv                    1/1     Running     0             45m
kube-system         snapshot-validation-deployment-d448fb598-ngsrj                    1/1     Running     0             45m
kube-system         snapshot-validation-deployment-d448fb598-zwlff                    1/1     Running     0             45m
pks-system          cert-generator-91882e178d2f89da53a2344e3b2eee692455c956-dmblp     0/1     Completed   0             64m
pks-system          event-controller-795755d67-f7d47                                  2/2     Running     0             64m
pks-system          fluent-bit-l9wt7                                                  2/2     Running     0             63m
pks-system          fluent-bit-rhxl7                                                  2/2     Running     0             63m
pks-system          fluent-bit-ttm5z                                                  2/2     Running     0             63m
pks-system          metric-controller-669f9bc57b-295g2                                1/1     Running     0             64m
pks-system          observability-manager-64cbd6c45d-f6qxc                            1/1     Running     0             64m
pks-system          sink-controller-77c8c69d54-52dxh                                  1/1     Running     0             64m
pks-system          telegraf-hxgxt                                                    1/1     Running     0             64m
pks-system          telegraf-jht8l                                                    1/1     Running     0             64m
pks-system          telegraf-xhkbw                                                    1/1     Running     0             64m
pks-system          validator-8567d4b66-qfmbg                                         1/1     Running     0             64m
vmware-system-csi   vsphere-csi-controller-7c9df9dfd9-d977z                           7/7     Running     0             57m
vmware-system-csi   vsphere-csi-controller-7c9df9dfd9-wdtjm                           7/7     Running     0             57m
vmware-system-csi   vsphere-csi-controller-7c9df9dfd9-wxgbk                           7/7     Running     0             57m
vmware-system-csi   vsphere-csi-node-gpdc4                                            3/3     Running     2 (57m ago)   57m
vmware-system-csi   vsphere-csi-node-kvtth                                            3/3     Running     2 (57m ago)   57m
vmware-system-csi   vsphere-csi-node-lv9bt                                            3/3     Running     1 (57m ago)   57m
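A listing this long is easier to check with a small filter than by eye. The sketch below is an illustration, not part of the VMware scripts: the function name list_unhealthy_pods is made up, and it assumes the default "kubectl get pods --all-namespaces" column layout, where STATUS is the fourth column.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "kubectl get pods --all-namespaces" output on
# stdin and prints namespace/name and status for every pod whose STATUS is
# neither Running nor Completed. The header row is skipped.
list_unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Example usage:
#   kubectl get pods --all-namespaces | list_unhealthy_pods
```

Empty output means every pod is either Running or Completed.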

 

 

Dave34
Enthusiast

Many thanks for this info
