apatil1
Contributor

vSphere with Tanzu (TKGs) worker nodes are having disk pressure

vCenter version: 7.0.3.00800 Build: 20150588
k8s: v1.22.9+vmware.1  VMware Photon OS/Linux   4.19.225-3.ph3   containerd://1.5.11
Cluster size: Control plane: 3 best-effort-2xlarge, Workers: 10 best-effort-2xlarge

When I deploy applications such as elasticsearch-rally, cassandra, fio, vdbench, or pgbench, most of the nodes come under disk pressure and evict the pods.

I see the following events on the nodes:

Events:
  Type     Reason                 Age                   From     Message
  ----     ------                 ----                  ----     -------
  Warning  FreeDiskSpaceFailed    41m                   kubelet  failed to garbage collect required amount of images. Wanted to free 729588531 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    26m                   kubelet  failed to garbage collect required amount of images. Wanted to free 687059763 bytes, but freed 0 bytes
  Warning  ImageGCFailed          21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Warning  ImageGCFailed          16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Normal   NodeHasDiskPressure    12m (x9 over 15h)     kubelet  Node tkgs-cluster-1-test-nodes-wtzl5-8d6d65695-2n2pp status is now: NodeHasDiskPressure
  Warning  FreeDiskSpaceFailed    11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  ImageGCFailed          11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet   7m41s (x29 over 15h)  kubelet  Attempting to reclaim ephemeral-storage

By default, the root partition disk size is 16 GB. Is there any way to deploy a vSphere with Tanzu (TKGs) cluster with a larger root partition?

I can reproduce the issue consistently in the last 3 releases of vSphere with Tanzu, including the most recent one.

DCasota
Expert

Hi,

In https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-B1034373-8C38-4FE2-9517... there is a YAML sample ending with:

workers:
  count: 3
  class: best-effort-medium
  storageClass: vwt-storage-policy
  volumes:
    - name: containerd
      mountPath: /var/lib/containerd
      capacity:
        storage: 16Gi

Changing 16Gi to, say, 64Gi might help. I haven't tested it, and it is not my finding; the original answer was published in another thread.
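
As an untested sketch of that change, the worker section could look roughly like the fragment below. It reuses the field names from the sample quoted above. The count and class are taken from the cluster described in the original question, vwt-storage-policy is only the placeholder storage class from the sample, and 64Gi is an assumed example size rather than a recommendation.

```yaml
# Untested sketch: worker section with a larger containerd volume.
# Field names follow the documentation sample quoted above.
workers:
  count: 10                            # worker count from the original question
  class: best-effort-2xlarge           # VM class from the original question
  storageClass: vwt-storage-policy     # placeholder from the sample; use your own policy
  volumes:
    - name: containerd
      mountPath: /var/lib/containerd   # containerd's data directory (images, snapshots)
      capacity:
        storage: 64Gi                  # assumed example size, up from 16Gi
```

The idea is that container images and their layers land on the separate volume instead of the 16 GB root disk, so the appropriate size depends on the images and workloads being deployed.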

McDonald43452
Contributor

Is there already a solution to this problem? I have the same problem.

DCasota
Expert

Hi @McDonald43452,

Can you clarify the issue?

Is it the same one that @apatil1 described, namely that 16 GB is too low as an initial capacity? Or are you asking whether there is a way to enlarge the 16 GB on the fly, or how to provision with an initial capacity of, e.g., 64 GB?

Your clarification helps others in the community to contribute to the issue(s). Kind regards, Daniel

apatil1
Contributor

Thanks @DCasota. This solution helped: I added an additional disk and mounted /var/lib/containerd on it, which resolved the disk pressure issue I was hitting.