VMware Modern Apps Community
apatil1
Contributor

vSphere with Tanzu (TKGs) worker nodes are under disk pressure

vCenter version: 7.0.3.00800 Build: 20150588
k8s: v1.22.9+vmware.1  VMware Photon OS/Linux   4.19.225-3.ph3   containerd://1.5.11
Cluster size: Control plane: 3 best-effort-2xlarge, Workers: 10 best-effort-2xlarge

When I start deploying applications like elasticsearch-rally, cassandra, fio, vdbench, or pgbench, most of the nodes come under disk pressure and the pods get evicted.

I see the following events on the nodes:

Events:
  Type     Reason                 Age                   From     Message
  ----     ------                 ----                  ----     -------
  Warning  FreeDiskSpaceFailed    41m                   kubelet  failed to garbage collect required amount of images. Wanted to free 729588531 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    26m                   kubelet  failed to garbage collect required amount of images. Wanted to free 687059763 bytes, but freed 0 bytes
  Warning  ImageGCFailed          21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Warning  ImageGCFailed          16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Normal   NodeHasDiskPressure    12m (x9 over 15h)     kubelet  Node tkgs-cluster-1-test-nodes-wtzl5-8d6d65695-2n2pp status is now: NodeHasDiskPressure
  Warning  FreeDiskSpaceFailed    11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  ImageGCFailed          11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet   7m41s (x29 over 15h)  kubelet  Attempting to reclaim ephemeral-storage
 

By default the root partition disk size is 16 GB. Is there any way to deploy a vSphere with Tanzu (TKGs) cluster with a larger root partition?

I am able to reproduce the issue consistently in the last three releases of vSphere with Tanzu, including the most recent one.

5 Replies
DCasota
Expert

Hi,

In https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-B1034373-8C38-4FE2-9517... there is a YAML sample ending with:

workers:
  count: 3
  class: best-effort-medium
  storageClass: vwt-storage-policy
  volumes:
    - name: containerd
      mountPath: /var/lib/containerd
      capacity:
        storage: 16Gi

Modifying 16Gi to, say, 64Gi might help. I didn't test it, and it is not my finding; the original answer was published in another thread.
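
For reference, here is a minimal sketch of what a full spec with that change could look like, assuming the v1alpha1 TanzuKubernetesCluster API that the linked sample uses. The metadata, storage class, VM classes and release version below are placeholders pieced together from this thread and the docs sample, not a tested configuration:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkgs-cluster-1-test         # placeholder cluster name
  namespace: test-namespace         # placeholder vSphere Namespace
spec:
  distribution:
    version: v1.22.9                # placeholder Tanzu Kubernetes release
  topology:
    controlPlane:
      count: 3
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
    workers:
      count: 10
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 64Gi           # enlarged from the 16Gi in the docs sample

The idea is that a separate containerd volume keeps image and container data off the 16 GB root disk, which is what should relieve the disk pressure.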

DCasota
Expert

Hi @McDonald43452 ,

Can you clarify the issue?

Is it the same issue @apatil1 described, i.e. that 16 GB is too small as the initial capacity? Or are you asking whether there is a recipe to enlarge the 16 GB on the fly, or how to provision with an initial capacity of e.g. 64 GB?

Your clarification helps others in the community to contribute to the issue(s). Kind regards, Daniel

apatil1
Contributor

Thanks @DCasota. This solution helped me add an additional disk and mount /var/lib/containerd on it. That resolved the disk pressure issue I was hitting.

bengandon
Contributor

To all humans reading this thread: please report the four posts above with spam links to the moderators so that they get removed. As a community we should not accept such AI-generated content, which aims to publish spam links for black-hat Search Engine Optimisation.

Randye
Contributor

Disk pressure in Kubernetes nodes occurs when the available storage space falls below a certain threshold, leading to eviction of pods to reclaim resources. The events you're seeing indicate that the kubelet is unable to free up the required amount of disk space through garbage collection, which is an automated process to clean up unused images and containers.
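
For anyone tuning this further: the thresholds behind those events are the kubelet's image garbage collection and hard eviction settings. As a rough illustration only, these are the upstream Kubernetes defaults expressed as a KubeletConfiguration fragment; the values actually configured on TKGs nodes may differ:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # image GC starts once disk usage exceeds 85%
imageGCLowThresholdPercent: 80    # and tries to bring usage back down to 80%
evictionHard:
  nodefs.available: "10%"         # below 10% free space the node reports DiskPressure
  imagefs.available: "15%"        # and the kubelet starts evicting pods

With a 16 GB root partition those percentages translate to only a couple of gigabytes of headroom, which is why a dedicated, larger /var/lib/containerd volume helps.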
