VMware Modern Apps Community
apatil1
Contributor

vSphere with Tanzu (TKGs) worker nodes are under disk pressure

vCenter version: 7.0.3.00800 Build: 20150588
k8s: v1.22.9+vmware.1  VMware Photon OS/Linux   4.19.225-3.ph3   containerd://1.5.11
Cluster size: Control plane: 3 best-effort-2xlarge, Workers: 10 best-effort-2xlarge

When I start deploying applications like elasticsearch-rally, cassandra, fio, vdbench, or pgbench, most of the nodes come under disk pressure and the pods get evicted.

I see the following events on the nodes:

Events:
  Type     Reason                 Age                   From     Message
  ----     ------                 ----                  ----     -------
  Warning  FreeDiskSpaceFailed    41m                   kubelet  failed to garbage collect required amount of images. Wanted to free 729588531 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    26m                   kubelet  failed to garbage collect required amount of images. Wanted to free 687059763 bytes, but freed 0 bytes
  Warning  ImageGCFailed          21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Warning  ImageGCFailed          16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
  Normal   NodeHasDiskPressure    12m (x9 over 15h)     kubelet  Node tkgs-cluster-1-test-nodes-wtzl5-8d6d65695-2n2pp status is now: NodeHasDiskPressure
  Warning  FreeDiskSpaceFailed    11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  ImageGCFailed          11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet   7m41s (x29 over 15h)  kubelet  Attempting to reclaim ephemeral-storage
 

By default the root partition disk size is 16 GB. Is there any way to deploy a vSphere with Tanzu (TKGs) cluster with a larger root partition?

I am able to reproduce the issue consistently in the last three releases of vSphere with Tanzu, including the most recent one.

5 Replies
DCasota
Expert

Hi,

In https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-B1034373-8C38-4FE2-9517... there is a YAML sample ending with:

workers:
  count: 3
  class: best-effort-medium
  storageClass: vwt-storage-policy
  volumes:
    - name: containerd
      mountPath: /var/lib/containerd
      capacity:
        storage: 16Gi

Modifying 16Gi to, say, 64Gi might help. I didn't test it, and it is not my finding; the original answer was published in another thread.
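
For reference, here is a minimal sketch of what a full spec with that change could look like, assuming the v1alpha1 TanzuKubernetesCluster API that the linked sample uses. The metadata, storage class, VM classes and release version below are placeholders pieced together from this thread and the docs sample, not a tested configuration:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkgs-cluster-1-test         # placeholder cluster name
  namespace: test-namespace         # placeholder vSphere Namespace
spec:
  distribution:
    version: v1.22.9                # placeholder Tanzu Kubernetes release
  topology:
    controlPlane:
      count: 3
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
    workers:
      count: 10
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 64Gi           # enlarged from the 16Gi in the docs sample

The idea is that a separate containerd volume keeps image and container data off the 16 GB root disk, which is what should relieve the disk pressure.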

DCasota
Expert

Hi @McDonald43452 ,

Can you clarify the issue?

Is it the same issue @apatil1 described, i.e. that 16 GB is too small as the initial capacity? Or are you asking whether there is a recipe to enlarge the 16 GB on the fly, or how to provision with an initial capacity of e.g. 64 GB?

Your clarification helps others in the community to contribute to the issue(s). Kind regards, Daniel

apatil1
Contributor

Thanks @DCasota. This solution helped me add an additional disk and mount /var/lib/containerd on it. That resolved the disk pressure issue I was hitting.

bengandon
Contributor

To all humans reading this thread: please report the four posts above with spam links to the moderators so that they get removed. As a community we should not accept such AI-generated content, which aims to publish spam links for black-hat Search Engine Optimisation.

Randye
Contributor

Disk pressure in Kubernetes nodes occurs when the available storage space falls below a certain threshold, leading to eviction of pods to reclaim resources. The events you're seeing indicate that the kubelet is unable to free up the required amount of disk space through garbage collection, which is an automated process to clean up unused images and containers.
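
For anyone tuning this further: the thresholds behind those events are the kubelet's image garbage collection and hard eviction settings. As a rough illustration only, these are the upstream Kubernetes defaults expressed as a KubeletConfiguration fragment; the values actually configured on TKGs nodes may differ:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # image GC starts once disk usage exceeds 85%
imageGCLowThresholdPercent: 80    # and tries to bring usage back down to 80%
evictionHard:
  nodefs.available: "10%"         # below 10% free space the node reports DiskPressure
  imagefs.available: "15%"        # and the kubelet starts evicting pods

With a 16 GB root partition those percentages translate to only a couple of gigabytes of headroom, which is why a dedicated, larger /var/lib/containerd volume helps.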
