vCenter version: 7.0.3.00800 Build: 20150588
k8s: v1.22.9+vmware.1 VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
Cluster size: Control plane: 3 best-effort-2xlarge, Workers: 10 best-effort-2xlarge
When I deploy applications such as elasticsearch-rally, cassandra, fio, vdbench, and pgbench, most of the nodes come under disk pressure and the kubelet starts evicting pods.
I see the following events on the nodes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 41m kubelet failed to garbage collect required amount of images. Wanted to free 729588531 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 26m kubelet failed to garbage collect required amount of images. Wanted to free 687059763 bytes, but freed 0 bytes
Warning ImageGCFailed 21m kubelet failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 21m kubelet failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 16m kubelet failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
Warning ImageGCFailed 16m kubelet failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes
Normal NodeHasDiskPressure 12m (x9 over 15h) kubelet Node tkgs-cluster-1-test-nodes-wtzl5-8d6d65695-2n2pp status is now: NodeHasDiskPressure
Warning FreeDiskSpaceFailed 11m kubelet failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
Warning ImageGCFailed 11m kubelet failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes
Warning EvictionThresholdMet 7m41s (x29 over 15h) kubelet Attempting to reclaim ephemeral-storage
By default the root partition is 16 GB. Is there any way to deploy a vSphere with Tanzu (TKGs) cluster with a larger root partition?
I can reproduce the issue consistently in the last three releases of vSphere with Tanzu, including the most recent one.
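For context on where those "Wanted to free ..." numbers come from: a back-of-the-envelope sketch using the kubelet's upstream default thresholds (these are the stock defaults, not TKGs-specific values, so treat the exact figures as an assumption) applied to a 16 GiB disk.

```python
# Back-of-the-envelope for kubelet's default thresholds on a 16 GiB root disk.
# Upstream kubelet defaults (assumed here, not verified against the TKGs node config):
#   --image-gc-high-threshold=85  -> image GC starts when the disk is 85% full
#   --image-gc-low-threshold=80   -> GC tries to free space down to 80% usage
#   --eviction-hard=imagefs.available<15%,nodefs.available<10%

GIB = 1024 ** 3
disk = 16 * GIB

gc_start = int(disk * 0.85)             # usage level that triggers image GC
gc_target = int(disk * 0.80)            # usage level GC tries to reach
min_free_wanted = gc_start - gc_target  # ~0.8 GiB -- same order of magnitude as
                                        # the "Wanted to free 729588531 bytes" events

print(f"image GC kicks in above {gc_start / GIB:.1f} GiB used")
print(f"each GC pass wants to free about {min_free_wanted / GIB:.2f} GiB")
print(f"hard eviction when free space drops below {disk * 0.15 / GIB:.1f} GiB")
```

With only about 2.4 GiB of headroom before hard eviction, a handful of benchmark images (elasticsearch, cassandra, etc.) is enough to pin the node in the GC/eviction zone, and GC frees 0 bytes because every image is still referenced by a running pod.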
Hi,
In https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-B1034373-8C38-4FE2-9517... there is a YAML sample ending with:
workers:
  count: 3
  class: best-effort-medium
  storageClass: vwt-storage-policy
  volumes:
  - name: containerd
    mountPath: /var/lib/containerd
    capacity:
      storage: 16Gi
Changing 16Gi to, say, 64Gi might help. I haven't tested it myself, and it is not my finding; the original answer was published in another thread.
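For reference, the same block with the larger volume size would look like this (64Gi is illustrative; size it to the image set and ephemeral storage your workloads actually need):

```yaml
workers:
  count: 3
  class: best-effort-medium
  storageClass: vwt-storage-policy
  volumes:
  - name: containerd
    mountPath: /var/lib/containerd    # moves image/snapshot storage off the 16 GB root disk
    capacity:
      storage: 64Gi
```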
Hi @McDonald43452 ,
Can you clarify the issue? Is it the same one @apatil1 described, that 16 GB is too low as an initial capacity? Or are you asking for a recipe to enlarge the 16 GB on the fly, or how to provision with a larger initial capacity of e.g. 64 GB?
Your clarification helps others in the community contribute to the issue(s). Kind regards, Daniel
Thanks @DCasota. This solution helped me add an additional disk and mount /var/lib/containerd on it. That resolved the disk pressure issue I was hitting.
To all humans reading this thread: please report the 4 posts above with spam links to the moderators so that they get removed. As a community we should not accept such AI-generated content, which aims to publish spam links as part of black-hat search engine optimisation.
Disk pressure on a Kubernetes node occurs when available storage falls below a configured threshold, causing the kubelet to evict pods to reclaim resources. The events you're seeing indicate that the kubelet is unable to free the required amount of disk space through garbage collection, the automated process that cleans up unused images and containers.
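If you want to see what that garbage collection does, you can run the node-side equivalent by hand. This is a sketch that assumes SSH access to the affected node and that crictl is present there (it ships with the TKG node image); on a machine without crictl it just prints a note and exits.

```shell
#!/bin/sh
# Manual equivalent of kubelet image GC, for inspection on an affected node.
# Assumes crictl is installed on the node; exits harmlessly elsewhere.
if ! command -v crictl >/dev/null 2>&1; then
  echo "crictl not found - run this on the node itself"
  exit 0
fi
crictl imagefsinfo   # bytes used on the image filesystem
crictl rmi --prune   # remove images not referenced by any container
```

Note that pruning frees nothing when every image is still in use by a running pod, which is exactly the "freed 0 bytes" situation in the events above; the real fix is more capacity, as discussed earlier in the thread.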