Kubernetes Pods (CNFs) - Evictions due to Node Disk Pressure


Why do Pods get evicted? Eviction is the process by which a Pod assigned to a node is asked to terminate, usually because the node does not have enough resources. Kubernetes evicts a number of Pods from the node to make sure that enough resources remain available on it. Kubernetes also checks node resources constantly and evicts Pods when needed, through a process called node-pressure eviction.

Symptoms:

When running $kubectl get events -n namespace, the following errors are observed:

Failed to garbage collect required amount of images
Disk-pressure warnings for the associated namespace.

The following errors are observed when running $kubectl describe pod -n namespace podname:

NodeHasDiskPressure
Attempting to reclaim ephemeral-storage
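One way to confirm which nodes are affected is to read the DiskPressure condition directly from the node status. The sketch below is an illustration, not part of the original procedure. The function names list_disk_pressure and nodes_under_pressure are made up for this example, and it assumes kubectl is already configured for the cluster.

```shell
#!/bin/sh
# Print "node<TAB>status" for the DiskPressure condition of every node.
list_disk_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
}

# Read that list on stdin and keep only the nodes whose status is True.
nodes_under_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Usage:
#   list_disk_pressure | nodes_under_pressure
```

Splitting the query from the filter keeps the filter usable on saved output, for example when the node list was captured earlier in a support bundle.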

Purpose:

The purpose of this article is to provide troubleshooting guidelines for scenarios where Kubernetes pods go into an evicted state due to disk pressure.

Cause:

In Kubernetes, Pods can be evicted from a Node due to insufficient resources.

In addition to terminating Pods, whenever a node experiences disk pressure, node-pressure eviction can activate. In this process the kubelet performs garbage collection and removes dormant Kubernetes objects, such as unused images and dead containers, so that they stop consuming resources.

When a Pod is terminated, several core* temporary files can be generated. If these files are not cleaned up properly, they can lead to disk exhaustion.

While this process is automated, manual intervention may be required.

Procedure 1 (Clean up corefiles)

SSH into the worker node as the root user.
Obtain the file system disk usage by running the following command:

$df -kh
Confirm that the root (/) partition is highly utilized, e.g. over 85% full.
Navigate to the /data/storage/corefiles directory:

$cd /data/storage/corefiles
Obtain the total size of the directory by running the following command:

$du -s -h

Note: This value is the amount of space that will be cleaned up.
List the files to confirm there are corefiles present.

$ls -lrth
Run the following command to remove all corefiles:

$rm -rf core*
Review the pod status by running the following command:

$kubectl get pods -A -o wide | grep nodename

Note: Replace nodename in the example above with the name of the affected node.

Confirm that the Pods are in a Running state as expected.

Note: If the issue is not resolved, proceed to Procedure 2.
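The checks in Procedure 1 can be sketched as small shell helpers that make it easier to preview what will be removed before running the rm command. This is a minimal sketch under stated assumptions: the helper names parse_usage, is_over_threshold and list_corefiles are hypothetical, the 85% default mirrors the example figure given in the procedure, and find -maxdepth is assumed to be available (it is on GNU and BusyBox find). Unlike rm -rf core*, the listing only matches regular files.

```shell
#!/bin/sh
# Read `df -P <path>` output on stdin and print the Use% value as a bare number.
parse_usage() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the usage value ($1) is above the threshold ($2, default 85).
is_over_threshold() {
  [ "$1" -gt "${2:-85}" ]
}

# List core* files directly inside a directory without deleting anything.
list_corefiles() {
  find "$1" -maxdepth 1 -type f -name 'core*' -print
}

# Usage:
#   pct=$(df -P / | parse_usage)
#   if is_over_threshold "$pct"; then
#     list_corefiles /data/storage/corefiles
#   fi
```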

Procedure 2 (Clean up and re-instantiate CNF(s))

1. SSH into the worker node as the root user.

2. List the containers using the following command to verify whether the DU container is running:

$crictl ps -a

3. List the images by running the following command:

$crictl images

4. Confirm that the DU pod image is listed. If it is present, the DU container must be stopped and removed, because the issue lies with the DU.

5. Stop and remove the DU container immediately using the command below:

$crictl stop container_Id ; crictl rm container_Id

Note: Replace container_Id in the example above with the valid container ID.

Note: Once a DU container is stopped, it goes into an Exited state, which prompts a new DU container to be created. For this reason, the two commands must be run back to back on a single line, as shown above.

6. Once the DU container has been terminated, terminate the CNF through the TCA UI.

7. On successful termination from TCA, run the following command to remove the image:

$crictl rmi imageId

Note: Replace imageId in the example above with the valid container image ID.

8. From the Master node, run the following command to confirm that the DU and PTP pods have been terminated:

$kubectl get pods -A -o wide | grep nodename

Note: Replace nodename in the example above with the valid node name (CNF name).

Note: The results should show only kube-system pods.

9. Re-instantiate the CNF from TCA. This pulls a fresh image from the registry and deploys the DU and PTP containers.

10. Once the instantiation has completed, run the following command to ensure the DU & PTP pods are in a Running state:

$kubectl get pods -A -o wide | grep nodename

Note: Replace nodename in the example above with the valid node name (CNF name).

If multiple evicted pods are still listed, proceed to Procedure 3.
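The stop-and-remove step in Procedure 2 (step 5) can be wrapped so that the removal follows the stop without delay and is skipped if the stop fails. This is a hedged sketch, not the documented procedure: the function names stop_and_remove and stop_and_remove_all are hypothetical, and the container name "du" in the usage comment is an assumption that must be replaced with the real DU container name shown by crictl ps -a.

```shell
#!/bin/sh
# Stop a container and remove it right away; skip the removal if the stop fails.
stop_and_remove() {
  crictl stop "$1" && crictl rm "$1"
}

# Apply stop_and_remove to every container ID read from stdin, one per line.
stop_and_remove_all() {
  while read -r id; do
    if [ -n "$id" ]; then
      stop_and_remove "$id"
    fi
  done
}

# Usage (container name "du" is an assumption):
#   crictl ps -a --name du -q | stop_and_remove_all
```

Using && instead of ; means a failed stop does not trigger an rm attempt on a container that is still running.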

Procedure 3 (Delete all pods in an Evicted state)

Run the following command to delete any remaining Pods in an Evicted state:

$kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod
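The command above only looks at the current namespace. As a possible variation, the sketch below covers every namespace by matching on the STATUS column and passing the namespace to each delete. The helper names evicted_pods and delete_pods are hypothetical. The sketch assumes the default column order of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "namespace pod" for each pod whose STATUS column is exactly Evicted.
evicted_pods() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Delete each "namespace pod" pair read from stdin.
delete_pods() {
  while read -r ns pod; do
    kubectl delete pod -n "$ns" "$pod" </dev/null
  done
}

# Usage:
#   kubectl get pods -A --no-headers | evicted_pods             # preview
#   kubectl get pods -A --no-headers | evicted_pods | delete_pods
```

Running the preview line first shows exactly which pods would be deleted before anything is removed.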

PraveenBatta1 · ‎01-30-2023

Very useful article. Thank you.

amarpamnani · ‎06-27-2023

This is really informative!
