Kubernetes Pod: Fix EphemeralStorage Eviction during High Traffic Scaling
Quick Fix Summary
TL;DR: Delete the evicted pods and temporarily scale down the offending deployment to relieve storage pressure on the node.
Diagnosis & Causes
The kubelet evicts pods when the node's ephemeral storage (emptyDir volumes, container logs, writable container layers, image layers) exceeds its eviction threshold, a situation often triggered by rapid scaling under high traffic.
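Before changing anything, it helps to confirm that the evictions really are storage-driven. Two quick checks with standard kubectl (the event reason the kubelet records for evictions is Evicted):
# List recent eviction events across all namespaces
kubectl get events --all-namespaces --field-selector reason=Evicted --sort-by=.lastTimestamp
# See which nodes are reporting disk pressure
kubectl get nodes -o 'custom-columns=NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'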
Recovery Steps
Step 1: Verify and Diagnose the Eviction
Identify the affected node, confirm ephemeral storage pressure, and list evicted pods.
kubectl describe node <node-name> | grep -iE -B 5 -A 10 'ephemeral-storage|DiskPressure'
kubectl get pods --all-namespaces --field-selector=status.phase=Failed -o wide
kubectl describe pod <evicted-pod-name> -n <namespace> | grep -i message
Step 2: Immediate Cleanup of Evicted Pods
Remove failed pods to release the ephemeral storage they still hold and to clear them from the API server.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name --no-headers | while read -r ns pod; do kubectl delete pod "$pod" -n "$ns" --now; done
# Or for a specific namespace:
kubectl delete pods --field-selector=status.phase=Failed -n <namespace>
Step 3: Free Disk Space on the Affected Node
SSH into the node and clean up the usual ephemeral storage consumers: container logs and unused container images.
# Check disk usage
df -h /var/lib/kubelet
# Truncate container logs so space is freed even while the files are held open (adjust path for your CRI)
sudo find /var/log/pods -name "*.log" -type f -exec truncate -s 0 {} +
# Clean unused docker images
sudo docker image prune -a -f
# Clean unused images (if using containerd)
sudo crictl rmi --prune
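If space is still tight after the steps above, two other common consumers on the node are the systemd journal and exited containers left on disk. A quick sketch, assuming a systemd host with a CRI runtime and crictl installed:
# Cap the systemd journal (frees space under /var/log/journal)
sudo journalctl --vacuum-size=200M
# Remove exited containers so the runtime can release their writable layers
sudo crictl rm $(sudo crictl ps -a -q --state exited) 2>/dev/null || true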
Step 4: Scale Down the Offending Workload
Reduce replica count to immediately lower pressure, allowing the node to recover.
kubectl scale deployment <deployment-name> -n <namespace> --replicas=<reduced-number>
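Note that if a HorizontalPodAutoscaler manages this deployment, it will scale the replicas straight back up; cap the autoscaler instead. A sketch, assuming the HPA shares the deployment's name:
kubectl patch hpa <deployment-name> -n <namespace> --type merge -p '{"spec":{"maxReplicas":<reduced-number>}}'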
Step 5: Configure Pod Ephemeral Storage Limits and Requests
Add ephemeral-storage requests and limits to pod specs to give the scheduler better visibility and enforce boundaries.
# Example container spec addition:
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
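To avoid editing every workload individually, a LimitRange can inject default ephemeral-storage requests and limits for all containers in a namespace. A minimal sketch; the name and values are illustrative:
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults
  namespace: <namespace>
spec:
  limits:
  - type: Container
    defaultRequest:
      ephemeral-storage: "1Gi"   # applied when a container sets no request
    default:
      ephemeral-storage: "2Gi"   # applied when a container sets no limit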
Step 6: Implement Log Rotation and Size Limits
Configure your container runtime (Docker/containerd) and application logging to prevent unbounded log growth.
# For Docker (in /etc/docker/daemon.json; applies to newly created containers)
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
# For a Pod using emptyDir with a sizeLimit
volumes:
- name: log-volume
  emptyDir:
    sizeLimit: 500Mi
Step 7: Adjust Kubelet Eviction Thresholds
Lower the kubelet's nodefs.available hard-eviction threshold (the default is 10% of disk available) so evictions start later, giving workloads a larger buffer; only do this with adequate monitoring in place.
# Add to the kubelet configuration (e.g., /var/lib/kubelet/config.yaml).
# Note: setting evictionHard replaces the kubelet defaults, so restate any signals you still want enforced.
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "5%"   # default 10%; nodefs is the signal backing ephemeral storage
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
# Then restart the kubelet
sudo systemctl restart kubelet
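If hard evictions feel too abrupt during short traffic spikes, soft thresholds give pods a grace period before the kubelet evicts them. A sketch with illustrative values, in the same KubeletConfiguration file:
evictionSoft:
  nodefs.available: "10%"
evictionSoftGracePeriod:
  nodefs.available: "2m"
evictionMaxPodGracePeriod: 60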
Step 8: Monitor and Alert on Ephemeral Storage
Set up Prometheus/Grafana alerts for node ephemeral storage usage to catch issues before evictions.
# Example PromQL for alerting on high node filesystem usage (requires node_exporter;
# adjust mountpoint to the filesystem backing /var/lib/kubelet if it is separate)
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
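If you run the prometheus-operator, that expression can be wired into Alertmanager as a PrometheusRule; a sketch with illustrative names and thresholds:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-ephemeral-storage
  namespace: monitoring
spec:
  groups:
  - name: node-ephemeral-storage
    rules:
    - alert: NodeEphemeralStorageHigh
      expr: '100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85'
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: 'Node {{ $labels.instance }} has used more than 85% of its root filesystem for 10 minutes'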
"This often happens when applications write debug/trace logs to stdout/stderr without rotation during traffic spikes. The default container log driver stores these in /var/log/pods, consuming ephemeral storage. Implement application-level log throttling and use sidecar containers for log shipping instead of local storage."
Frequently Asked Questions
Will deleting evicted pods cause data loss?
Ephemeral storage (emptyDir) data is lost when a pod is deleted. For persistent data, use PersistentVolumeClaims (PVCs). Evicted pods are already terminated, so deleting them only removes their metadata from the API server.
How do I find which pod/container is using the most ephemeral storage?
SSH into the node and run `sudo du -sh /var/lib/kubelet/pods/* | sort -h` to see per-pod usage of emptyDir and other kubelet-managed volumes. Container logs live under /var/log/pods/, and writable container layers live under your runtime's data directory (for example /var/lib/containerd or /var/lib/docker), so check those paths as well.
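The directory names under /var/lib/kubelet/pods/ are pod UIDs; to map a UID back to a pod name, something like the following works (assuming kubectl access):
# Replace <pod-uid> with the directory name reported by du
kubectl get pods --all-namespaces -o custom-columns=UID:.metadata.uid,NAMESPACE:.metadata.namespace,NAME:.metadata.name | grep <pod-uid>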