How to Fix K8s CrashLoopBackOff: Probe Failures & Resources
Quick Fix Summary
TL;DR: Check pod logs, verify liveness/readiness probe endpoints, and increase resource limits to resolve immediate container crashes.
CrashLoopBackOff indicates a pod's container is repeatedly crashing and Kubernetes is backing off restart attempts. This is most commonly caused by failed health probes or insufficient compute resources.
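At a glance, `kubectl get pods` surfaces the state: the STATUS column cycles between Error and CrashLoopBackOff while the RESTARTS count climbs. Names below are placeholders:
# List pods and check the STATUS and RESTARTS columns
kubectl get pods -n <namespace>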
Diagnosis & Recovery Steps
Step 1: Diagnose with kubectl logs and describe
First, gather immediate diagnostic data to see *why* the container is exiting. Check the logs of the most recent crash and inspect the pod's events.
# Get logs from the last container instance
kubectl logs <pod-name> --previous
# Get detailed pod status and recent events
kubectl describe pod <pod-name>
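It is also worth pulling the last termination reason and exit code straight from the pod status; a quick sketch using jsonpath, assuming a single-container pod (adjust the index otherwise):
# Why did the previous container instance terminate?
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Exit code 137 (128 + SIGKILL) usually means an OOM kill; see Step 3
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'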
Step 2: Fix Health Probe Failures
If `describe` shows probe failures, verify that the probe configuration matches how the application actually behaves, and adjust timeouts, delays, or endpoints accordingly.
# Example: Adjust a liveness probe for a slow-starting app
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 45  # Increase if app boots slowly
  periodSeconds: 10
  failureThreshold: 3
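If startup time varies widely, a large initialDelaySeconds slows every restart. An alternative worth considering, sketched here against the same hypothetical /health endpoint, is a startup probe that holds off liveness checks until it first succeeds:
# Allow up to 30 x 10s = 300s for boot before liveness checks begin
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10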
Step 3: Resolve Resource Constraints (OOMKilled)
If `kubectl describe` shows the container's last state as `OOMKilled` (exit code 137), the container exceeded its memory limit. Increase the memory limit or reduce the application's memory footprint.
# Example: Increase memory limits and requests in pod spec
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"  # Increase this value
    cpu: "500m"
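To size limits from real usage rather than guesswork, compare against the pod's live consumption; this assumes the metrics-server add-on is installed in the cluster:
# Show current CPU and memory usage per container
kubectl top pod <pod-name> --containers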
Step 4: Debug with an Ephemeral Container or Interactive Shell
For complex issues, attach an ephemeral debug container that runs alongside the pod's containers so you can inspect the network, processes, and filesystem, or run commands interactively.
# Run a busybox debug container in the problematic pod's namespace
kubectl debug -it <pod-name> --image=busybox:latest --target=<container-name>
# Once inside the debug shell, you can inspect, e.g.,
# wget -O- http://localhost:<port>/health
# Because --target shares the target container's process namespace, its
# filesystem is reachable under /proc/<pid>/root, e.g.:
# cat /proc/1/root/etc/config/application.properties
Architect's Pro Tip
"Use `kubectl get events --sort-by='.lastTimestamp' -A` to see cluster-wide events. A 'FailedScheduling' event due to insufficient CPU/Memory on nodes often precedes CrashLoopBackOff."
Frequently Asked Questions
What's the difference between CrashLoopBackOff and ImagePullBackOff?
CrashLoopBackOff means the container image was pulled successfully but the application inside it keeps crashing. ImagePullBackOff means Kubernetes cannot even pull the container image from the registry.
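A quick way to tell which of the two you are looking at is to read the waiting reason straight from the pod status; `<pod-name>` is a placeholder:
# Prints CrashLoopBackOff, ImagePullBackOff, ErrImagePull, etc.
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'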
How long does Kubernetes wait between restart attempts in a CrashLoopBackOff?
The back-off delay increases exponentially (10s, 20s, 40s, ...) up to a cap of five minutes, and it resets after the container has run for 10 minutes without crashing. This prevents a crashing pod from consuming excessive resources.
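You can watch the cycle live: the RESTARTS count ticks up and the status flips between Running, Error, and CrashLoopBackOff at the growing intervals described above:
# Stream status changes for the pod
kubectl get pod <pod-name> -w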
Can a misconfigured readiness probe cause CrashLoopBackOff?
No. A failed readiness probe will not restart the container. It only removes the pod from Service endpoints. Only a failed *liveness* probe will cause a container restart, potentially leading to CrashLoopBackOff.
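You can confirm this behavior by checking the Service's endpoints while the readiness probe is failing; `<service-name>` is a placeholder for the Service in front of the pod:
# A pod failing readiness drops out of the ENDPOINTS column but is not restarted
kubectl get endpoints <service-name>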