Kubernetes Troubleshooting Guide: Diagnosing Pod FailedScheduling Errors
Quick Fix Summary
TL;DR: Check node resource availability and taint/toleration mismatches using `kubectl describe pod` and `kubectl get nodes`.
Diagnosis & Causes
FailedScheduling occurs when the Kubernetes scheduler cannot find a suitable node to place a Pod. This is a pre-runtime error that prevents the Pod from starting.
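A Pod hit by FailedScheduling sits in the Pending phase and never reaches a node, which a plain listing makes easy to spot (the Pod name below is just an example):
kubectl get pods -n <namespace>
# NAME                      READY   STATUS    RESTARTS   AGE
# my-app-5c9d8b7f6d-x2kqv   0/1     Pending   0          6m
# Pending with no restarts usually means the Pod was never placed on a node.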
Recovery Steps
Step 1: Inspect the Pod Event Log
The `kubectl describe pod` command reveals the scheduler's specific reason for failure in the Events section.
kubectl describe pod <pod-name> -n <namespace>
# Look for lines like:
# Events:
#   Type     Reason            Age   From               Message
#   Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 1 Insufficient cpu, 2 node(s) didn't match Pod's node affinity.
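If the Events section has been trimmed, or the Pod has accumulated many events, the scheduler events can also be queried directly (a small sketch; the field selectors are standard but the pod/namespace names are placeholders):
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling \
  --sort-by=.lastTimestamp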
Step 2: Check Node Resource Availability
Compare the Pod's resource requests against the allocatable resources of your cluster nodes.
kubectl get nodes
kubectl describe node <node-name>
# In the output, check:
# Allocatable:
# cpu: 940m
# memory: 5442344Ki
# Compare this to your Pod's `spec.containers[].resources.requests`.
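To pull only the requests out of the Pod spec for that comparison, a short jsonpath query works (names are placeholders):
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'
# If a single container's request exceeds every node's Allocatable value,
# the request has to shrink or a larger node type has to be added to the cluster.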
Step 3: Verify Node Selectors, Affinity, and Taints
Ensure your Pod's placement constraints (nodeSelector, affinity, and tolerations) are compatible with the labels and taints on your nodes.
# Check Pod's placement rules
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 -B 5 'nodeSelector\|affinity\|tolerations'
# Check a Node's labels and taints
kubectl describe node <node-name> | grep -A 10 -B 5 'Labels\|Taints:'
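Depending on which side of the mismatch is wrong, either the Pod spec or the node has to change. Two common remediations, sketched with placeholder keys and values:
# Add the label the Pod's nodeSelector/affinity expects:
kubectl label nodes <node-name> disktype=ssd
# Or remove a taint that nothing tolerates (note the trailing '-'):
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-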
Step 4: Check Node Status and Conditions
A node must be in a 'Ready' state to be schedulable. MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable conditions add taints that keep new Pods off the node.
kubectl get nodes
kubectl describe node <node-name> | grep -A 10 'Conditions:'
# Look for:
# Conditions:
# Type Status LastHeartbeatTime
# Ready True ... (GOOD)
# MemoryPressure False ... (GOOD)
# DiskPressure False ... (GOOD)
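A node that shows SchedulingDisabled was cordoned (manually or by a drain) rather than unhealthy; uncordoning it makes it a scheduling target again. The node name below is a placeholder.
kubectl get nodes
# NAME     STATUS                     ROLES    AGE   VERSION
# node-2   Ready,SchedulingDisabled   <none>   90d   v1.29.x
kubectl uncordon node-2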
Step 5: Diagnose with Scheduler Logs (Advanced)
For complex issues, increase the scheduler's verbosity to see its internal decision-making process.
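Before raising verbosity, it is worth a quick check that the scheduler is actually running and, on HA control planes, which replica is the active leader; on recent Kubernetes versions leader election is recorded in a Lease object (the label and Lease name below assume a kubeadm-style control plane):
kubectl get pods -n kube-system -l component=kube-scheduler
kubectl get lease kube-scheduler -n kube-system -o jsonpath='{.spec.holderIdentity}{"\n"}'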
# On kubeadm-style clusters the scheduler runs as a static Pod, not a Deployment,
# so edit its manifest on the control-plane node (most managed clusters don't expose it at all):
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
# In the container command args, add:
# - --v=4
# The kubelet restarts the static Pod automatically. Then view the logs:
kubectl logs -n kube-system -l component=kube-scheduler -f | grep -i <pod-name>
Step 6: Simulate Scheduling with `kubectl describe`
Use the `kubectl describe` output to manually verify if any node meets the Pod's requirements.
# From the 'FailedScheduling' event message, note the reasons (e.g., 'Insufficient memory', 'node(s) didn't match node selector').
# Cross-reference:
# 1. For 'Insufficient memory': Run `kubectl top nodes`.
# 2. For selector/affinity: Run `kubectl get nodes --show-labels`.
# 3. For taints: Run `kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints`.
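For the resource cases, the 'Allocated resources' section of `kubectl describe node` is often more telling than `kubectl top nodes`, because scheduling is based on requests rather than live usage; the figures below are illustrative:
kubectl describe node <node-name> | grep -A 8 'Allocated resources'
# Allocated resources:
#   Resource   Requests       Limits
#   cpu        1850m (98%)    2 (106%)
#   memory     3Gi (57%)      4Gi (76%)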
"Use `kubectl get pods --field-selector=status.phase=Pending -A` to quickly find all unscheduled Pods across namespaces before diving into individual descriptions."
Frequently Asked Questions
What's the difference between FailedScheduling and ImagePullBackOff?
FailedScheduling happens BEFORE the Pod is assigned to a node (scheduling phase). ImagePullBackOff happens AFTER scheduling, when the node cannot pull the container image (runtime phase).
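A quick way to tell which phase you are in is to check whether the Pod was ever bound to a node (names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.phase}{"  "}{.spec.nodeName}{"\n"}'
# Pending with an empty nodeName  -> never scheduled (FailedScheduling territory)
# Pending/Running with a nodeName -> scheduled; image pull and runtime errors happen after this point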
Can a Pod be stuck in Pending for reasons other than FailedScheduling?
Yes. A Pending Pod might be waiting for a PersistentVolumeClaim to be bound, or for an extended resource such as a GPU (exposed by a device plugin) to become available; in those cases the event messages point at storage or device-plugin problems rather than classic CPU/memory or affinity mismatches.
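If you suspect a storage dependency rather than a scheduling failure, the claim's status usually tells the story (the claim name is a placeholder):
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
# A claim stuck in Pending (e.g. 'waiting for first consumer to be created before binding',
# or a missing StorageClass) keeps the Pod Pending even though the scheduler itself is healthy.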
How do I fix '0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready}'?
This taint is automatically added by K8s when a node is unhealthy. Fix the underlying node issue (kubelet, network). As a temporary workaround, you can add a toleration for this taint to your Pod spec, but this is not recommended for production.
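As a rough sketch of that workaround (the Deployment name is a placeholder, and tolerating not-ready nodes can land Pods on broken hosts), the toleration can be patched into the Pod template:
kubectl patch deployment <deployment-name> -n <namespace> --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/tolerations", "value": [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoSchedule"}
  ]}
]'
# Note: this sets the tolerations list wholesale; merge by hand if the template already
# defines tolerations, and remove the toleration once the node is healthy again.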