Kubernetes Troubleshooting Guide: Diagnosing Pod FailedScheduling Errors

Quick Fix Summary

TL;DR

Check node resource availability and taint/toleration mismatches using `kubectl describe pod` and `kubectl get nodes`.

FailedScheduling occurs when the Kubernetes scheduler cannot find a suitable node to place a Pod. This is a pre-runtime error that prevents the Pod from starting.
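
You can confirm that a Pod is stuck at this stage from its status: the phase stays Pending and the PodScheduled condition reports False with reason Unschedulable. A quick check (the Pod and namespace names are placeholders):

bash
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.phase}{"  "}{.status.conditions[?(@.type=="PodScheduled")].reason}{"\n"}'
# Typical output while unscheduled:  Pending  Unschedulable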

Diagnosis & Causes

  • Insufficient CPU or Memory resources on nodes.
  • NodeSelector or NodeAffinity rules not matching any node.
  • Taint on nodes without corresponding Pod toleration.
  • No nodes are in a Ready state to accept workloads.
  • Resource requests exceed available node capacity.
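
Each of these causes shows up verbatim in the scheduler's FailedScheduling event message. A quick sketch using the events reason field selector lists every such event across the cluster in one query:

bash
kubectl get events -A --field-selector reason=FailedScheduling \
  --sort-by=.metadata.creationTimestamp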

Recovery Steps

Step 1: Inspect the Pod Event Log

The `kubectl describe pod` command reveals the scheduler's specific reason for failure in the Events section.

bash
kubectl describe pod <pod-name> -n <namespace>
# Look for lines like:
# Events:
#   Type     Reason            Age   From               Message
#   Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 1 Insufficient cpu, 2 node(s) didn't match Pod's node affinity.
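
The same information is available from the Events API, which is easier to filter and script against. A minimal sketch:

bash
# Only the events that reference this Pod, oldest first
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by=.metadata.creationTimestamp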

Step 2: Check Node Resource Availability

Compare the Pod's resource requests against each node's allocatable capacity. The scheduler only looks at requests (not limits or live usage), and it subtracts the requests of Pods already running on the node.

bash
kubectl get nodes
kubectl describe node <node-name>
# In the output, check:
# Allocatable:
#   cpu:                940m
#   memory:             5442344Ki
# Compare this to your Pod's `spec.containers[].resources.requests`.
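
A quick way to see both sides of that comparison (node and Pod names are placeholders; `kubectl top` requires metrics-server and shows live usage, which is context rather than what the scheduler checks):

bash
# Requests already placed on the node (the scheduler subtracts these from Allocatable)
kubectl describe node <node-name> | grep -A 8 'Allocated resources'

# The Pod's own requests, straight from its spec
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'

# Live usage for context (requires metrics-server)
kubectl top nodes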

Step 3: Verify Node Selectors, Affinity, and Taints

Ensure your Pod's placement constraints (affinity/selectors) are compatible with node labels and taints.

bash
# Check Pod's placement rules
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 -B 5 'nodeSelector\|affinity\|tolerations'
# Check a Node's labels and taints
kubectl describe node <node-name> | grep -A 10 -B 5 'Labels\|Taints:'
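
Once you have found the mismatch, the fix is usually on one side or the other: give a node the label the Pod selects on, add a matching toleration to the Pod spec, or remove a taint that is no longer needed. A sketch with hypothetical label and taint keys:

bash
# Add the label the Pod's nodeSelector/affinity expects (key and value are examples)
kubectl label nodes <node-name> disktype=ssd

# Remove an unwanted taint (the trailing '-' deletes it)
kubectl taint nodes <node-name> dedicated=batch:NoSchedule-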

Step 4: Check Node Status and Conditions

A node must report a Ready condition of True to accept new Pods. Conditions such as MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable being True add corresponding taints that also block scheduling.

bash
kubectl get nodes
kubectl describe node <node-name> | grep -A 10 'Conditions:'
# Look for:
# Conditions:
#   Type             Status  LastHeartbeatTime
#   Ready            True    ... (GOOD)
#   MemoryPressure   False   ... (GOOD)
#   DiskPressure     False   ... (GOOD)
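
To read a single condition without scrolling through the full describe output, a jsonpath filter works; it is also worth remembering that a cordoned node (shown as SchedulingDisabled by `kubectl get nodes`) rejects new Pods even when it is Ready. A minimal sketch:

bash
# Just the Ready condition's status for one node
kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'

# Re-enable scheduling on a cordoned node once it is healthy
kubectl uncordon <node-name>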

Step 5: Diagnose with Scheduler Logs (Advanced)

For complex cases, raise the scheduler's log verbosity to see its filtering and scoring decisions. On managed control planes (EKS, GKE, AKS) the scheduler is not directly accessible, so this step applies to self-managed clusters.

bash
# In kubeadm-based clusters the scheduler runs as a static Pod, not a Deployment.
# On the control plane node, add '- --v=4' to the container command args in
# /etc/kubernetes/manifests/kube-scheduler.yaml; the kubelet restarts the Pod
# automatically when the manifest changes. Then follow the logs:
kubectl logs -n kube-system -l component=kube-scheduler -f | grep -i <pod-name>
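
If a Pending Pod shows no FailedScheduling events at all, it is worth confirming that a scheduler is actually running, since a stopped scheduler leaves Pods Pending without emitting any events. A quick check (assumes the component=kube-scheduler label that kubeadm applies):

bash
kubectl get pods -n kube-system -l component=kube-scheduler
# Expect one Running kube-scheduler Pod per control plane node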

Step 6: Simulate Scheduling with `kubectl describe`

Use the `kubectl describe` output to manually verify if any node meets the Pod's requirements.

bash
# From the 'FailedScheduling' event message, note the reasons (e.g., 'Insufficient memory', 'node(s) didn't match node selector').
# Cross-reference:
# 1. For 'Insufficient cpu/memory': Check 'Allocated resources' in `kubectl describe node <node-name>`.
# 2. For selector/affinity: Run `kubectl get nodes --show-labels`.
# 3. For taints: Run `kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints`.
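
For the selector/affinity case, the cross-reference can be done in one step: feed the Pod's own nodeSelector back into a label-filtered node query; an empty result means no node can host the Pod. A sketch (the label key and value are examples):

bash
# Show the Pod's nodeSelector (a key/value map)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}{"\n"}'

# List the nodes that actually carry that label
kubectl get nodes -l disktype=ssd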

Architect's Pro Tip

"Use `kubectl get pods --field-selector=status.phase=Pending -A` to quickly find all unscheduled Pods across namespaces before diving into individual descriptions."

Frequently Asked Questions

What's the difference between FailedScheduling and ImagePullBackOff?

FailedScheduling happens BEFORE the Pod is assigned to a node (scheduling phase). ImagePullBackOff happens AFTER scheduling, when the node cannot pull the container image (runtime phase).
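
The distinction is visible directly in `kubectl get pods -o wide`: an unscheduled Pod has no node assigned, while an ImagePullBackOff Pod already does. Illustrative output (the Pod and node names are made up):

bash
kubectl get pods -n <namespace> -o wide
# NAME    READY   STATUS             ...  NODE
# web-1   0/1     Pending            ...  <none>     <- never scheduled (FailedScheduling)
# web-2   0/1     ImagePullBackOff   ...  worker-2   <- scheduled, image pull is failing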

Can a Pod be stuck in Pending for reasons other than FailedScheduling?

Yes. A Pending Pod might be waiting for a PersistentVolumeClaim to be bound, or requesting an extended resource (a GPU, for example) whose device plugin has not registered on any node yet. The Pod's events and `kubectl get pvc` will show which case applies.
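
For the volume case, checking the claim itself usually settles it (the namespace and claim name are placeholders):

bash
# A claim stuck in 'Pending' here is the blocker
kubectl get pvc -n <namespace>
kubectl describe pvc <claim-name> -n <namespace> | grep -A 5 'Events:'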

How do I fix '0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready}'?

This taint is added automatically by the node lifecycle controller when a node's Ready condition is not True. Fix the underlying node issue (kubelet, network, disk); the taint is removed once the node reports Ready again. As a temporary workaround you can add a toleration for this taint to your Pod spec, but this is not recommended for production.
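
Fixing the node usually starts at the kubelet. A sketch, assuming SSH access to a systemd-based node:

bash
# Identify the NotReady nodes
kubectl get nodes

# On the affected node (via SSH):
sudo systemctl status kubelet        # is the kubelet running?
sudo journalctl -u kubelet -n 100    # recent kubelet errors (certificates, CNI, disk)
# When the node reports Ready again, the not-ready taint is removed automatically.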
