How to Fix Kubernetes 0/3 Nodes Are Available (Scheduling Failed)

Severity: CRITICAL

Quick Fix Summary (TL;DR)

Check node taints, resource requests, and node conditions with `kubectl describe node` and `kubectl get events`.

The Kubernetes scheduler cannot find a node that satisfies your pod's requirements, so the pod stays in the Pending phase and a 'FailedScheduling' event is recorded. This is a critical failure that blocks application deployment and scaling.
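
To confirm the failure cluster-wide, you can filter events by reason. The command below is a quick starting point and assumes your kubectl context points at the affected cluster.

bash
# List recent FailedScheduling events across all namespaces
kubectl get events --all-namespaces --field-selector reason=FailedScheduling --sort-by='.lastTimestamp'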

Diagnosis & Causes

  • Node taints preventing pod placement.
  • Pod resource requests exceed node capacity.
  • NodeSelector or node affinity rules not satisfied.
  • All nodes are in a NotReady state.
  • PersistentVolumeClaim binding failures.

Recovery Steps

    Step 1: Diagnose with kubectl describe and get events

    Immediately gather the error details from the pod and cluster events to understand the scheduler's decision.

    bash
    kubectl describe pod <pod-name> -n <namespace>
    kubectl get events --sort-by='.lastTimestamp' -n <namespace>
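
    An unschedulable pod stays in the Pending phase with no node assigned; a quick way to confirm that before reading the events:

    bash
    # Confirm the pod is Pending and the NODE column shows <none>
    kubectl get pod <pod-name> -n <namespace> -o wide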

    Step 2: Inspect Node Status and Taints

    Check if nodes are ready and examine their taints, which can repel pods unless the pod has a matching toleration.

    bash
    kubectl get nodes
    kubectl describe node <node-name> | grep -A 10 -i taint
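
    If you find a taint that should not be on the node (for example one left over from maintenance), it can be removed by key; the key below is only a placeholder.

    bash
    # Remove a taint by appending '-' to its key and effect (placeholder key)
    kubectl taint nodes <node-name> <taint-key>:NoSchedule-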

    Step 3: Verify Node Resources vs. Pod Requests

    Compare the pod's resource requests (CPU/memory) against the allocatable resources on each node.

    bash
    kubectl describe node <node-name> | grep -A 5 -i allocatable
    kubectl describe pod <pod-name> | grep -i requests -A 2
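
    The 'Allocated resources' section of the node description shows how much of the allocatable capacity is already requested, and `kubectl top nodes` adds live usage, assuming the metrics-server add-on is installed.

    bash
    # How much CPU/memory is already requested on the node
    kubectl describe node <node-name> | grep -A 10 -i 'allocated resources'
    # Live usage (requires metrics-server)
    kubectl top nodes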

    Step 4: Check for NodeSelector and Affinity Rules

    Ensure your pod's nodeSelector or nodeAffinity rules match labels present on at least one ready node.

    bash
    kubectl get nodes --show-labels
    kubectl describe pod <pod-name> | grep -i node-selector -A 2 -B 2
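
    If the selector is correct but no node carries the label, adding the label to a suitable node is often simpler than editing the pod; the key and value here are hypothetical.

    bash
    # Label a node so an existing nodeSelector can match it (hypothetical label)
    kubectl label nodes <node-name> disktype=ssd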

    Step 5: Resolve PersistentVolumeClaim Issues

    If the pod requires a PersistentVolume, ensure a matching StorageClass exists and the PVC is bound.

    bash
    kubectl get pvc -n <namespace>
    kubectl describe pvc <pvc-name> -n <namespace>
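
    Also confirm that the StorageClass named in the claim exists and whether a default class is set; these read-only checks are safe to run on any cluster.

    bash
    # The default StorageClass is marked '(default)'
    kubectl get storageclass
    # Show the class and binding status of the claim
    kubectl get pvc <pvc-name> -n <namespace> -o wide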

    Step 6: Add Tolerations or Fix Node State

    Based on your diagnosis, either add tolerations to your pod spec so it can land on tainted nodes, or repair the problematic node (cordoning and draining it first if it needs maintenance).

    yaml
    # Example Toleration for a common node taint
    tolerations:
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoSchedule"

    Architect's Pro Tip

    "Use `kubectl get pods -o wide --all-namespaces | grep -v Running` to instantly see all non-running pods and their assigned nodes, highlighting cluster-wide scheduling issues."

    Frequently Asked Questions

    What's the difference between a taint and a nodeSelector?

    A taint repels pods from a node unless they tolerate it (node-pushes-pod-away). A nodeSelector attracts a pod to nodes with matching labels (pod-pulls-to-node).
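
    A minimal pod-spec fragment showing both mechanisms side by side; the label, taint key, and values are placeholders.

    yaml
    # nodeSelector pulls the pod toward labelled nodes;
    # the toleration lets it ignore a matching taint (placeholder values)
    spec:
      nodeSelector:
        disktype: ssd
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "batch"
        effect: "NoSchedule"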

    My node shows 'Ready' but pods still won't schedule. Why?

    The node may be cordoned (scheduling disabled), under resource pressure (memory or disk), or carrying a taint with the NoSchedule effect that your pod does not tolerate.
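
    Two quick checks for those cases; both are read-only.

    bash
    # Cordoned nodes show 'SchedulingDisabled' in the STATUS column
    kubectl get nodes
    # Pressure conditions (MemoryPressure, DiskPressure) appear here
    kubectl describe node <node-name> | grep -A 8 -i 'conditions'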

    How can I prevent this error during deployment?

    Define realistic resource requests/limits in your pod spec, use PodDisruptionBudgets for voluntary disruptions, and ensure node affinity/toleration rules are tested in non-production first.
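
    As one illustration of the first point, a container resources block with hypothetical values; size these to what your workload actually needs.

    yaml
    # Example requests/limits (hypothetical values; tune per workload)
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"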

    Related Kubernetes Guides