Kubernetes Troubleshooting Guide: Diagnosing ImagePullBackOff on Linux Nodes

Quick Fix Summary

Check pod events with `kubectl describe pod <pod-name>` and verify image name, registry credentials, and network connectivity.

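ImagePullBackOff means the kubelet on a node failed to pull a container image from a registry and is waiting, with an increasing delay, before it retries. The affected container cannot start until a pull succeeds, so work through the image reference, registry authentication, network path, and node disk space in turn.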

Diagnosis & Causes

  • Incorrect image name or tag in deployment manifest.
  • Missing or invalid imagePullSecrets for private registry.
  • Network firewall blocking access to container registry.
  • Registry authentication failure or expired credentials.
  • Node disk space exhaustion preventing image layer storage.

Recovery Steps

    Step 1: Isolate the Faulty Pod and Inspect Events

    First, identify the affected pod and examine Kubernetes events for specific error messages.

    bash
    kubectl get pods --all-namespaces | grep -Ei 'imagepullbackoff|errimagepull'
    kubectl describe pod <pod-name> -n <namespace>
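
The Events section of the describe output usually contains the registry's own error text. As a rough aid, the sketch below maps a few commonly seen message fragments to the cause categories listed above. The function name `classify_pull_error` is hypothetical, and the fragments are examples rather than a complete list, so treat an "unknown" result as a prompt to read the message yourself.

```bash
# Map a pull-error message (copied from the Events section of
# `kubectl describe pod`) to a likely cause category.
# The matched fragments are illustrative examples, not an exhaustive list.
classify_pull_error() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"manifest unknown"*|*"not found"*)
      echo "image-name-or-tag" ;;
    *"unauthorized"*|*"pull access denied"*|*"authentication required"*)
      echo "registry-auth" ;;
    *"no such host"*|*"i/o timeout"*|*"connection refused"*)
      echo "network" ;;
    *"no space left on device"*)
      echo "disk-space" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `classify_pull_error "dial tcp: lookup registry.example.com: no such host"` prints `network`, which points at Step 4.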

    Step 2: Verify Image Name and Tag Existence

    Manually test if the image reference is correct and accessible from the node's context.

    bash
    kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].image}'
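    docker pull <image-name>:<tag>    # run on a node that uses Docker
    crictl pull <image-name>:<tag>    # run on a node that uses containerd or CRI-O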
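
An image reference without an explicit tag resolves to `latest`, which is an easy way to end up requesting something the registry does not have. The sketch below prints the tag or digest that a given reference actually asks for. The function name `image_tag` is hypothetical, and the parsing only covers the common `[host[:port]/]path[:tag|@digest]` shape.

```bash
# Print the tag (or digest) requested by an image reference.
# A colon inside the registry host (a port) is ignored because only the
# final path component is inspected.
image_tag() {
  local ref="$1"
  local last="${ref##*/}"        # final path component, e.g. "app:1.2"
  case "$last" in
    *@*) echo "${last#*@}" ;;    # digest-pinned reference
    *:*) echo "${last##*:}" ;;   # explicit tag
    *)   echo "latest" ;;        # no tag given, so "latest" is implied
  esac
}

# Usage with the jsonpath query from this step:
#   image_tag "$(kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].image}')"
```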

    Step 3: Diagnose Private Registry Authentication

    Check if imagePullSecrets are configured and correctly referenced in the pod spec or service account.

    bash
    kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A5 -B5 imagePullSecrets
    kubectl get secrets -n <namespace>
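
A secret can exist and still hold the wrong credentials. In a `kubernetes.io/dockerconfigjson` secret, each registry entry carries an `auth` value that is the base64 encoding of `username:password`. The sketch below, with the hypothetical helper name `auth_username`, decodes such a value and prints only the username, so the password never reaches the terminal.

```bash
# Print the username from a dockerconfigjson "auth" value,
# which is base64("username:password").
auth_username() {
  printf '%s' "$1" | base64 --decode | cut -d: -f1
}

# To see the decoded secret itself (this prints credentials, so handle with care):
#   kubectl get secret <secret-name> -n <namespace> \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
```

Compare the printed username and the registry hostname in the decoded secret against what the registry expects.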

    Step 4: Test Network Connectivity from the Node

    SSH into the affected worker node and test DNS resolution and TCP connectivity to the registry.

    bash
    ssh <node-ip>
    nslookup registry-1.docker.io
    telnet registry-1.docker.io 443
    curl -I https://registry-1.docker.io/v2/
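
The `curl` result is easy to misread: a `401` from the `/v2/` endpoint means the registry answered and simply wants credentials, so the network path is fine. The sketch below spells out that interpretation. The function name `registry_status_meaning` is hypothetical, and the wording of the messages is only a suggestion.

```bash
# Interpret the HTTP status code returned by GET https://<registry>/v2/
registry_status_meaning() {
  case "$1" in
    200) echo "reachable, no authentication required" ;;
    401) echo "reachable, authentication required" ;;
    000) echo "unreachable (DNS, TCP, or TLS failure)" ;;
    *)   echo "responded with unexpected status $1" ;;
  esac
}

# Usage on the node (curl reports 000 when it received no HTTP response):
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://registry-1.docker.io/v2/)
#   registry_status_meaning "$code"
```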

    Step 5: Inspect Container Runtime and Node Logs

    Examine the container runtime (Docker/containerd) logs on the node for low-level pull errors.

    bash
    journalctl -u docker --since "1 hour ago" | grep -i pull
    journalctl -u containerd --since "1 hour ago" | grep -i pull
    df -h /var/lib/docker /var/lib/containerd 2>/dev/null
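
When checking several nodes, it can help to reduce the `df` output to just the mount points that are nearly full. The sketch below reads POSIX-format `df -P` output on standard input. The function name `full_filesystems` is hypothetical, and the default threshold of 85 percent is an arbitrary illustrative value, not a kubelet setting.

```bash
# Read `df -P` output on stdin and print mount points whose usage is at or
# above the given percentage (default 85, chosen only for illustration).
full_filesystems() {
  local threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5              # Capacity column, e.g. "91%"
      sub(/%/, "", use)
      if (use + 0 >= t + 0) print $6   # Mounted-on column
    }'
}

# Usage on the node:
#   df -P /var/lib/docker /var/lib/containerd 2>/dev/null | full_filesystems 85
```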

    Step 6: Validate and Correct imagePullSecrets

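    Create or update the Kubernetes secret that holds the registry credentials, then link it to the service account the pod runs under. Pull secrets from a service account are added to a pod when the pod is created, so delete the failing pod afterwards and let its controller recreate it.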

    bash
    kubectl create secret docker-registry regcred --docker-server=<registry-url> --docker-username=<user> --docker-password=<pass> --docker-email=<email> -n <namespace>
    kubectl patch serviceaccount default -n <namespace> -p '{"imagePullSecrets": [{"name": "regcred"}]}'
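
If the secret is not named `regcred`, the patch body has to change with it. The sketch below builds the same JSON for any secret name, using a hypothetical helper called `pull_secret_patch`, and the comments show one way to confirm that the service account picked up the change.

```bash
# Build the JSON patch body that sets a service account's imagePullSecrets
# to a single named secret.
pull_secret_patch() {
  printf '{"imagePullSecrets": [{"name": "%s"}]}\n' "$1"
}

# Usage:
#   kubectl patch serviceaccount default -n <namespace> -p "$(pull_secret_patch regcred)"
# Confirm which secrets the service account now references:
#   kubectl get serviceaccount default -n <namespace> -o jsonpath='{.imagePullSecrets[*].name}'
```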

    Architect's Pro Tip

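    "For air-gapped clusters, always check that each node's container runtime points at your internal registry. With Docker, that means the `insecure-registries` and `registry-mirrors` keys in `/etc/docker/daemon.json`. containerd keeps its registry settings in its own configuration, typically `/etc/containerd/config.toml`."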

    Frequently Asked Questions

    What's the difference between ImagePullBackOff and ErrImagePull?

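    ErrImagePull is reported when a pull attempt fails. The kubelet then retries with an increasing delay, and the pod shows ImagePullBackOff while it waits between attempts. Both statuses point to the same underlying cause, so the troubleshooting steps are identical.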
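
To give a feel for how the retry delay grows, the sketch below prints a possible schedule. The Kubernetes documentation gives 300 seconds (five minutes) as the upper limit on the delay. The 10-second starting delay and the doubling factor used here are assumptions made for illustration, and `backoff_schedule` is a hypothetical name.

```bash
# Print the delay (in seconds) before each of the first N retries,
# assuming a 10 s starting delay that doubles up to a 300 s cap.
backoff_schedule() {
  local attempts="$1" delay=10 cap=300 out="" i=0
  while [ "$i" -lt "$attempts" ]; do
    out="${out:+$out }$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
    i=$((i + 1))
  done
  echo "$out"
}
```

Under these assumptions, `backoff_schedule 7` prints `10 20 40 80 160 300 300`, which is why a pod can sit in ImagePullBackOff for several minutes after the root cause has been fixed.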

    Can ImagePullBackOff be caused by resource constraints?

    Yes. If the node's disk (especially `/var/lib/docker` or `/var/lib/containerd`) is full, the image layers cannot be stored, causing the pull to fail.

    How do I troubleshoot ImagePullBackOff on an Amazon EKS cluster?

    For ECR, ensure the node's IAM role has the `AmazonEC2ContainerRegistryReadOnly` policy attached. Also, verify that the ECR repository exists in the correct region.
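
A region mismatch is easiest to spot by pulling the region and repository name straight out of the image URI, which for ECR has the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The helper names `ecr_region` and `ecr_repository` in this sketch are hypothetical, and the parsing assumes that URI shape.

```bash
# Extract the region from an ECR image URI.
ecr_region() {
  local host="${1%%/*}"             # registry host, before the first "/"
  local rest="${host#*.dkr.ecr.}"   # "<region>.amazonaws.com"
  echo "${rest%%.*}"
}

# Extract the repository name (without tag or digest) from an ECR image URI.
ecr_repository() {
  local path="${1#*/}"              # everything after the registry host
  path="${path%%@*}"                # drop a digest, if present
  echo "${path%%:*}"                # drop a tag, if present
}

# Usage (requires the AWS CLI and credentials that can read ECR):
#   uri=<image-uri-from-the-pod-spec>
#   aws ecr describe-repositories --region "$(ecr_region "$uri")" \
#     --repository-names "$(ecr_repository "$uri")"
```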
