
How to Fix Kubernetes 502 Bad Gateway in Ingress (K8s 1.31+)

Quick Fix Summary

TL;DR

Check Ingress backend service endpoints and verify pod readiness probes are passing.

A 502 Bad Gateway from Kubernetes Ingress indicates the Ingress controller (e.g., NGINX) cannot establish a connection to, or receives an invalid response from, the backend Pods selected by your Service. This is a critical networking failure between the Ingress layer and your application.

Diagnosis & Causes

  • Backend Service has no healthy endpoints.
  • Pod readiness or liveness probes are failing.
  • NetworkPolicy blocking Ingress controller traffic.
  • Misconfigured `service.name` or `service.port` in the Ingress backend.
  • Resource constraints causing Pod crashes or throttling.
Recovery Steps

    Step 1: Verify Service Endpoints and Pod Status

    First, confirm your Service is correctly targeting running Pods and that the Pods are ready.

    bash
    # Check if your Service has Endpoints
    kubectl get endpoints <your-service-name> -n <namespace>
    # Describe the Service to see selector and port mapping
    kubectl describe svc <your-service-name> -n <namespace>
    # Check Pod status and readiness
    kubectl get pods -n <namespace> -l app=<your-app-label> -o wide
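
    If the Endpoints list is empty, the usual culprits are a Service selector that doesn't match the Pod labels, or a targetPort that doesn't match the container's listening port. A minimal sketch of a correctly wired Service follows; the names, label key, and port numbers are placeholder assumptions to adapt to your workload.

    yaml
    # Hypothetical Service wiring - names, labels, and ports are placeholders
    apiVersion: v1
    kind: Service
    metadata:
      name: your-service-name
      namespace: your-namespace
    spec:
      selector:
        app: your-app-label        # must match the Pod template labels exactly
      ports:
        - name: http
          port: 8080               # the port the Ingress backend references
          targetPort: 8080         # the containerPort your application listens on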

    Step 2: Inspect Pod Readiness/Liveness Probes

    A failing readiness probe removes a Pod from Service endpoints. Check probe configuration and logs.

    bash
    # Get the Pod's YAML to review probe configuration
    kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 15 readinessProbe
    # Check for probe-related errors in Pod events
    kubectl describe pod <pod-name> -n <namespace> | tail -30
    # Check application logs for probe requests
    kubectl logs <pod-name> -n <namespace> --tail=50
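
    If the probe configuration looks suspect, compare it against a baseline like the sketch below, which belongs under the container spec of your Deployment or Pod. The /healthz path, port, and timings are illustrative assumptions; use whatever health endpoint your application actually exposes.

    yaml
    # Hypothetical readiness probe - path, port, and timings are assumptions
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10    # give the app time to start before the first probe
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3        # Pod is removed from endpoints after 3 consecutive failures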

    Step 3: Check Ingress Controller Logs & Configuration

    Examine the Ingress controller logs for upstream connection errors and verify its configuration.

    bash
    # Get the Ingress controller Pod name (adjust label selector for your setup)
    kubectl get pods -n ingress-nginx --show-labels
    # Tail the error logs for 502s
    kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=100 | grep -i "502\|upstream"
    # Check the generated NGINX config for your upstream
    kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf | grep -A 10 -B 5 "<your-service-name>"

    Step 4: Validate Network Connectivity

    Test connectivity from the Ingress controller namespace to your application Pods to rule out NetworkPolicy issues.

    bash
    # Run a temporary curl Pod in the Ingress controller namespace
    kubectl run curl-test --image=curlimages/curl:latest -n ingress-nginx --rm -it --restart=Never -- sh
    # Inside the test pod, curl your Service's ClusterIP and Port
    curl -v http://<service-cluster-ip>:<port>
    # Also test direct Pod IP (bypass Service)
    curl -v http://<pod-ip>:<container-port>
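
    If the Service ClusterIP fails but the direct Pod IP works, suspect the Service/kube-proxy layer; if both fail while the Pod itself is healthy, a NetworkPolicy may be dropping traffic from the controller. Below is a sketch of a policy that admits traffic from the ingress-nginx namespace; the app label and port are assumptions, and it relies on the standard kubernetes.io/metadata.name namespace label.

    yaml
    # Hypothetical NetworkPolicy - labels and port are assumptions
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-ingress-controller
      namespace: your-namespace
    spec:
      podSelector:
        matchLabels:
          app: your-app-label
      policyTypes:
        - Ingress
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: ingress-nginx
          ports:
            - protocol: TCP
              port: 8080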

    Step 5: Review Ingress Resource Definition

    Ensure the Ingress spec correctly references the Service name and port.

    bash
    # Get your Ingress resource YAML
    kubectl get ingress <ingress-name> -n <namespace> -o yaml
    # Pay close attention to the `backend.service` block:
    # spec:
    #   rules:
    #   - host: ...
    #     http:
    #       paths:
    #       - path: /
    #         pathType: Prefix
    #         backend:
    #           service:
    #             name: your-correct-service-name  # <-- Must match
    #             port:
    #               number: 8080                    # <-- Must be a port defined in the Service
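
    For reference, the port chain has to line up end to end: the Ingress `port.number` must match a `port` declared on the Service, and the Service `targetPort` must match the container's listening port. A sketch of a complete Ingress is shown below; the host, ingressClassName, names, and port number are placeholder assumptions.

    yaml
    # Hypothetical Ingress - host, class, names, and port are placeholders
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: your-ingress
      namespace: your-namespace
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: your-correct-service-name   # must match the Service metadata.name
                    port:
                      number: 8080                    # must match a port defined in that Service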

    Step 6: Check for Resource Limits & Pod Eviction

    Insufficient CPU/Memory can cause Pod crashes or throttling, leading to intermittent 502s.

    bash
    # Check for evicted, crash-looping, or OOMKilled Pods
    kubectl get pods -n <namespace> -o wide | grep -E "Evicted|CrashLoopBackOff|OOMKilled"
    # Describe a problematic Pod for resource events
    kubectl describe pod <pod-name> -n <namespace> | grep -A 5 -B 5 "Events:"
    # Check node resource pressure
    kubectl describe nodes | grep -A 5 "Allocatable"
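
    If containers are being OOMKilled or heavily CPU-throttled, set explicit requests and limits on the container spec. The values below are placeholder assumptions; size them from observed usage (e.g., kubectl top pods, if metrics-server is installed).

    yaml
    # Hypothetical resource settings for the container spec - values are assumptions
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi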

    Architect's Pro Tip

    "For intermittent 502s under load, increase `proxy-next-upstream-tries` and `proxy-connect-timeout` in your Ingress Controller ConfigMap to handle upstream flakiness."

    Frequently Asked Questions

    My Service has Endpoints, but I still get a 502. What's next?

    The Pods are 'Ready' but may not be accepting traffic. Check application startup time vs. initialDelaySeconds in your readiness probe. Also, run a connectivity test from the Ingress controller namespace directly to a Pod IP to bypass potential kube-proxy or Service issues.

    Does this guide apply to AWS ALB Ingress Controller or other Ingress controllers?

    The core principles (Service/Endpoint health, probes, networking) are universal. However, diagnostic commands and specific configurations (like the Pro Tip parameters) differ. Always consult your specific Ingress controller's documentation and logs.

    Why did this start happening after upgrading to K8s 1.31+?

    Kubernetes 1.31 may include updates to the `EndpointSlice` API, kube-proxy, or core networking that could affect Service discovery. Ensure your Ingress controller version is compatible with 1.31. Also, review any deprecated API removals that might affect your Ingress or Service definitions.
