Redis Sentinel Failing Health Checks: Troubleshooting the CLUSTERDOWN Error

Quick Fix Summary

TL;DR

Check whether a majority of Sentinel nodes are reachable and can communicate with each other. Then restart the Sentinel service on enough nodes to re-establish a majority.

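The CLUSTERDOWN condition covered here occurs when Redis Sentinel cannot reach quorum and therefore cannot perform failover, usually because of network partitions, misconfiguration, or too few healthy Sentinel instances. Strictly speaking, `CLUSTERDOWN` is an error returned by Redis Cluster; a Sentinel deployment in this state reports `NOQUORUM` in its reply to `SENTINEL CKQUORUM <master-name>`.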

Diagnosis & Causes

  • Network connectivity issues between Sentinel nodes
  • Insufficient number of healthy Sentinel instances to form a quorum
  • Inconsistent Sentinel configuration (for example, mismatched master names or quorum values)

Recovery Steps

    Step 1: Verify Sentinel Cluster State and Quorum

    Check the status of all Sentinel instances to see which are reachable, confirm the current master, and ask Sentinel whether the quorum and failover authorization can be reached.

    bash
    redis-cli -p 26379 sentinel masters
    redis-cli -p 26379 sentinel sentinels <master-name>
    redis-cli -p 26379 sentinel get-master-addr-by-name <master-name>
    redis-cli -p 26379 sentinel ckquorum <master-name>
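To read the quorum-related fields at a glance, the reply to `SENTINEL MASTER <master-name>` can be reduced to one line. This is a minimal sketch that assumes `redis-cli --raw` prints the reply as alternating field-name and value lines; `field` and `summarize_master` are hypothetical helper names, not part of Redis.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for `redis-cli --raw sentinel master <name>` output,
# assumed to alternate field-name lines and value lines.

# Print the value that follows field name $1 (only odd lines are field names).
field() {
  awk -v k="$1" 'NR % 2 == 1 && $0 == k { getline; print; exit }'
}

# Summarize flags, the configured quorum, and how many Sentinels this one
# knows about (num-other-sentinels plus itself).
summarize_master() {
  local out flags quorum others
  out=$(cat)
  flags=$(printf '%s\n' "$out" | field flags)
  quorum=$(printf '%s\n' "$out" | field quorum)
  others=$(printf '%s\n' "$out" | field num-other-sentinels)
  echo "flags=$flags quorum=$quorum known-sentinels=$((others + 1))"
}
```

Usage: `redis-cli -p 26379 --raw sentinel master <master-name> | summarize_master`. A `flags` value containing `s_down` or `o_down` means this Sentinel considers the master down.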

    Step 2: Check Sentinel and Redis Logs for Errors

    Examine logs for connection failures, vote disagreements, or configuration errors.

    bash
    sudo journalctl -u redis-sentinel --since "1 hour ago"
    sudo tail -f /var/log/redis/sentinel.log
    sudo grep -E "(failover|vote|quorum|down)" /var/log/redis/sentinel.log
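When the log is long, tallying Sentinel's event markers shows quickly whether Sentinels are reaching `+odown` and voting, or aborting elections. This sketch assumes each event appears in the log as a token such as `+sdown`, `+odown`, `+vote-for-leader`, `+elected-leader`, `+switch-master`, or `-failover-abort-not-elected`; `count_events` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Sentinel log lines on stdin and print "<event> <count>"
# for a few failover-related event markers ("+" = set, "-" = cleared/aborted).
count_events() {
  grep -oE '[+-](sdown|odown|try-failover|vote-for-leader|elected-leader|switch-master|failover-abort-[a-z-]+)' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `sudo cat /var/log/redis/sentinel.log | count_events`. Many `+sdown` entries without a matching `+odown` suggest the configured quorum is never reached.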

    Step 3: Validate Network Connectivity Between Sentinels

    Ensure all Sentinel nodes can communicate on their configured ports (default 26379).

    bash
    redis-cli -p 26379 --raw sentinel sentinels <master-name> \
      | awk '$0 == "ip" {getline; ip = $0} $0 == "port" {getline; print ip, $0}' \
      | while read -r ip port; do nc -zv "$ip" "$port"; done
    sudo ss -tlnp | grep 26379
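If `nc` is not installed, the same reachability check can use bash's built-in `/dev/tcp` redirection. This is a sketch, assuming bash and coreutils `timeout` are available and that `redis-cli --raw sentinel sentinels <master-name>` lists each peer's fields as alternating name/value lines; `sentinel_peers` and `probe_peers` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn flat `sentinel sentinels` output on stdin into "ip port" lines.
sentinel_peers() {
  awk '$0 == "ip" { getline; ip = $0 } $0 == "port" { getline; print ip, $0 }'
}

# Try a TCP connection to each "ip port" pair on stdin with a 2-second limit,
# using bash's /dev/tcp pseudo-device instead of nc.
probe_peers() {
  local ip port
  while read -r ip port; do
    if timeout 2 bash -c ': > "/dev/tcp/$1/$2"' _ "$ip" "$port" 2>/dev/null; then
      echo "$ip:$port reachable"
    else
      echo "$ip:$port UNREACHABLE"
    fi
  done
}
```

Usage: `redis-cli -p 26379 --raw sentinel sentinels <master-name> | sentinel_peers | probe_peers`. Run it from every Sentinel node, since each one lists only the other Sentinels it knows about.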

    Step 4: Confirm Sentinel Configuration and Quorum Settings

    Verify that the `sentinel monitor` directive (master name, address, and quorum, which is its last argument) and `sentinel down-after-milliseconds` are consistent across all Sentinel configs.

    bash
    sudo grep -E "^sentinel (monitor|down-after-milliseconds)" /etc/redis/sentinel.conf
    sudo cat /etc/redis/sentinel.conf
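Comparing files by eye gets error-prone beyond a few nodes. Assuming you have copied each node's `sentinel.conf` to one machine (for example as `s1.conf`, `s2.conf`, hypothetical names), this sketch prints the master name and quorum from every `sentinel monitor <master-name> <ip> <port> <quorum>` line and fails if the quorum for a master differs between files. `monitor_quorums` and `quorums_consistent` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<file> <master-name> <quorum>" for every
# `sentinel monitor` line (file paths are assumed to contain no spaces).
monitor_quorums() {
  awk '$1 == "sentinel" && $2 == "monitor" { print FILENAME, $3, $6 }' "$@"
}

# Succeed only if every file uses the same quorum for each master name.
quorums_consistent() {
  monitor_quorums "$@" \
    | awk '{ if (($2 in q) && q[$2] != $3) bad = 1; q[$2] = $3 } END { exit bad }'
}
```

Usage: `quorums_consistent s1.conf s2.conf s3.conf && echo "quorum consistent"`.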

    Step 5: Force a Sentinel Failover if Quorum is Achievable

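    If a majority of Sentinels is reachable but no failover is happening, you can force one manually. Note that `SENTINEL FAILOVER` does not ask the other Sentinels for agreement: the Sentinel you run it on promotes a replica and then publishes the new configuration so the other Sentinels update theirs. Confirm that a healthy replica exists (Step 7) before running it.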

    bash
    redis-cli -p 26379 sentinel failover <master-name>
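Because `SENTINEL FAILOVER` does not wait for the other Sentinels to agree, a cautious wrapper can check first. This sketch runs `SENTINEL CKQUORUM <master-name>`, which reports whether the quorum and the majority needed to authorize a failover are reachable, and only proceeds when the reply starts with `OK`. `safe_failover` is a hypothetical name, and the 127.0.0.1:26379 defaults are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: force a failover only if SENTINEL CKQUORUM reports OK.
# Assumes the Sentinel listens on 127.0.0.1:26379 unless host/port are given.
safe_failover() {
  local master="$1" host="${2:-127.0.0.1}" port="${3:-26379}" reply
  # Merge stderr so an error reply or a connection failure is captured too.
  reply=$(redis-cli -h "$host" -p "$port" sentinel ckquorum "$master" 2>&1)
  case "$reply" in
    OK*) redis-cli -h "$host" -p "$port" sentinel failover "$master" ;;
    *) echo "refusing failover: $reply" >&2; return 1 ;;
  esac
}
```

Usage: `safe_failover <master-name>` on a Sentinel node, or `safe_failover <master-name> <sentinel-ip> 26379` from elsewhere.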

    Step 6: Restart Sentinel Services to Clear State

    Gracefully restart Sentinel instances, starting with the one that can see the current master.

    bash
    sudo systemctl restart redis-sentinel
    sudo systemctl status redis-sentinel
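To restart Sentinels one at a time without dropping more of the majority than necessary, each restart can wait until the instance answers `PING` before moving to the next host. A minimal sketch, assuming passwordless SSH with sudo rights on each host and the `redis-sentinel` systemd unit used above; the helper names and any host names you pass are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until the Sentinel at $1:$2 answers PING,
# trying up to $3 times with one second between attempts.
wait_for_sentinel() {
  local host="$1" port="${2:-26379}" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    [ "$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ] && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Restart Sentinel on each host in order, stopping at the first one that
# fails to restart or does not come back. Assumes passwordless SSH and sudo.
rolling_restart() {
  local host
  for host in "$@"; do
    ssh "$host" sudo systemctl restart redis-sentinel || return 1
    wait_for_sentinel "$host" || { echo "Sentinel on $host did not come back" >&2; return 1; }
  done
}
```

Usage: `rolling_restart <host-a> <host-b> <host-c>`, listing first the host whose Sentinel can see the current master.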

    Step 7: Check Underlying Redis Master/Slave Health

    Ensure the Redis instances being monitored are themselves healthy and replicating.

    bash
    redis-cli -h <master-ip> -p 6379 info replication
    redis-cli -h <slave-ip> -p 6379 info replication
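`INFO replication` returns `field:value` lines with CRLF line endings. This minimal sketch reduces it to the fields most relevant here: `connected_slaves` on a master and `master_link_status` on a replica (which reports `role:slave`). The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of field $1 from INFO output on stdin.
# INFO lines end in CRLF, so carriage returns are stripped before matching.
info_field() {
  tr -d '\r' | awk -F: -v k="$1" '$1 == k { print $2; exit }'
}

# Summarize replication health from `info replication` output on stdin.
replication_summary() {
  local info role
  info=$(tr -d '\r')
  role=$(printf '%s\n' "$info" | info_field role)
  if [ "$role" = "master" ]; then
    echo "master connected_slaves=$(printf '%s\n' "$info" | info_field connected_slaves)"
  else
    echo "replica link=$(printf '%s\n' "$info" | info_field master_link_status)"
  fi
}
```

Usage: `redis-cli -h <master-ip> -p 6379 info replication | replication_summary`. A replica reporting `link=down` is not currently connected to its master.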

    Architect's Pro Tip

    "A split-brain scenario where two subsets of Sentinels each elect a different master is a common root cause. Always verify the `master` field from `sentinel masters` on ALL Sentinel nodes to ensure consensus."

    Frequently Asked Questions

    How many Sentinel nodes do I need to avoid CLUSTERDOWN?

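    Two thresholds apply. The configured quorum (the last argument of `sentinel monitor`) sets how many Sentinels must agree that the master is down, and a majority of all known Sentinels must be reachable to authorize the failover. With 3 Sentinels the majority is 2; with 5 it is 3. Always deploy an odd number (3 or 5): an even count adds a node without tolerating any additional failures.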
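The majority arithmetic can be checked with a small sketch. It assumes the majority of N Sentinels is floor(N/2) + 1; `majority` and `tolerated_failures` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Majority of N Sentinels: floor(N/2) + 1 (integer division truncates).
majority() { echo $(( $1 / 2 + 1 )); }

# How many Sentinels can be lost while a majority remains reachable.
tolerated_failures() { echo $(( $1 - $(majority "$1") )); }

for n in 3 4 5; do
  echo "sentinels=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

The loop shows that 3 and 4 Sentinels both tolerate one failure, while 5 tolerate two.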

    Can I temporarily fix this by restarting just one Sentinel?

    No. Restarting a single Sentinel often won't resolve a quorum issue. You must restore connectivity or restart enough Sentinels to re-establish a majority quorum.
