Root Cause Analysis: Why Docker 'No Space Left on Device' Happens with ZFS

Quick Fix Summary

TL;DR

Run 'docker system prune -a --volumes' and check ZFS dataset quotas with 'zfs list'.

Docker's ZFS storage driver stores every image layer and container in a child dataset under a parent dataset (shown as 'Parent Dataset' in 'docker info'). If that parent dataset carries a ZFS quota, writes fail as soon as the quota is exhausted, even though the pool itself still has plenty of free space. In other words, this is a dataset-level limit, not a disk that is actually full.
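
The mismatch is easy to see by comparing the pool's view with the dataset's view; the dataset name below is a placeholder for whatever 'docker info' reports as the Parent Dataset.

bash
# Pool-level view: plenty of free space
zpool list
# Dataset-level view: the Docker parent dataset may already be at its quota
zfs list -o name,used,avail,quota rpool/docker   # substitute your Parent Dataset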

Diagnosis & Causes

  • ZFS dataset quota exhausted for Docker's root dataset.
  • Uncleaned stopped containers and unused images consuming quota.
  • Dangling volumes not pruned, occupying dataset space.
  • ZFS snapshot retention consuming reserved dataset space.
  • Quotas over-committed across datasets (promising more space than the pool can back) without monitoring actual usage; quick checks for these causes follow below.
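
The following commands are a quick triage pass over the causes above; the grep pattern assumes your Docker datasets have "docker" in their names.

bash
docker system df -v                               # space used by images, containers, volumes, build cache
docker volume ls -f dangling=true                 # volumes no container references
zfs list -t snapshot -o name,used | grep docker   # snapshot space charged against the quota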

Recovery Steps

    Step 1: Immediate Cleanup of Docker Objects

    Remove all unused containers, images, networks, and volumes to free up space within the constrained dataset.

    bash
    docker system prune -a --volumes --force
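
    If you want to see how much space each object class would free before committing, 'docker system df' reports reclaimable totals first:

    bash
    docker system df   # RECLAIMABLE column shows what a prune could free per object type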

    Step 2: Inspect ZFS Dataset Structure and Quotas

    Identify the specific Docker-managed ZFS datasets and their current usage versus quota limits.

    bash
    zfs list -t filesystem -o name,used,avail,refer,quota,mountpoint | grep -i docker
    zfs get quota,refquota "$(docker info 2>/dev/null | grep 'Parent Dataset' | awk '{print $3}')"
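
    The ZFS driver also reports its own view of the parent dataset directly in 'docker info', which is a useful cross-check (field labels may vary slightly between Docker versions):

    bash
    docker info 2>/dev/null | grep -E 'Parent Dataset|Space Available|Parent Quota'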

    Step 3: Increase Quota on the Constrained Dataset

    If pool space allows, increase the quota on Docker's root ZFS dataset to accommodate growth.

    bash
    DOCKER_DATASET=$(docker info 2>/dev/null | grep 'Parent Dataset' | awk '{print $3}')
    sudo zfs set quota=100G "$DOCKER_DATASET"
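
    Note that 'quota' counts snapshots and child datasets against the limit, while 'refquota' limits only the data the dataset itself references. If snapshots are the main consumer, the two behave very differently:

    bash
    # Alternative: cap only the dataset's own referenced data, excluding snapshots and children
    sudo zfs set refquota=100G "$DOCKER_DATASET"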

    Step 4: Prune ZFS Snapshots Associated with Docker

    Remove old ZFS snapshots (for example, ones created by backup tooling) that consume space counted against the dataset's quota. Leave the snapshots the ZFS storage driver itself creates for image layers alone: clones of them back your image layers, so zfs will refuse to destroy them, and forcing it would break those images.

    bash
    sudo zfs list -t snapshot -o name,used | grep "$(docker info 2>/dev/null | grep 'Parent Dataset' | awk '{print $3}')" | sort
    # Review list, then delete: sudo zfs destroy <snapshot_name>
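
    'zfs destroy' accepts a dry-run flag, so you can confirm how much space a deletion would actually reclaim before committing (the snapshot name here is a placeholder):

    bash
    sudo zfs destroy -nv rpool/docker@pre-upgrade   # -n: dry run, -v: print the space that would be reclaimed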

    Step 5: Implement Proactive Monitoring and Alerts

    Set up monitoring for ZFS dataset usage to prevent future outages, either with a simple script like the one below or with a monitoring stack such as Prometheus with node_exporter.

    bash
    #!/bin/bash
    DOCKER_DATASET="rpool/docker"   # set to the Parent Dataset reported by 'docker info'
    THRESHOLD=90
    # -p prints exact byte values (quota is 0 when unset), so the percentage math works
    read -r USED QUOTA <<< "$(zfs list -H -p -o used,quota "$DOCKER_DATASET")"
    USAGE_PCT=$(( QUOTA > 0 ? USED * 100 / QUOTA : 0 ))
    if [ "$USAGE_PCT" -gt "$THRESHOLD" ]; then echo "CRITICAL: Docker ZFS dataset quota at ${USAGE_PCT}%"; fi
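
    One way to run this on a schedule is a cron entry; the script path is just an example:

    bash
    # /etc/cron.d/docker-zfs-quota
    */15 * * * * root /usr/local/sbin/check-docker-zfs-quota.sh 2>&1 | logger -t docker-zfs-quota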

    Step 6: (Architectural) Consider Separate Datasets for Volumes

    For high-volume applications, store persistent data on a separate ZFS dataset with its own quota, isolating it from the image/container dataset.

    bash
    DOCKER_POOL=$(docker info 2>/dev/null | grep 'Zpool:' | awk '{print $2}')
    sudo zfs create -o quota=500G "$DOCKER_POOL/docker-volumes"
    # Then bind-mount this dataset into containers for persistent storage.
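
    A container can then use the new dataset through a bind mount. The path below assumes the default mountpoint (the dataset name under /); adjust it to whatever 'zfs get mountpoint' reports, and treat the image name and paths as placeholders:

    bash
    docker run -d --name app \
      -v /rpool/docker-volumes/app-data:/var/lib/app \
      myapp:latest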

    Architect's Pro Tip

    "The 'zfs list' command shows 'AVAIL' relative to the dataset quota, not the pool's free space. Always monitor 'USED' vs 'QUOTA', not just pool capacity."

    Frequently Asked Questions

    My ZFS pool has plenty of free space, so why is Docker reporting 'no space left'?

    Docker's ZFS driver operates within a child dataset that has a strict quota. You've hit that quota limit, not the pool limit. Use 'zfs list' to see the quota on your docker/* datasets.

    Is it safe to run 'docker system prune --volumes'?

    It will delete all volumes not actively used by at least one container, which can cause data loss. Always ensure critical data is backed up or stored on named, actively used volumes before pruning.
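
    If you are unsure what a prune would touch, list dangling volumes first and prune volumes on their own as a more conservative sequence:

    bash
    docker volume ls -f dangling=true   # volumes no container currently references
    docker volume prune                 # recent Docker releases remove only anonymous unused volumes here; add --all to include named ones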

    Should I switch from the ZFS storage driver to overlay2 to avoid this?

    overlay2 has its own trade-offs (no native snapshots, copy-up cost on the first write to a file). If you rely on ZFS features like compression, deduplication, or native snapshots, fixing the quota management is better than switching drivers.
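
    Before considering a switch, confirm which driver is active and which ZFS features the Docker dataset actually relies on (dataset name is an example):

    bash
    docker info -f '{{.Driver}}'              # current storage driver
    zfs get compression,dedup rpool/docker    # ZFS features in play on the Docker dataset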
