Root Cause Analysis: Why Alibaba Cloud ECS Disk Full Errors Happen
Quick Fix Summary
TL;DR: Run 'sudo du -sh /* 2>/dev/null | sort -rh | head -20' to identify the largest directories, then clean up or expand storage.
Alibaba Cloud ECS DiskFull errors occur when a filesystem reaches 100% capacity. Writes fail, and applications that can no longer write logs, temporary files, or data may crash. Find out what is consuming the space before deleting anything or resizing the disk.
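To see how close a filesystem is to that limit, a check along these lines can help. This is a minimal sketch: the helper name `usage_pct` and the 85% threshold are illustrative assumptions, not part of any Alibaba Cloud tooling.

```shell
#!/usr/bin/env bash
# usage_pct is a hypothetical helper: prints the used-space percentage
# (an integer without the % sign) of the filesystem containing the given path.
usage_pct() {
  # -P forces POSIX output so each filesystem stays on a single line
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

threshold=85   # assumed warning level; adjust as needed
pct=$(usage_pct /)
if [ "$pct" -ge "$threshold" ]; then
  echo "WARNING: / is ${pct}% full"
else
  echo "OK: / is ${pct}% full"
fi
```

Run from cron, a check like this gives a local early warning that does not depend on any external monitoring service.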
Diagnosis & Causes
Recovery Steps
Step 1: Immediate Disk Space Analysis
Identify which directories and files are consuming the most space using standard Linux utilities.
df -h
sudo du -sh /* 2>/dev/null | sort -rh | head -20
sudo lsof +L1
Step 2: Clean Common Temporary and Log Files
Safely remove temporary files, old logs, and package cache without breaking system functionality.
sudo journalctl --vacuum-time=3d
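# Remove only files untouched for 7+ days; 'rm -rf /tmp/*' can delete sockets and files still in use
sudo find /tmp -xdev -type f -mtime +7 -delete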
sudo apt-get clean || sudo yum clean all
sudo find /var/log -name "*.log" -mtime +7 -delete
Step 3: Investigate Application-Specific Storage
Check database logs, container storage, and application caches that often grow unexpectedly.
sudo du -sh /var/lib/docker/* 2>/dev/null
sudo du -sh /var/lib/mysql/* 2>/dev/null
sudo find /home -name "core" -type f -delete
Step 4: Configure Alibaba Cloud Monitoring and Auto-Scaling
Set up CloudMonitor alerts and auto-scaling policies to prevent future disk full scenarios.
# Create CloudMonitor rule for disk usage
aliyun cms PutGroupMetricRule \
--RuleName disk_usage_alert \
--Namespace acs_ecs_dashboard \
--MetricName disk_utilization \
--Dimensions '[{"instanceId":"YOUR_INSTANCE_ID"}]' \
--Statistics Average \
--ComparisonOperator GreaterThanOrEqualToThreshold \
--Threshold 85 \
--Period 60 \
--EvaluationCount 2
Step 5: Implement Preventive Log Rotation
Configure logrotate to automatically manage log file growth for system and application logs.
sudo nano /etc/logrotate.d/myapp
/var/log/myapp/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 644 root root
}
Step 6: Expand Disk Capacity (If Needed)
Resize the system disk or add a data disk through Alibaba Cloud Console or API.
# Resize disk via Alibaba Cloud CLI
aliyun ecs ResizeDisk \
--DiskId d-1234567890 \
--NewSize 100
# After the resize completes, extend the partition and then the filesystem
sudo growpart /dev/vda 1
sudo resize2fs /dev/vda1   # ext2/3/4 only; for XFS use: sudo xfs_growfs /
Architect's Pro Tip
"Check for deleted files still held open by processes using 'lsof +L1'. A service restart may immediately free significant space without file deletion."
Frequently Asked Questions
Why does 'df' show 100% usage but 'du' shows less total space used?
This usually means deleted files are still held open by running processes; the kernel does not release their space until the last open file descriptor is closed. Use 'sudo lsof +L1' to identify these processes and restart them to reclaim the space.
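The effect can be reproduced safely with a throwaway file. The sketch below assumes a Linux system with /proc mounted and GNU coreutils; the variable names are illustrative. It holds a file open, deletes it, shows that the descriptor still points at the deleted inode, and then truncates the file through /proc, which is one way to release the blocks when restarting the owning process is not practical.

```shell
#!/usr/bin/env bash
# Reproduce a deleted-but-open file using a temporary file.
tmpfile=$(mktemp)
exec 3>"$tmpfile"                 # keep the file open on descriptor 3
head -c 1048576 /dev/zero >&3     # write 1 MiB through the open descriptor
rm -f "$tmpfile"                  # remove the directory entry only

# The descriptor still references the inode; the kernel marks it "(deleted)"
link_target=$(readlink "/proc/$$/fd/3")
echo "fd 3 -> $link_target"

# Truncating through /proc frees the blocks without closing the descriptor
: > "/proc/$$/fd/3"
size_after=$(stat -L -c %s "/proc/$$/fd/3")
echo "size after truncate: $size_after bytes"

exec 3>&-                         # closing the descriptor releases the inode
```

On a real server, the PID and descriptor number would come from the 'lsof +L1' output rather than from the current shell.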
How can I prevent DiskFull errors in Kubernetes pods on ECS?
Set ephemeral-storage requests and limits on each container, set a sizeLimit on emptyDir volumes, and use Alibaba Cloud NAS for data that must persist.
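As a rough illustration of those settings, the following sketch writes a sample Pod manifest to a temporary file. The pod name, image, mount path, and sizes are placeholders chosen for the example, not recommendations.

```shell
#!/usr/bin/env bash
# Write a sample Pod manifest; every name and size here is illustrative.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: disk-limited-demo          # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36          # placeholder image
      command: ["sleep", "3600"]
      resources:
        requests:
          ephemeral-storage: "1Gi"
        limits:
          ephemeral-storage: "2Gi" # exceeding this makes the pod eligible for eviction
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 500Mi
EOF
echo "Manifest written to $manifest"
# To validate without creating anything (requires kubectl):
#   kubectl apply --dry-run=client -f "$manifest"
```

With limits like these, a runaway container is evicted before it can fill the node's disk and affect other workloads.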
What's the safest way to clean disk space without breaking production systems?
Always analyze with 'du' first, target application logs and temp directories, avoid removing system libraries, and test commands in staging before production.
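One way to apply this is to preview deletion candidates before removing anything. The sketch below is a minimal example: `preview_old_logs` is a hypothetical helper, and the demo runs in a scratch directory (GNU touch is assumed for the '-d' option).

```shell
#!/usr/bin/env bash
# preview_old_logs is a hypothetical helper: lists *.log files older than
# the given number of days under a directory, without deleting anything.
preview_old_logs() {
  find "$1" -type f -name '*.log' -mtime +"$2" -print
}

# Self-contained demo in a scratch directory
demo_dir=$(mktemp -d)
touch -d '10 days ago' "$demo_dir/old.log"   # backdated file, should be listed
touch "$demo_dir/new.log"                    # fresh file, should be skipped
candidates=$(preview_old_logs "$demo_dir" 7)
echo "$candidates"
rm -rf "$demo_dir"
```

Once the printed list looks right for a real directory such as /var/log, the same find expression can be rerun with '-delete' in place of '-print'.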