How to Fix GCP Cloud-Run-429
Quick Fix Summary
TL;DR: Increase your service's maximum concurrency and maximum instance limits in the Cloud Run configuration, and raise the request timeout if long-running requests need it.
A Cloud-Run-429 error indicates your service is receiving more concurrent requests than it's configured to handle. This is a server-side throttling mechanism, not a client-side issue.
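As a rough illustration of that limit, the sketch below models a service's ceiling as maximum instances multiplied by per-instance concurrency. The function names are illustrative, the example numbers (50 instances, concurrency 80) are the ones used in the gcloud step of this guide, and the model ignores the brief queueing Cloud Run can do while it scales out.

```javascript
// Simplified model: the most requests a service can have in flight at once.
function serviceCapacity(maxInstances, concurrencyPerInstance) {
  return maxInstances * concurrencyPerInstance;
}

// True when in-flight demand exceeds that ceiling, which is when 429s become likely.
function exceedsCapacity(inFlight, maxInstances, concurrencyPerInstance) {
  return inFlight > serviceCapacity(maxInstances, concurrencyPerInstance);
}

console.log(serviceCapacity(50, 80));       // 4000
console.log(exceedsCapacity(4500, 50, 80)); // true
console.log(exceedsCapacity(3000, 50, 80)); // false
```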
Diagnosis & Causes
Recovery Steps
Step 1: Immediate Capacity Increase via Console
Quickly raise the concurrency and timeout limits to stop the 429 errors and allow traffic to flow.
# Navigate to Cloud Run > Your Service > Edit & Deploy New Revision
# Under 'Capacity':
# 1. Set 'Maximum concurrent requests per instance' to 80 (the default) or higher, up to 1000
# 2. Set 'Request timeout' to 300 seconds (5 minutes, the default) or higher if requests need longer
# 3. Under 'Autoscaling', increase 'Maximum number of instances'
# 4. Click 'Deploy'
Step 2: Configure via gcloud CLI for CI/CD
Programmatically update the service configuration using the gcloud command-line tool. This is ideal for reproducible deployments.
gcloud run services update YOUR_SERVICE_NAME \
--region=YOUR_REGION \
--concurrency=80 \
--max-instances=50 \
--timeout=300s \
--cpu=2 \
--memory=2Gi
Step 3: Implement Application-Level Rate Limiting & Queuing
For more control, implement a request queue and worker pattern within your service to smooth out traffic bursts.
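// Node.js example using express and bull (bull needs a reachable Redis instance)
const express = require('express');
const Queue = require('bull');
const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated
// Connects to Redis at 127.0.0.1:6379 unless a URL or options are passed
const processQueue = new Queue('requests');
// Worker: processes queued jobs; the request payload is in job.data
processQueue.process(async (job) => {
  // Your request logic here
});
// Enqueue the work and respond immediately with the job ID
app.post('/endpoint', async (req, res) => {
  const job = await processQueue.add(req.body);
  res.status(202).json({ jobId: job.id });
});
app.listen(process.env.PORT || 8080);
Step 4: Proactive Monitoring & Alerting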
Set up alerts for concurrency and request count to catch issues before they cause 429 errors.
# Create an alerting policy for high concurrency
gcloud alpha monitoring policies create \
--policy-from-file="alert_policy.json"
# Sample alert_policy.json condition:
# metric: run.googleapis.com/container/max_request_concurrencies
# condition: above 70% of max_configured_concurrency for 1 minute
Architect's Pro Tip
"Set your 'max concurrency' to ~80% of your container's proven stable limit. This leaves headroom for health checks and garbage collection, preventing cascading failures during traffic spikes."
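To make the 80% rule concrete, here is a minimal sketch that derives a concurrency setting from a load-tested limit, then the absolute alert threshold that the 70% condition in Step 4 implies. The helper names and the stable limit of 100 are hypothetical; substitute the figure from your own load tests.

```javascript
// Concurrency to configure, as a percentage of the load-tested stable limit.
function recommendedConcurrency(stableLimit, targetPercent = 80) {
  return Math.floor((stableLimit * targetPercent) / 100);
}

// Absolute threshold for an alert set at a percentage of configured concurrency.
function alertThreshold(configuredConcurrency, alertPercent = 70) {
  return Math.floor((configuredConcurrency * alertPercent) / 100);
}

const concurrency = recommendedConcurrency(100); // hypothetical stable limit
console.log(concurrency);                 // 80
console.log(alertThreshold(concurrency)); // 56
```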
Frequently Asked Questions
Is a 429 error from Cloud Run a problem with my code?
Not necessarily. It's primarily an infrastructure throttling error. However, inefficient code that processes requests slowly can trigger it by exhausting concurrency slots.
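The effect of slow code can be estimated with Little's law: the number of requests in flight is roughly the arrival rate multiplied by the average latency. The sketch below is an approximation with made-up traffic numbers, and the function name is illustrative.

```javascript
// Little's law: average in-flight requests = arrival rate x average latency.
function inFlightRequests(requestsPerSecond, avgLatencyMs) {
  return (requestsPerSecond * avgLatencyMs) / 1000;
}

console.log(inFlightRequests(100, 200));  // 20
console.log(inFlightRequests(100, 2000)); // 200
```

At the same 100 requests per second, a tenfold increase in latency occupies ten times as many concurrency slots.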
What's the difference between Cloud Run 429 and a 5xx error?
A 429 means the service is too busy to accept more work (scaling/limit issue). A 5xx error (like 502 or 503) typically indicates the container crashed or failed to start.
Will increasing concurrency increase my costs?
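Possibly, but not necessarily. Higher concurrency lets each container instance handle more requests, which can reduce the number of instances needed. Raising the maximum instance count or the CPU and memory per instance, on the other hand, can increase spend. Monitor your bill and CPU utilization.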
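A minimal sketch of that trade-off, assuming a steady number of in-flight requests and instances that can actually sustain the configured concurrency. The function name and the figure of 400 in-flight requests are hypothetical.

```javascript
// Instances needed to hold a given number of in-flight requests.
function instancesNeeded(inFlight, concurrencyPerInstance) {
  return Math.ceil(inFlight / concurrencyPerInstance);
}

console.log(instancesNeeded(400, 20)); // 20
console.log(instancesNeeded(400, 80)); // 5
```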