How to Fix AWS General SDK ThrottlingException
Quick Fix Summary
TL;DR: Implement exponential backoff with jitter in your SDK client configuration.
AWS services enforce API request rate limits per account, Region, and in some cases IAM principal. A ThrottlingException (usually HTTP 400, or HTTP 429 for some services) is returned when these limits are exceeded, and the client is expected to retry with backoff.
Diagnosis & Causes
Recovery Steps
Step 1: Configure SDK Retry Logic with Exponential Backoff
The most critical fix is to configure your AWS SDK client to automatically retry throttled requests using an exponential backoff strategy with jitter to prevent synchronized retry storms.
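To make the backoff-with-jitter idea concrete, here is a minimal sketch of the "full jitter" variant in plain Python. It is an illustration of the technique, not the SDK's internal implementation; the names `backoff_delays` and `call_with_backoff`, and the default `base` and `cap` values, are assumptions chosen for this example.

```python
import random
import time


def backoff_delays(max_retries=4, base=0.1, cap=20.0, rng=random.random):
    """Yield one sleep duration (in seconds) per retry, using full jitter."""
    for attempt in range(max_retries):
        # Exponential ceiling: base * 2^attempt, never above cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling) so clients desynchronize.
        yield rng() * ceiling


def call_with_backoff(operation, is_throttled, max_attempts=5, sleep=time.sleep):
    """Call operation(); on a throttling error, sleep and try again.

    is_throttled is a caller-supplied predicate (hypothetical) that decides
    whether an exception represents throttling.
    """
    delays = backoff_delays(max_retries=max_attempts - 1)
    while True:
        try:
            return operation()
        except Exception as exc:
            delay = next(delays, None)
            # Re-raise non-throttling errors, or when retries are exhausted.
            if delay is None or not is_throttled(exc):
                raise
            sleep(delay)
```

In practice the SDK's built-in retry modes cover this for most calls; a hand-rolled loop like this is mainly useful for understanding the behavior or for wrapping multi-call workflows.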
import boto3
from botocore.config import Config

# Retry throttled calls automatically, up to 10 attempts in total.
config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    }
)
client = boto3.client('dynamodb', config=config)

Step 2: Implement Application-Level Throttling (Token Bucket)
For high-throughput applications, proactively throttle your own request rate to stay below AWS limits, using a token bucket or leaky bucket algorithm.
import asyncio
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = capacity
        # Monotonic clock: unaffected by wall-clock adjustments.
        self.last_update = time.monotonic()

    async def consume(self, tokens=1):
        while True:
            # Refill based on elapsed time, capped at capacity.
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last_update) * self.rate)
            self.last_update = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return
            # Sleep until enough tokens should have accumulated.
            await asyncio.sleep((tokens - self.tokens) / self.rate)

Step 3: Check and Request Service Quota Increases
Identify the specific service and API operation being throttled via CloudWatch Metrics or the error message, then request a quota increase in the AWS Service Quotas console.
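If you prefer to script this check, the sketch below compares the currently applied quota with a target value and only files a request when the target is higher. It assumes a boto3 `service-quotas` client is passed in; the helper name `request_increase_if_needed` is hypothetical, and error handling is omitted (for example, the case where no applied quota value is available for the account).

```python
def request_increase_if_needed(client, service_code, quota_code, desired_value):
    """Request a quota increase only if the applied quota is below desired_value.

    client is expected to behave like boto3.client('service-quotas').
    Returns the RequestedQuota dict, or None if no request was needed.
    """
    quota = client.get_service_quota(
        ServiceCode=service_code,
        QuotaCode=quota_code,
    )["Quota"]
    if quota["Value"] >= desired_value:
        return None  # Already at or above the target; nothing to request.
    response = client.request_service_quota_increase(
        ServiceCode=service_code,
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )
    return response["RequestedQuota"]


# Example call (requires AWS credentials and the boto3 package):
#   import boto3
#   sq = boto3.client("service-quotas")
#   request_increase_if_needed(sq, "lambda", "L-B99A9384", 1500)
```

Checking the applied value first avoids filing requests that would be rejected or ignored because the quota is already high enough.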
# 1. Identify the throttled API from CloudTrail or error message.
# 2. Navigate to AWS Service Quotas Console.
# 3. Search for the service (e.g., 'Amazon DynamoDB').
# 4. Find the specific quota (e.g., 'Read capacity units').
# 5. Request an increase with a justified business case.
# CLI: Request a quota increase (example for Lambda Concurrent Executions)
aws service-quotas request-service-quota-increase --service-code lambda --quota-code L-B99A9384 --desired-value 1500

Step 4: Distribute Load Across Partitions (Shard Keys/Accounts)
For services like DynamoDB or Kinesis, distribute requests across multiple partition keys, streams, or even AWS accounts to avoid hitting per-partition or per-account limits.
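One design point worth keeping in mind is that sharded keys also have to be read back. A possible approach, sketched here under the assumption of a small fixed shard count, is to derive the suffix from a stable attribute so that writes spread out while readers can still enumerate every shard. `SHARD_COUNT`, `write_key`, and `read_keys` are hypothetical names for this illustration.

```python
import hashlib

SHARD_COUNT = 10  # Hypothetical value; tune to the write volume per logical key.


def write_key(base_key, item_id, shard_count=SHARD_COUNT):
    """Pick a shard deterministically from item_id so writes spread across shards."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def read_keys(base_key, shard_count=SHARD_COUNT):
    """List every sharded key that must be queried to read all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]
```

A reader would issue one query per entry in `read_keys("USER#12345")` and merge the results, so the shard count is a trade-off between write spread and read fan-out.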
# Example: Using a random partition key suffix for DynamoDB to avoid hot partitions
import random
import string
base_partition_key = "USER#12345"
suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=4))
distributed_partition_key = f"{base_partition_key}#{suffix}"

Architect's Pro Tip
"Enable 'Adaptive' retry mode in the SDK. It uses client-side metrics to dynamically adjust retry speed, which is more effective than standard exponential backoff during sustained throttling."
Frequently Asked Questions
What's the difference between a ThrottlingException and a ProvisionedThroughputExceededException?
ThrottlingException is a general API rate-limit error used across AWS services. ProvisionedThroughputExceededException is raised by DynamoDB (and by Kinesis Data Streams) when requests exceed the provisioned read/write capacity.
Will increasing service quotas stop all throttling?
No. Some limits are hard (e.g., S3 5,500 GET/sec per prefix). For these, you must design your application to distribute requests (e.g., using randomized prefixes).
How do I find which service is throttling me?
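Check the error code and operation name reported with the error (boto3, for example, formats the message as "An error occurred (ThrottlingException) when calling the <Operation> operation"), or look for throttling metrics such as `ThrottledRequests` (DynamoDB) or `Throttles` (Lambda) in the suspected service's CloudWatch namespace.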
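When using boto3, a `botocore.exceptions.ClientError` carries a `response` dictionary and an `operation_name` attribute, which is usually enough to log what was throttled. The helper below is a minimal sketch that works on that dictionary; the function name `describe_throttle` is hypothetical, and the set of error codes is a non-exhaustive assumption that you may need to extend for the services you call.

```python
# Non-exhaustive set of error codes commonly associated with throttling.
THROTTLE_CODES = {
    "Throttling",
    "ThrottlingException",
    "ThrottledException",
    "TooManyRequestsException",
    "ProvisionedThroughputExceededException",
    "RequestLimitExceeded",
    "SlowDown",
}


def describe_throttle(error_response, operation_name=None):
    """Return a summary dict if the error looks like throttling, else None."""
    error = error_response.get("Error", {})
    meta = error_response.get("ResponseMetadata", {})
    code = error.get("Code")
    status = meta.get("HTTPStatusCode")
    if code not in THROTTLE_CODES and status != 429:
        return None
    return {
        "code": code,
        "operation": operation_name,
        "http_status": status,
        "request_id": meta.get("RequestId"),
    }


# Typical use inside an exception handler:
#   except botocore.exceptions.ClientError as exc:
#       info = describe_throttle(exc.response, exc.operation_name)
```

Logging the returned dictionary alongside the client's service name gives you the service, operation, and request ID to correlate with CloudTrail or CloudWatch.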