Without rate limiting:
With rate limiting:
Rate limiting is used everywhere in production systems. Here are examples from popular services:
Rate Limits:
Implementation: GitHub uses a token bucket algorithm with different limits for authenticated vs unauthenticated users. When you exceed the limit, you receive a 403 Forbidden response with headers indicating when the limit resets:
    HTTP/1.1 403 Forbidden
    X-RateLimit-Limit: 5000
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1609459200
    X-RateLimit-Used: 5000

Why these limits? GitHub needs to protect its infrastructure while allowing legitimate developers to build applications. The higher limit for authenticated users encourages API key usage, which helps GitHub identify and manage traffic better.
Rate Limits (v2 API):
Implementation: Twitter uses a sliding window algorithm. Different endpoints have different limits based on resource cost. More expensive operations (like search) have lower limits.
Real scenario: A social media analytics tool needs to fetch tweets for 1,000 users. At one request per user and a limit of 300 requests per 15 minutes, the job spans four rate-limit windows and takes about 50 minutes at a steady pace (1,000 ÷ 300 × 15), so it requires careful request scheduling and rate limit tracking.
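The arithmetic behind this scenario can be checked with a short sketch. It assumes one request per user, and the helper names are illustrative rather than part of any Twitter client library.

```python
import math


def windows_needed(total_requests: int, limit_per_window: int) -> int:
    """Number of rate-limit windows the job has to span."""
    return math.ceil(total_requests / limit_per_window)


def steady_pace_minutes(total_requests: int, limit_per_window: int,
                        window_minutes: float) -> float:
    """Duration when requests are spread evenly at the allowed average rate."""
    return total_requests * window_minutes / limit_per_window


# 1,000 users, 300 requests per 15-minute window
print(windows_needed(1000, 300))           # 4
print(steady_pace_minutes(1000, 300, 15))  # 50.0
```

A scheduler built on these numbers would pace requests at roughly one every three seconds instead of spending the whole quota at the start of each window.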
Rate Limits:
Implementation: AWS uses a token bucket with burst capacity. The burst allows short spikes above the steady-state rate, which is perfect for handling traffic patterns with occasional peaks.
Use case: An e-commerce site during Black Friday. Normal traffic is 1,000 requests/second, but during flash sales, traffic spikes to 5,000 requests/second for 30 seconds. The burst capacity handles these spikes without rejecting requests.
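To make the burst requirement concrete, the sketch below estimates how many tokens the bucket must hold to absorb such a spike. It assumes the refill rate equals the normal traffic rate and that the bucket is full when the spike starts; the function name is illustrative and not an AWS API.

```python
def burst_tokens_needed(steady_rate: float, peak_rate: float,
                        spike_seconds: float) -> float:
    """Tokens a full bucket must hold so that no request is rejected.

    Assumption: tokens refill at steady_rate per second, so during the
    spike the bucket drains at (peak_rate - steady_rate) per second.
    """
    drain_per_second = max(0.0, float(peak_rate) - float(steady_rate))
    return drain_per_second * spike_seconds


# 1,000 req/s normally, 5,000 req/s for 30 seconds
print(burst_tokens_needed(1_000, 5_000, 30))  # 120000.0
```

Under these assumptions a burst capacity below 120,000 tokens would start rejecting requests before the 30-second spike ends.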
Rate Limits:
Implementation: Stripe uses a combination of token bucket and sliding window. They also implement idempotency keys to prevent duplicate charges, which have separate rate limits.
Critical use case: Payment processing. Stripe must prevent both abuse and accidental duplicate charges. Rate limiting protects their infrastructure while idempotency keys protect customers from double-charging.
Rate Limits:
Implementation: Google uses both per-second rate limits and daily quotas. This dual approach prevents both short-term abuse and long-term overuse.
Example: A delivery app needs to geocode addresses. With 40 requests/second, it can process 2,400 addresses per minute. For a delivery service handling 10,000 orders/day, this requires careful batching and caching of geocoded addresses.
Rate Limits:
Implementation: Reddit uses a fixed window algorithm with per-user tracking. This prevents individual users from overwhelming the API while allowing fair distribution across all users.
Real-world impact: A Reddit bot that posts comments needs to respect the 60 requests/minute limit. Posting too quickly results in temporary bans, requiring exponential backoff and retry logic.
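The backoff and retry logic mentioned above might look like the following sketch. `RateLimited` and `call_with_backoff` are hypothetical names, not part of Reddit's API or any client library; the caller's own HTTP code is assumed to raise `RateLimited` when it sees a 429.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error the caller raises when the API answers 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(send: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> T:
    """Retry send() with exponentially growing waits: 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # never retry earlier than asked
            sleep(delay + jitter())  # up to 1s of jitter spreads out retries
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` and `jitter` parameters are injectable only so the behaviour can be exercised without real waiting.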
Rate Limiting Rules:
Implementation: Cloudflare uses distributed rate limiting across their global network. Rules are evaluated at edge locations, providing protection before traffic reaches origin servers.
DDoS protection: During a DDoS attack, Cloudflare’s rate limiting automatically blocks excessive requests from individual IPs while allowing legitimate traffic through. This protects origin servers from being overwhelmed.
Rate Limits (before shutdown):
Why it mattered: Netflix’s API was used by third-party applications to access movie metadata. Rate limiting prevented abuse while allowing legitimate developers to build applications. The API was eventually shut down in favor of direct partnerships, but rate limiting was crucial during its operation.
Token bucket: the bucket holds tokens, which refill at a fixed rate. Each request consumes one token, and a request that finds the bucket empty is rejected.
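As a minimal sketch of the idea, the class below keeps one bucket per client key. The constructor arguments and the `is_allowed(key)` method mirror how `TokenBucket` is called in this article's later snippets; the internals, the `cost` parameter, and the injectable `clock` are assumptions made for illustration. It is not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """In-memory token bucket with a separate bucket per key."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = float(capacity)        # maximum tokens (burst size)
        self.refill_rate = float(refill_rate)  # tokens added per second
        self._clock = clock
        # key -> (tokens left, time of last refill)
        self._state: Dict[str, Tuple[float, float]] = {}

    def is_allowed(self, key: str, cost: float = 1.0) -> bool:
        now = self._clock()
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill lazily for the time elapsed since the last call, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._state[key] = (tokens, now)
        return allowed
```

Refilling lazily on each call avoids a background timer: the token count is only recomputed when a request for that key arrives.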
Characteristics:
Algorithm:
- capacity tokens
- refill_rate per second

Leaky bucket: the bucket holds queued requests, which leak out at a fixed rate. If the bucket is full, new requests are rejected.
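A leaky bucket in its queue form can be sketched as below. The class and method names (`offer`, `leak`) are illustrative assumptions; a worker is assumed to call `leak()` periodically and process whatever it returns. For brevity it uses a single shared queue and is not thread-safe.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


class LeakyBucket:
    """Queue-based leaky bucket: requests wait and leave at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests released per second
        self._clock = clock
        self._queue: Deque[Any] = deque()
        self._last_leak = clock()

    def offer(self, request: Any) -> bool:
        """Queue a request, or return False when the bucket is full."""
        if len(self._queue) >= self.capacity:
            return False
        if not self._queue:
            # Restart the leak timer so idle time cannot be saved up as a burst.
            self._last_leak = self._clock()
        self._queue.append(request)
        return True

    def leak(self) -> List[Any]:
        """Return the requests that are due for processing by now."""
        now = self._clock()
        due = int((now - self._last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance by whole leaks only, so fractional time carries over.
        self._last_leak += due / self.leak_rate
        return [self._queue.popleft() for _ in range(min(due, len(self._queue)))]
```

Because output is paced by `leak_rate` regardless of how requests arrive, downstream services see a smooth stream, at the cost of added latency for queued requests.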
Characteristics:
Algorithm:
- capacity (queue size)
- leak_rate per second

Sliding window: tracks requests in a sliding time window. More accurate than a fixed window.
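One common way to implement this is a sliding window log, which stores a timestamp per accepted request. The sketch below is an in-memory illustration with assumed names, not a specific library's API; memory grows with the limit because every accepted request is remembered until it leaves the window.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLog:
    """Allow at most `limit` requests per key in any window_seconds span."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._log: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def is_allowed(self, key: str) -> bool:
        now = self._clock()
        log = self._log[key]
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```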
Characteristics:
Fixed window: divides time into fixed windows and counts requests in each. Simple, but allows bursts at window boundaries.
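A fixed window counter needs only one counter per key, as the following sketch shows. The class name and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindow:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # key -> (window index, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def is_allowed(self, key: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counts.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self._counts[key] = (window, count + 1)
        return True
```

The boundary burst is easy to reproduce with this sketch: with a limit of 2 per 60 seconds, two requests at second 59 and two more at second 60 are all accepted, so four requests pass within about one second.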
Characteristics:
For multiple servers, use Redis:
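A minimal sketch of a shared counter is shown below. It assumes `client` is a redis-py `redis.Redis` instance (only its `incr` and `expire` commands are used), and it swaps in a simple fixed window counter rather than a token bucket to keep the example short. The key format and function name are illustrative.

```python
import time
from typing import Optional


def is_allowed(client, key: str, limit: int, window_seconds: int,
               now: Optional[float] = None) -> bool:
    """Fixed window counter stored in Redis and shared by all app servers.

    Assumption: `client` behaves like redis.Redis, e.g.
    client = redis.Redis(host="localhost", port=6379)
    """
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    redis_key = f"ratelimit:{key}:{window}"

    count = client.incr(redis_key)  # INCR is atomic across all servers
    if count == 1:
        # First hit in this window: let Redis delete the counter later.
        # INCR followed by EXPIRE is two round trips, so a crash between
        # them can leave a key without a TTL; running both steps in one
        # Lua script closes that gap.
        client.expire(redis_key, window_seconds * 2)
    return count <= limit
```

Every server calls the same function against the same Redis instance, so the limit applies to the client as a whole rather than per server.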
Rate limiting can be applied at different levels depending on your use case. Here are common strategies with real-world examples:
Use case: Public APIs where you don’t have user authentication, or as a first line of defense.
Example: A public weather API limits each IP to 100 requests/hour. This prevents a single user from scraping all weather data while allowing legitimate usage.
    # 100 requests/hour per IP: 100 tokens, refilled at 100/3600 tokens per second
    rate_limiter = TokenBucket(capacity=100, refill_rate=100 / 3600)
    client_ip = request.remote_addr

    if not rate_limiter.is_allowed(client_ip):
        return "Rate limit exceeded", 429

Use case: Authenticated APIs where you want to limit per-user usage, regardless of which device or IP they use.
Example: A social media API limits each authenticated user to 1,000 posts/day. This prevents spam while allowing legitimate users to post from multiple devices (phone, tablet, desktop).
Real-world scenario: A user tries to post 1,500 times in one day. After 1,000 posts, all subsequent requests return 429 until the next day. This protects the platform from spam while being fair to legitimate users.
Use case: Third-party integrations where each application gets its own API key with specific limits.
Example: A payment processing API provides each merchant with an API key. Free tier merchants get 1,000 transactions/month, while enterprise merchants get 100,000 transactions/month.
Real-world scenario: An e-commerce platform integrates with a payment API. They receive an API key with a 10,000 requests/day limit. During peak shopping season, they might hit this limit and need to upgrade their plan or implement request queuing.
Use case: SaaS products with multiple pricing tiers. Each tier gets different rate limits as part of the subscription.
Example: A cloud storage API offers three tiers:
Real-world scenario: A file backup application uses the API. Free users can sync files 100 times per hour, which is sufficient for personal use. Pro users (developers) get 1,000 calls/hour, enough for automated backups. Enterprise customers get 10,000 calls/hour for large-scale operations.
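The tier lookup from this scenario could be sketched as follows. The tier names and hourly numbers come from the scenario above; the dictionary, the function names, and the fallback to the free tier are assumptions for illustration.

```python
from typing import Dict, Tuple

# Calls per hour for each subscription tier, as in the scenario above
TIER_LIMITS_PER_HOUR: Dict[str, int] = {
    "free": 100,
    "pro": 1_000,
    "enterprise": 10_000,
}


def hourly_limit_for(tier: str) -> int:
    """Look up a tier's hourly limit; unknown tiers fall back to free."""
    return TIER_LIMITS_PER_HOUR.get(tier.lower(), TIER_LIMITS_PER_HOUR["free"])


def bucket_params(tier: str) -> Tuple[int, float]:
    """Translate a tier into (capacity, refill rate per second)."""
    limit = hourly_limit_for(tier)
    return limit, limit / 3600


print(hourly_limit_for("Pro"))  # 1000
```

Keeping the limits in one table means a pricing change touches a single place, and the limiter itself stays unaware of billing details.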
Business value: Tiered limits encourage upgrades. Users hitting free tier limits often upgrade to Pro, increasing revenue.
Use case: Different API endpoints have different costs. Expensive operations get lower limits.
Example: A machine learning API:
Real-world scenario: A video editing app uses the ML API. Users can classify images quickly (100/min), but video processing is limited to 10/min to prevent resource exhaustion. Model training requests are queued and processed one at a time.
    # Different limits per endpoint: TokenBucket(capacity, refill_rate per second)
    endpoint_limits = {
        '/api/classify-image': TokenBucket(100, 100 / 60),  # 100/min
        '/api/process-video': TokenBucket(10, 10 / 60),     # 10/min
        '/api/train-model': TokenBucket(1, 1 / 3600),       # 1/hour
    }

    endpoint = request.path
    limiter = endpoint_limits.get(endpoint)  # endpoints without an entry are not limited here
    if limiter is not None and not limiter.is_allowed(user_id):
        return "Rate limit exceeded", 429

Use case: Global APIs that want to distribute load or comply with regional regulations.
Example: A content delivery API:
Real-world scenario: A global news aggregator API serves different regions. Traffic from high-traffic regions (US/EU) gets higher limits, while emerging markets get lower limits initially, scaling up as infrastructure grows.
    # Regional limits: TokenBucket(capacity, refill_rate per second)
    regional_limits = {
        'us': TokenBucket(1000, 1000),
        'eu': TokenBucket(1000, 1000),
        'asia': TokenBucket(500, 500),
        'other': TokenBucket(100, 100),
    }

    region = get_region_from_ip(request.remote_addr)
    # Fall back to the 'other' bucket for regions without an explicit entry
    limiter = regional_limits.get(region, regional_limits['other'])
    if not limiter.is_allowed(request.remote_addr):
        return "Rate limit exceeded", 429

Use case: Adjust limits based on system load or user behavior.
Example: During normal load, users get 100 requests/minute. During high load, limits drop to 50 requests/minute to protect the system. Trusted users (good history) get 150 requests/minute.
Real-world scenario: A ride-sharing API dynamically adjusts limits. During rush hour (high load), all users get reduced limits. During off-peak hours, limits increase. Users with good payment history get higher limits.
    # One shared TokenBucket per limit value, so bucket state survives between requests
    limiters = {}

    def get_dynamic_limit(user_id):
        base_limit = 100  # requests/minute

        # Adjust based on system load
        current_load = get_system_load()
        if current_load > 0.8:  # High load
            base_limit = base_limit * 0.5  # Reduce to 50

        # Adjust based on user trust
        user_trust_score = get_user_trust_score(user_id)
        if user_trust_score > 0.9:  # Trusted user
            base_limit = base_limit * 1.5  # Increase to 150 (75 under high load)

        limit = int(base_limit)
        if limit not in limiters:
            limiters[limit] = TokenBucket(limit, limit / 60)
        return limiters[limit]

    limiter = get_dynamic_limit(user_id)
    if not limiter.is_allowed(user_id):
        return "Rate limit exceeded", 429

Inform clients about rate limits:
    HTTP/1.1 200 OK
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 95
    X-RateLimit-Reset: 1640995200

When limit exceeded:

    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1640995200
    Retry-After: 60

| Algorithm | Bursts | Accuracy | Memory | Complexity |
|---|---|---|---|---|
| Token Bucket | Yes | High | Low | Medium |
| Leaky Bucket | No | High | Medium | Medium |
| Sliding Window | No | Very High | High | High |
| Fixed Window | Yes | Medium | Low | Low |
Recommendation: Use Token Bucket for most cases. It’s simple, accurate, and allows bursts.
🪣 Token Bucket: Most Popular
Token bucket allows bursts, smooths to average rate. Most widely used algorithm.
📊 Sliding Window: Most Accurate
Sliding window is most accurate but uses more memory. Use when accuracy is critical.
🌐 Distributed: Use Redis
For multiple servers, use Redis with Lua scripts for atomic operations.
🔢 Return 429
When the rate limit is exceeded, return HTTP 429 with a Retry-After header, and inform clients about their limits.
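As a framework-neutral sketch, the helper below builds the status code and headers used in this article's example responses. The function name and parameters are assumptions; `reset_epoch` is the Unix time at which the limit resets.

```python
import math
from typing import Dict, Tuple


def rate_limit_response(limit: int, remaining: int, reset_epoch: int,
                        now: float, allowed: bool) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) describing the client's rate limit state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if allowed:
        return 200, headers
    # Tell the client how many seconds to wait before retrying.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers


status, headers = rate_limit_response(100, 0, 1640995200,
                                      now=1640995140, allowed=False)
print(status, headers["Retry-After"])  # 429 60
```

Sending the limit headers on successful responses as well lets well-behaved clients slow down before they ever receive a 429.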