ADR-012: Token Bucket Rate Limiting
Status
Implemented
Date
2025-01-16 (Retrospective)
Decision Makers
- Security Team - Abuse prevention
- SRE Team - System protection
Layer
Caching
Related ADRs
- ADR-011: Redis Single Instance Strategy
- ADR-005: OAuth2/OIDC Authentication
Supersedes
None
Depends On
- ADR-011: Redis Single Instance Strategy
Context
The SRE Operations Platform needs protection against:
- API Abuse: Excessive requests from single clients
- DDoS Mitigation: Distributed attack resistance
- Fair Usage: Equitable access for all users
- Burst Tolerance: Allow temporary request spikes
- Configurable Limits: Different limits per endpoint type
Key constraints:
- Must be configurable per-endpoint
- Need burst capacity for legitimate spikes
- Require graceful handling when Redis unavailable
- Must be efficient (sub-millisecond overhead)
- Need visibility into rate limit status
Decision
We adopt the Token Bucket algorithm for rate limiting:
Key Design Decisions
- Token Bucket: Allows bursts while maintaining average rate
- Redis Backend: Distributed rate limiting across instances
- Per-Endpoint Limits: Different limits for different endpoint types
- UI Configuration: Admin-adjustable limits in App Settings
- Graceful Bypass: Allow requests when Redis unavailable
Algorithm
Token Bucket:
- Bucket holds up to MAX_TOKENS tokens
- Tokens refill at RATE tokens per second
- Each request consumes 1 token
- Request rejected with HTTP 429 if no tokens available
- Burst = MAX_TOKENS, Steady rate = RATE
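The refill arithmetic above can be sketched as a minimal single-process bucket. This is illustrative only: the production limiter keeps bucket state in Redis (ADR-011) so all instances share one budget, but the lazy-refill math is the same.

```python
import time

class TokenBucket:
    """Single-process token bucket sketch; production state lives in Redis (ADR-011)."""

    def __init__(self, max_tokens: float, rate: float):
        self.max_tokens = max_tokens   # burst capacity (MAX_TOKENS)
        self.rate = rate               # refill rate in tokens per second (RATE)
        self.tokens = max_tokens       # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Lazily refill based on elapsed time, then try to consume `cost` tokens."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.max_tokens, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# General API defaults from the table below: 60 req/min steady, burst of 20
bucket = TokenBucket(max_tokens=20, rate=60 / 60)
```

Refilling lazily on each check avoids a background timer: the bucket only needs to store a token count and a last-refill timestamp, which maps cleanly onto a small Redis hash per client key.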
Default Limits
| Endpoint Type | Rate (req/min) | Burst (tokens) |
|---|---|---|
| General API | 60 | 20 |
| Auth endpoints | 30 | 10 |
| Bulk operations | 10 | 5 |
| AI operations | 20 | 5 |
| Health checks | Unlimited | N/A |
Response Headers
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704899100
Retry-After: 30  # Sent only on 429 (rate-limited) responses
Configuration
# Environment variables
RATE_LIMIT_ENABLED=true
RATE_LIMIT_DEFAULT=60
RATE_LIMIT_AUTH=30
# UI Configuration (Admin → App Settings)
rate_limits:
  general: 60
  auth: 30
  bulk: 10
Consequences
Positive
- Burst Tolerance: Legitimate traffic spikes handled
- DDoS Protection: Limits damage from attacks
- Fair Access: All users get equal opportunity
- Configurable: Adjust limits without deployment
- Visibility: Headers show remaining budget
Negative
- Redis Dependency: Rate limiting less effective without Redis
- Complexity: Token bucket more complex than simple counter
- Clock Skew: Distributed timing challenges
- False Positives: Legitimate users may hit limits
Neutral
- Header Overhead: Small additional response size
- State Management: Redis stores token state
Alternatives Considered
1. Fixed Window Counter
- Approach: Simple counter reset per time window
- Rejected: Burst at window boundaries, less smooth
2. Sliding Window Log
- Approach: Track all request timestamps
- Rejected: Memory intensive, slower
3. Leaky Bucket
- Approach: Constant output rate
- Rejected: No burst tolerance
Implementation Status
- Core implementation complete
- Tests written and passing
- Documentation updated
- Migration/upgrade path defined
- Monitoring/observability in place
Implementation Details
- Rate Limiter: backend/core/rate_limiting/
- Middleware: backend/core/middleware/rate_limit.py
- Configuration: backend/core/config.py
- UI Settings: Admin → App Settings → Rate Limiting
- Docs: backend/docs/configuration/rate-limiting.md
Compliance/Validation
- Automated checks: Rate limit headers in responses
- Manual review: Limits adjusted based on usage patterns
- Metrics: Rate limit hits, bypass counts via Prometheus
LLM Council Review
Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: CONDITIONAL APPROVAL
Quality Metrics
- Consensus Strength Score (CSS): 0.90
- Deliberation Depth Index (DDI): 0.85
Council Feedback Summary
Token Bucket algorithm is approved as correct for SRE bursty traffic, but default limits and graceful bypass strategy require significant revision.
Key Concerns Identified:
- Default Limits Too Low: 60/min for General API will block SRE dashboards during incidents (6 widgets × 10s refresh = 36 calls/min per tab)
- Auth Limits Dangerous: 30/min still permits roughly 43,200 password-guess attempts per day, enabling slow brute-force attacks
- Graceful Bypass is Critical Risk: Fail-open removes protection exactly when system is stressed (cascading failure)
Required Modifications:
- Switch to Fail-Soft Strategy: Replace bypass with local in-memory fallback when Redis down
- Implement Incident Override: Add "Break Glass" feature to temporarily suspend limits for SRE teams during P0 incidents
- Tiered Limits: Differentiate Human Users (high burst, low sustained) from Automation (low burst, high sustained)
- Increase General API Limits: Consider 120-300/min for SRE platform
- Cost-Based Limiting for AI/Bulk: Use concurrency limits or token counts, not just request counts
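The fail-soft recommendation can be sketched as a thin wrapper: try the shared Redis check first, and on any Redis error fall back to a per-process local bucket instead of waving requests through. `redis_allow` here is a placeholder for the real Redis-backed check (e.g. a Lua script executed via redis-py); the class and attribute names are assumptions for illustration.

```python
class FailSoftLimiter:
    """Fail-soft sketch per council feedback: degrade to local limits, never fail open."""

    def __init__(self, redis_allow, local_bucket):
        self.redis_allow = redis_allow    # callable(key) -> bool; may raise on Redis outage
        self.local_bucket = local_bucket  # per-process fallback with an allow() method
        self.fallback_count = 0           # would be exported via Prometheus in practice

    def allow(self, key: str) -> bool:
        try:
            return self.redis_allow(key)
        except Exception:
            # Redis is unreachable: apply the local bucket so some protection
            # remains exactly when the system is stressed (no unprotected bypass).
            self.fallback_count += 1
            return self.local_bucket.allow()
```

The local fallback is per-instance, so the effective fleet-wide limit during an outage is roughly the local limit times the instance count; the fallback counter makes that degraded mode visible in monitoring.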
Modifications Applied
- Documented fail-soft local fallback strategy
- Added incident override mechanism recommendation
- Documented tiered limit approach for humans vs automation
- Added rate limit header standardization (X-RateLimit-*, Retry-After)
Council Ranking
- All models reached consensus on core concerns
- gpt-5.2: Best Response (limit tuning focus)
- gemini-3-pro: Strong (fail-soft emphasis)
References
- Token Bucket Algorithm
- Rate Limiting Best Practices
- RFC 6585: HTTP 429 Too Many Requests
ADR-012 | Caching Layer | Implemented