MCP Bidirectional Traffic: Fixing SSE Buffering and Rate Limits
Published: 2025-01-16
When the LLM Council reviewed our MCP client proxy (ADR-040), they identified a critical gap: our nginx configuration was buffering SSE responses, causing tool execution to hang. Additionally, our standard API rate limits (60 req/min) were breaking MCP negotiation, which is inherently chatty.
This post details how Issue #460 fixed both the buffering and rate-limiting issues.
The Problem
Problem 1: nginx Buffering Blocks SSE
Default nginx proxy configuration buffers responses:
```nginx
# Default behavior (problematic for SSE)
location /api/ {
    proxy_pass http://backend;
    # proxy_buffering is ON by default!
}
```
When SSE events are buffered, they arrive in bursts instead of in real time. For MCP, this means:
- Tool execution appears to hang for seconds
- Timeouts during long-running operations
- Poor user experience in Claude Desktop
Problem 2: Rate Limits Break MCP Negotiation
MCP protocol is chatty during initialization:
- Capabilities exchange
- Tool listing
- Prompt listing
- Resource queries
Our standard 60 req/min limit triggered during normal MCP negotiation, causing connection failures.
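As a rough illustration of how negotiation alone eats that budget, here is a back-of-the-envelope sketch. The method names follow the MCP JSON-RPC specification; the client count is illustrative, not a measurement:

```python
# JSON-RPC methods a single MCP client typically issues before its
# first tool call (method names per the MCP specification)
negotiation_calls = [
    "initialize",
    "notifications/initialized",
    "tools/list",
    "prompts/list",
    "resources/list",
]

# Illustrative: several clients for one user reconnecting at once
# (e.g., a Claude Desktop restart) multiplies the burst
clients = 15
total = clients * len(negotiation_calls)
print(total)  # 75 requests within seconds, blowing a 60 req/min budget
```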
The Solution
1. nginx SSE Location Block
Added dedicated location block for MCP SSE endpoints:
```nginx
# MCP SSE endpoint - MUST be before generic /api/
location ~ ^/api/v1/mcp/(sse|message) {
    proxy_pass ${API_PROXY_URL};
    proxy_http_version 1.1;

    # Disable buffering for SSE (critical for real-time events)
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header X-Accel-Buffering "no";

    # Extended timeouts for long-running SSE (4 hours)
    proxy_read_timeout 14400s;
    proxy_send_timeout 14400s;

    # Connection headers for SSE
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}
```
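To see the difference buffering makes, here is a self-contained sketch: a local stand-in SSE endpoint (not our production backend) emits events 100 ms apart, and a client timestamps their arrival. With no buffering in the path, the gaps between arrivals match the emission interval; a buffering proxy would collapse them into one burst at the end.

```python
import http.server
import threading
import time
import urllib.request

class SSEHandler(http.server.BaseHTTPRequestHandler):
    """Local stand-in SSE server: emits one event every 100 ms."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        for i in range(3):
            self.wfile.write(f"data: event-{i}\n\n".encode())
            self.wfile.flush()  # emit immediately, no server-side buffering
            time.sleep(0.1)

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SSEHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Record when each `data:` line actually arrives at the client
arrivals = []
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/sse") as resp:
    for line in resp:
        if line.startswith(b"data:"):
            arrivals.append(time.monotonic())
server.shutdown()

gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
print(all(g > 0.05 for g in gaps))  # True: events trickled in, not one burst
```

A buffered proxy in front of this server would make every gap near zero, with all events landing together when the buffer flushes: exactly the "tool execution appears to hang" symptom.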
Key configurations:
- `proxy_buffering off`: Events stream immediately
- `X-Accel-Buffering: no`: Header for upstream servers
- `14400s` timeouts: 4-hour sessions for long operations
- `Connection ''`: Prevents connection-header interference
2. Split Rate Limiting
Created separate rate limiters for SSE and messages:
```python
# SSE: Connection-based limit (5 concurrent per user)
class MCPSSERateLimiter:
    def __init__(self, max_connections: int = 5):
        self.max_connections = max_connections

    async def acquire(self, user_id: str, connection_id: str) -> bool:
        """Acquire a connection slot."""
        # Uses Redis SET for atomic connection counting
        ...

# Messages: Token bucket (200 req/min, 50 burst)
class MCPMessageRateLimiter:
    def __init__(self, rate_limit: int = 200, burst_limit: int = 50):
        self.rate_limit = rate_limit
        self.burst_limit = burst_limit

    async def check(self, user_id: str) -> bool:
        """Check if message is allowed."""
        # Uses Redis token bucket algorithm
        ...
```
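The message limiter's refill math can be sketched in memory like this. The production version keeps `{tokens, last_refill}` in a Redis HASH; this sketch mirrors the same token-bucket logic without the Redis dependency, and the class name is illustrative:

```python
import time

class TokenBucket:
    """In-memory sketch of the Redis token-bucket logic behind
    MCPMessageRateLimiter (illustrative, not the production class)."""

    def __init__(self, rate_limit: int = 200, burst_limit: int = 50):
        self.rate_per_sec = rate_limit / 60.0  # refill rate: 200/min
        self.capacity = burst_limit            # max burst size
        self.tokens = float(burst_limit)       # start full
        self.last_refill = time.monotonic()

    def check(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.rate_per_sec,
        )
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket()
allowed = sum(bucket.check() for _ in range(60))
print(allowed)  # 50: the burst passes, the rest are throttled
```

A rapid run of 60 checks admits exactly the 50-request burst; the remaining requests wait for the 200/min refill.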
Why split limits?
| Endpoint | Limit Type | Value | Reason |
|---|---|---|---|
| SSE `/sse` | Concurrent | 5 per user | Long-lived connections, prevent resource exhaustion |
| POST `/message/{id}` | Token bucket | 200/min, 50 burst | Handle chatty negotiation, allow burst |
3. Extended Session TTL
MCP sessions now have 4-hour TTL to match nginx timeouts:
```python
# backend/api/v1/mcp_sse.py
MCP_SESSION_TTL_SECONDS = 14400   # 4 hours
SSE_READ_TIMEOUT_SECONDS = 14400  # Matches nginx config
```
Implementation Details
Rate Limiter Storage
Uses Redis DB 4 (separate from API rate limiting DB 3):
```python
MCP_RATE_LIMIT_DB = 4

# SSE: Uses Redis SET for connection tracking
key = f"mcp_sse:{user_id}:connections"
# SET contains active connection_ids

# Messages: Uses Redis HASH for token bucket
key = f"mcp_msg:{user_id}"
# HASH contains {tokens: N, last_refill: timestamp}
```
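The SET-based connection accounting can be sketched the same way, with SADD/SCARD/SREM semantics mirrored by an in-memory set (class name, user, and connection IDs here are illustrative):

```python
class SSEConnectionLimiter:
    """In-memory sketch of the Redis SET logic behind MCPSSERateLimiter."""

    def __init__(self, max_connections: int = 5):
        self.max_connections = max_connections
        self._connections: dict[str, set[str]] = {}

    def acquire(self, user_id: str, connection_id: str) -> bool:
        conns = self._connections.setdefault(user_id, set())
        if len(conns) >= self.max_connections:
            return False          # all 5 slots in use
        conns.add(connection_id)  # Redis: SADD
        return True

    def release(self, user_id: str, connection_id: str) -> None:
        self._connections.get(user_id, set()).discard(connection_id)  # Redis: SREM

limiter = SSEConnectionLimiter()
results = [limiter.acquire("user@example.com", f"conn-{i}") for i in range(6)]
print(results)  # [True, True, True, True, True, False]

limiter.release("user@example.com", "conn-0")  # slot freed on disconnect
reacquired = limiter.acquire("user@example.com", "conn-6")
print(reacquired)  # True: the freed slot is reusable
```

The production version does the count and insert atomically in Redis; the sketch shows why releasing on disconnect is essential, which the next section covers.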
Rate Limiter Release
Crucial: Release SSE slot when connection closes:
```python
async def remove_connection(self, connection_id: str):
    # ... disconnect logic ...

    # Release the SSE rate limit slot
    user_key = connection.user_info.email
    await sse_limiter.release(user_key, connection_id)
```
Without this, users would exhaust their connection limit and be unable to reconnect.
Impact
| Metric | Before | After |
|---|---|---|
| SSE event latency | Buffered (seconds) | Real-time (<100ms) |
| MCP negotiation | Often rate limited | Reliable |
| Session duration | 30 minutes | 4 hours |
| Concurrent connections | No limit | 5 per user |
| ADR-040 verdict | CONDITIONAL | APPROVED |
Lessons Learned
- SSE needs special handling: Standard proxy configs don't work for SSE
- Different endpoints, different limits: API rate limits don't fit all protocols
- Match timeouts end-to-end: nginx, backend, and client must agree
- Resource cleanup matters: Release rate limit slots on disconnect
Issue #460 | ADR-040 | LLM Council Blocking Issue Resolved