MCP Bidirectional Traffic: Fixing SSE Buffering and Rate Limits

· 4 min read

Published: 2025-01-16


When the LLM Council reviewed our MCP client proxy (ADR-040), they identified a critical gap: our nginx configuration was buffering SSE responses, causing tool execution to hang. Additionally, our standard API rate limits (60 req/min) were breaking MCP negotiation, which is inherently chatty.

This post details how Issue #460 fixed these SSE streaming issues.

The Problem

Problem 1: nginx Buffering Blocks SSE

Default nginx proxy configuration buffers responses:

# Default behavior (problematic for SSE)
location /api/ {
    proxy_pass http://backend;
    # proxy_buffering is ON by default!
}

When SSE events are buffered, they arrive in bursts instead of real-time. For MCP:

  • Tool execution appears to hang for seconds
  • Timeouts during long-running operations
  • Poor user experience in Claude Desktop

Problem 2: Rate Limits Break MCP Negotiation

MCP protocol is chatty during initialization:

  • Capabilities exchange
  • Tool listing
  • Prompt listing
  • Resource queries

Our standard 60 req/min limit triggered during normal MCP negotiation, causing connection failures.

The Solution

1. nginx SSE Location Block

Added dedicated location block for MCP SSE endpoints:

# MCP SSE endpoint - MUST be before generic /api/
location ~ ^/api/v1/mcp/(sse|message) {
    proxy_pass ${API_PROXY_URL};
    proxy_http_version 1.1;

    # Disable buffering for SSE (critical for real-time events)
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header X-Accel-Buffering "no";

    # Extended timeouts for long-running SSE (4 hours)
    proxy_read_timeout 14400s;
    proxy_send_timeout 14400s;

    # Connection headers for SSE
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}

Key configurations:

  • proxy_buffering off: events are flushed to the client as soon as they arrive, instead of accumulating in nginx's buffer
  • X-Accel-Buffering: no: belt-and-braces hint forwarded for any intermediate proxies that honor it
  • 14400s timeouts: supports 4-hour sessions for long-running operations
  • Connection '': clears the Connection header so keep-alive negotiation can't interfere with the stream

2. Split Rate Limiting

Created separate rate limiters for SSE and messages:

# SSE: Connection-based limit (5 concurrent per user)
class MCPSSERateLimiter:
    def __init__(self, max_connections: int = 5):
        self.max_connections = max_connections

    async def acquire(self, user_id: str, connection_id: str) -> bool:
        """Acquire a connection slot."""
        # Uses Redis SET for atomic connection counting


# Messages: Token bucket (200 req/min, 50 burst)
class MCPMessageRateLimiter:
    def __init__(self, rate_limit: int = 200, burst_limit: int = 50):
        self.rate_limit = rate_limit
        self.burst_limit = burst_limit

    async def check(self, user_id: str) -> bool:
        """Check if message is allowed."""
        # Uses Redis token bucket algorithm

Why split limits?

| Endpoint | Limit type | Value | Reason |
|---|---|---|---|
| SSE `/sse` | Concurrent connections | 5 per user | Long-lived connections; prevent resource exhaustion |
| POST `/message/{id}` | Token bucket | 200/min, 50 burst | Handle chatty negotiation; allow bursts |

3. Extended Session TTL

MCP sessions now have 4-hour TTL to match nginx timeouts:

# backend/api/v1/mcp_sse.py
MCP_SESSION_TTL_SECONDS = 14400 # 4 hours
SSE_READ_TIMEOUT_SECONDS = 14400 # Matches nginx config

Implementation Details

Rate Limiter Storage

Uses Redis DB 4 (separate from API rate limiting DB 3):

MCP_RATE_LIMIT_DB = 4

# SSE: Uses Redis SET for connection tracking
key = f"mcp_sse:{user_id}:connections"
# SET contains active connection_ids

# Messages: Uses Redis HASH for token bucket
key = f"mcp_msg:{user_id}"
# HASH contains {tokens: N, last_refill: timestamp}
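The refill step over those HASH fields is a pure function, which is worth spelling out since it is the part that must be computed atomically (e.g. in a Lua script) in Redis. This is a sketch of the math only, with illustrative names:

```python
def refill(tokens: float, last_refill: float, now: float,
           rate_per_min: float = 200.0,
           burst: float = 50.0) -> tuple[float, float]:
    """Compute the updated {tokens, last_refill} HASH fields.

    Tokens accrue in proportion to elapsed time, capped at the burst
    capacity; the caller then spends one token if tokens >= 1.
    """
    elapsed = max(0.0, now - last_refill)
    new_tokens = min(burst, tokens + elapsed * rate_per_min / 60.0)
    return new_tokens, now
```

At 200 tokens/min, an empty bucket refills to its 50-token cap in 15 seconds, so even a client that just exhausted its burst recovers quickly.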

Rate Limiter Release

Crucial: Release SSE slot when connection closes:

async def remove_connection(self, connection_id: str):
    # ... disconnect logic ...

    # Release the SSE rate limit slot
    user_key = connection.user_info.email
    await sse_limiter.release(user_key, connection_id)

Without this, users would exhaust their connection limit and be unable to reconnect.
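The acquire/release pairing can be illustrated with an in-memory stand-in for the Redis SET-based limiter (class and method names here are illustrative, not the production API):

```python
class SSEConnectionLimiter:
    """Per-user concurrent-connection limit backed by a set of
    connection_ids, mirroring the Redis SET used in production."""

    def __init__(self, max_connections: int = 5):
        self.max_connections = max_connections
        self._conns: dict[str, set[str]] = {}  # user_id -> connection_ids

    def acquire(self, user_id: str, connection_id: str) -> bool:
        conns = self._conns.setdefault(user_id, set())
        if len(conns) >= self.max_connections:
            return False  # slot exhausted; reject the new SSE connection
        conns.add(connection_id)
        return True

    def release(self, user_id: str, connection_id: str) -> None:
        # Must be called on disconnect, or the user's slots leak
        self._conns.get(user_id, set()).discard(connection_id)
```

Skipping `release` is exactly the leak described above: after five connect/disconnect cycles the user's set is full of stale ids and every new `acquire` fails.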

Impact

| Metric | Before | After |
|---|---|---|
| SSE event latency | Buffered (seconds) | Real-time (<100ms) |
| MCP negotiation | Often rate limited | Reliable |
| Session duration | 30 minutes | 4 hours |
| Concurrent connections | No limit | 5 per user |
| ADR-040 verdict | CONDITIONAL | APPROVED |

Lessons Learned

  1. SSE needs special handling: Standard proxy configs don't work for SSE
  2. Different endpoints, different limits: API rate limits don't fit all protocols
  3. Match timeouts end-to-end: nginx, backend, and client must agree
  4. Resource cleanup matters: Release rate limit slots on disconnect

Issue #460 | ADR-040 | LLM Council Blocking Issue Resolved