
ADR-008: FastAPI with Pydantic

Status

Implemented

Date

2025-01-16 (Retrospective)

Decision Makers

  • Backend Team - Framework selection
  • Architecture Team - API design patterns

Layer

API

Related ADRs

  • ADR-004: RESTful API Design (API patterns)
  • ADR-001: PostgreSQL with pgvector (database layer)
  • ADR-009: OpenTelemetry Instrumentation (observability)

Supersedes

None

Depends On

None

Context

The SRE Operations Platform backend requires a Python web framework that meets the following needs:

  1. API Performance: Low latency for real-time dashboards
  2. Type Safety: Reduce bugs, improve developer experience
  3. Documentation: Auto-generated OpenAPI specs
  4. Async Support: Handle concurrent requests efficiently
  5. Ecosystem: Rich Python ecosystem for AI/ML features

Key constraints:

  • Must support both sync and async operations
  • Need excellent OpenAPI generation
  • Require dependency injection patterns
  • Must integrate with SQLAlchemy
  • Need production-grade performance

Decision

We adopt FastAPI with Pydantic v2 as the backend framework:

Key Design Decisions

  1. FastAPI: Modern async Python framework
  2. Pydantic v2: Fast data validation with Rust core
  3. Dependency Injection: FastAPI's Depends() pattern
  4. Auto Documentation: OpenAPI/Swagger at /docs
  5. SQLAlchemy Integration: Direct ORM access in routes

Technology Stack

# pyproject.toml, [tool.poetry.dependencies] (Poetry caret constraints; not valid requirements.txt syntax)
fastapi = "^0.111.0"
pydantic = "^2.7.0"
uvicorn = "^0.29.0"
sqlalchemy = "^2.0.30"
python = "^3.11"

Route Pattern

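from fastapi import APIRouter, Depends, Query
from sqlalchemy.orm import Session

# Requirement, RequirementListResponse, UserInfo, get_db, and get_current_user
# are project-local names (models, schemas, and dependencies).

router = APIRouter(prefix="/requirements", tags=["requirements"])

@router.get("", response_model=RequirementListResponse)
def list_requirements(
    skip: int = Query(default=0, ge=0),
    limit: int = Query(default=50, ge=1, le=100),
    db: Session = Depends(get_db),
    user: UserInfo = Depends(get_current_user),  # unused in the body; enforces authentication
):
    # Fetch one page in a stable order so pages do not overlap, plus the total count.
    items = db.query(Requirement).order_by(Requirement.id).offset(skip).limit(limit).all()
    total = db.query(Requirement).count()
    return {"items": items, "total": total}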

Schema Pattern

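from datetime import datetime

from pydantic import BaseModel, ConfigDict, Field

# RequirementStatus is a project-local enum (not shown here).

class RequirementBase(BaseModel):
    title: str = Field(..., min_length=1, max_length=500)
    description: str | None = None
    status: RequirementStatus = RequirementStatus.DRAFT

class RequirementCreate(RequirementBase):
    pass

class RequirementResponse(RequirementBase):
    # Pydantic v2: model_config replaces the v1-style `class Config` (deprecated in v2)
    # and lets the response be built directly from ORM objects.
    model_config = ConfigDict(from_attributes=True)

    id: str
    created_at: datetime
    updated_at: datetime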

Consequences

Positive

  • Performance: One of the fastest Python frameworks
  • Type Safety: Pydantic validation catches errors early
  • Auto Documentation: OpenAPI spec generated automatically
  • Developer Experience: Excellent editor support, type hints
  • Async Native: First-class async/await support
  • Dependency Injection: Clean, testable code patterns
  • Standards Based: OpenAPI, JSON Schema compliance

Negative

  • Python Limitations: GIL limits true parallelism
  • Migration Complexity: Breaking changes between Pydantic v1 and v2
  • Learning Curve: Dependency injection patterns require understanding
  • Memory Usage: Higher than compiled languages

Neutral

  • ORM Choice: SQLAlchemy works well but has own complexity
  • Async Complexity: Mixing sync/async requires care

Alternatives Considered

1. Django REST Framework

  • Approach: Full-featured REST framework for Django
  • Rejected: Heavier, less async support, more opinionated

2. Flask + Marshmallow

  • Approach: Minimal framework with schema library
  • Rejected: Limited async support, more boilerplate, weaker type-hint integration

3. Node.js + Express

  • Approach: JavaScript backend
  • Rejected: Would lose the Python ML ecosystem and the team's existing Python expertise

4. Go + Gin

  • Approach: High-performance compiled language
  • Rejected: Would lose the Python ecosystem and lengthen development time

Implementation Status

  • Core implementation complete
  • Tests written and passing
  • Documentation updated
  • Migration/upgrade path defined
  • Monitoring/observability in place

Implementation Details

  • Main App: backend/main.py
  • Route Handlers: backend/api/v1/
  • Schemas: backend/schemas/
  • Dependencies: backend/core/dependencies.py
  • Config: backend/core/config.py
  • OpenAPI: /docs, /openapi.json

Compliance/Validation

  • Automated checks: Pydantic validates all requests
  • Manual review: New endpoints reviewed for patterns
  • Metrics: Request latency, validation error rates

LLM Council Review

Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: APPROVED WITH CRITICAL CAVEATS

Quality Metrics

  • Consensus Strength Score (CSS): 0.95
  • Deliberation Depth Index (DDI): 0.90

Council Feedback Summary

The council strongly approved FastAPI as the right choice for an SRE platform (I/O-bound orchestration), but identified blocking the event loop as the highest technical risk.

Key Concerns Identified:

  1. GIL and CPU Operations: The GIL matters little for I/O-bound work but prevents CPU-bound work (log parsing, anomaly detection) from running in parallel within a process
  2. Async/Sync Mixing (Critical): Calling boto3 or requests inside an async def handler blocks the entire event loop
  3. Pydantic v2 Migration: Breaking syntax changes require careful attention

Required Modifications:

  1. Mandate Async Drivers:
    • Database: Use asyncpg with SQLAlchemy's AsyncSession (not psycopg2)
    • HTTP: Ban the requests library; standardize on httpx.AsyncClient
  2. Boto3 Strategy (AWS SDK is blocking):
    • Option A: Use aioboto3 wrapper
    • Option B (Preferred): Keep boto3 but run its calls in a threadpool: await run_in_threadpool(boto_function)
  3. Deployment Configuration: Run Uvicorn with --workers N, or run Gunicorn as the process manager with Uvicorn worker processes, to use multiple cores
  4. Blocking Detector: Implement middleware to detect event loop blocks (>100ms) in development
  5. CPU Offload: Heavy compute tasks must go to background workers (Celery/Arq) or ProcessPoolExecutor
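
One possible shape for the development-time blocking detector is sketched below. It is a background task rather than HTTP middleware, which is a simplification of what the council asked for: the task sleeps for a fixed interval and treats any wake-up that arrives more than 100 ms late as evidence that something blocked the loop. The function name monitor_loop_lag and the interval value are assumptions for illustration.

```python
import asyncio
import time


async def monitor_loop_lag(
    lags: list[float], interval: float = 0.05, threshold: float = 0.1
) -> None:
    """Record how late each wake-up is; lateness above threshold means the loop was blocked."""
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval)
        lag = time.monotonic() - start - interval
        if lag > threshold:
            lags.append(lag)
            print(f"event loop blocked for ~{lag:.3f}s")


async def main() -> list[float]:
    lags: list[float] = []
    monitor = asyncio.create_task(monitor_loop_lag(lags))
    await asyncio.sleep(0.06)  # well-behaved period: nothing should be recorded
    time.sleep(0.3)            # simulated offender: a blocking call inside async code
    await asyncio.sleep(0.06)  # give the monitor a chance to wake up and measure
    monitor.cancel()
    return lags


recorded_lags = asyncio.run(main())
```

In an application, such a task could be started from the app's startup hook in development builds only. asyncio's own debug mode offers a related check: it logs callbacks that run longer than loop.slow_callback_duration, which defaults to 100 ms.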

Modifications Applied

  1. Documented async driver requirements (asyncpg, httpx)
  2. Added boto3 threadpool strategy for AWS calls
  3. Documented multi-worker deployment pattern
  4. Added event loop blocking detection recommendation
  5. Documented CPU-bound task offloading pattern
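
The CPU-bound offloading pattern might look roughly like the following sketch, which hands a parsing function to a ProcessPoolExecutor through the event loop's run_in_executor. The count_error_lines function and the sample log lines are invented for the example; a real deployment might prefer a task queue such as Celery or Arq, as the council notes.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_error_lines(lines: list[str]) -> int:
    # CPU-bound stand-in for log parsing; defined at module level so it can be pickled.
    return sum(1 for line in lines if "ERROR" in line)


async def parse_in_worker(pool: ProcessPoolExecutor, lines: list[str]) -> int:
    loop = asyncio.get_running_loop()
    # The event loop keeps serving other work while a worker process does the parsing.
    return await loop.run_in_executor(pool, count_error_lines, lines)


async def main() -> int:
    lines = ["INFO ok", "ERROR disk full", "ERROR timeout", "WARN slow"] * 1000
    with ProcessPoolExecutor(max_workers=2) as pool:
        return await parse_in_worker(pool, lines)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A separate process sidesteps the GIL, at the cost of pickling the arguments and result across the process boundary, so the pattern pays off mainly when the compute time clearly exceeds that transfer cost. In a long-running service the pool would normally be created once at startup rather than per call.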

Council Ranking

  • All models reached consensus
  • gpt-5.2: Best Response (async/sync analysis)
  • claude-opus-4.5: Strong (boto3 strategy)
