ADR-004: RESTful API Design
Status
Implemented
Date
2025-01-16 (Retrospective)
Decision Makers
- Architecture Team - API design principles
- Frontend Team - Consumer requirements
Layer
API
Related ADRs
- ADR-008: FastAPI with Pydantic (implementation framework)
- ADR-030: API Versioning (version strategy)
- ADR-031: Request Validation (validation approach)
Supersedes
None
Depends On
None
Context
The SRE Operations Platform requires an API design that supports:
- Multiple Clients: Web UI, MCP server, external integrations
- CRUD Operations: Standard create, read, update, delete for 17 entity types
- Bulk Operations: Efficient handling of multiple entities
- Search & Filter: Complex queries across entities
- Pagination: Handling large result sets
- Documentation: Self-documenting for consumers
Key constraints:
- Must work with React Query caching
- Need consistent error handling
- Support OpenAPI specification generation
- Enable versioning for future evolution
Decision
We adopt RESTful API design with the following conventions:
Key Design Decisions
- Resource-Based URLs: /api/v1/{entity_type} for collections
- HTTP Verbs: GET (read), POST (create), PUT (update), DELETE (remove)
- Consistent Response Format: { items: T[], total: number } for lists
- Query Parameters: Pagination, sorting, filtering via query string
- OpenAPI-First: Full OpenAPI 3.0 specification
URL Structure
GET /api/v1/requirements # List all
GET /api/v1/requirements/{id} # Get one
POST /api/v1/requirements # Create
PUT /api/v1/requirements/{id} # Update
DELETE /api/v1/requirements/{id} # Delete
POST /api/v1/requirements/bulk-delete # Bulk operation
Response Formats
List Response:
{
"items": [...],
"total": 100
}
Single Entity Response:
{
"id": "REQ-000001",
"title": "...",
...
}
Error Response:
{
"detail": "Error message",
"status_code": 400
}
Validation Error (422):
{
"detail": [
{
"loc": ["body", "title"],
"msg": "field required",
"type": "value_error.missing"
}
]
}
Query Parameters
| Parameter | Purpose | Example |
|---|---|---|
| skip | Pagination offset | ?skip=20 |
| limit | Page size | ?limit=50 |
| sort_by | Sort field | ?sort_by=created_at |
| sort_order | Sort direction | ?sort_order=desc |
| search | Text search | ?search=authentication |
| status | Filter by status | ?status=Active |
| type | Filter by type | ?type=Functional |
| group_by | Group results | ?group_by=category |
Consequences
Positive
- Predictable: Developers know URL patterns without docs
- Cacheable: GET requests cache effectively with React Query
- Tooling Support: OpenAPI enables client generation
- Browser Friendly: Standard HTTP semantics
- Debugging: Easy to test with curl, Postman
- Industry Standard: Low learning curve
Negative
- Over-fetching: May return more data than needed
- Under-fetching: May require multiple requests for related data
- N+1 Queries: List endpoints may need optimization
- Limited Flexibility: Complex operations require workarounds
Neutral
- HATEOAS: Not part of the baseline design (not needed for the SPA); the Operational Guidelines section specifies optional navigation links
- GraphQL Alternative: Considered but not adopted
Alternatives Considered
1. GraphQL
- Approach: Query language with flexible data fetching
- Rejected: Added complexity, Apollo Client conflicts (see incident 2025-10-14)
2. gRPC
- Approach: Binary protocol with code generation
- Rejected: Not browser-native, limited debugging
3. JSON-RPC
- Approach: RPC-style over HTTP
- Rejected: Less tooling, unconventional
Implementation Status
- Core implementation complete
- Tests written and passing
- Documentation updated
- Migration/upgrade path defined
- Monitoring/observability in place
Implementation Details
- Route Handlers: backend/api/v1/
- OpenAPI Schema: Auto-generated at /docs and /openapi.json
- Pagination Utils: backend/core/pagination.py
- Response Models: backend/schemas/
- API Docs: docs/api/
Compliance/Validation
- Automated checks: OpenAPI schema validation in CI
- Manual review: API changes reviewed for REST compliance
- Metrics: Response time and error rate per endpoint
LLM Council Review
Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: CONDITIONAL APPROVAL
Quality Metrics
- Consensus Strength Score (CSS): 0.90
- Deliberation Depth Index (DDI): 0.88
Council Feedback Summary
The council approved the baseline design but identified critical flaws for high-volume SRE entities. The current pagination and update strategies will cause performance degradation in production.
Key Concerns Identified:
- Pagination is a Critical Blocker: skip/limit is O(N) and unstable with concurrent writes
- PUT-Only Updates: Dangerous in SRE context with concurrent automated updates
- Missing Bulk Operations: SRE automation needs to update hundreds of entities at once
- No Nested Resources: SRE data is rarely flat (incidents → timeline, services → alerts)
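The instability concern can be reproduced in a few lines. In this sketch, integer IDs stand in for alert rows sorted newest-first, and both helper functions are hypothetical. A row inserted between two page requests shifts every offset, so the second offset page repeats a row. A keyset query anchored on the last-seen ID is unaffected.

```python
def offset_page(rows, skip, limit):
    """Offset pagination over rows sorted newest-first by ID."""
    ordered = sorted(rows, reverse=True)
    return ordered[skip : skip + limit]


def keyset_page(rows, after_id, limit):
    """Keyset pagination: rows with an ID strictly below the cursor."""
    ordered = sorted(rows, reverse=True)
    if after_id is not None:
        ordered = [r for r in ordered if r < after_id]
    return ordered[:limit]


alerts = [1, 2, 3, 4, 5, 6]  # integer IDs stand in for alert rows

# Both strategies return the same first page.
offset_first = offset_page(alerts, skip=0, limit=3)
keyset_first = keyset_page(alerts, after_id=None, limit=3)

# A new alert arrives before the client asks for the second page.
alerts.append(7)

offset_second = offset_page(alerts, skip=3, limit=3)
keyset_second = keyset_page(alerts, after_id=keyset_first[-1], limit=3)

print(offset_first, offset_second)  # row 4 appears on both offset pages
print(keyset_first, keyset_second)  # keyset pages do not overlap
```

The O(N) cost comes from the same mechanism: an offset query typically has to read and discard every skipped row, while a keyset query can seek straight to the cursor position using an index on the sort key.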
Required Modifications:
- Hybrid Pagination Strategy:
  - Cursor-based (keyset): Mandatory for high-volume data (Alerts, Audit Logs, Metrics)
  - Offset-based: Only for low-cardinality config data (Users, Teams, Runbooks)
- Add PATCH Immediately: For partial updates (e.g., changing status to "Resolved")
- Implement Optimistic Concurrency: Use ETag and If-Match headers
- Standardize Bulk Operations:
  - Sync: POST /api/v1/alerts/bulk (list of IDs + action)
  - Async: POST .../bulk-async returning 202 + Job ID
- Update Response Envelope:
  {
    "items": [...],
    "meta": {
      "total": 100,
      "next_cursor": "abc...",
      "has_more": true
    }
  }
- Error Handling: Adopt RFC 7807 (Problem Details for HTTP APIs)
- Idempotency: Mandate Idempotency-Key headers on POST/PATCH
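A minimal sketch of how the cursor-based envelope requested above might be assembled. The cursor encoding (base64 over a small JSON object), the function names, and the ascending sort on "id" are all assumptions. The ADR only specifies the envelope fields items, meta.total, meta.next_cursor, and meta.has_more.

```python
import base64
import json


def encode_cursor(last_id: str) -> str:
    """Pack the last-seen sort key into an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> str:
    """Recover the last-seen sort key from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor))["after"]


def cursor_page(rows, *, limit, cursor=None):
    """Build the envelope for rows sorted ascending by "id"."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(rows, key=lambda r: r["id"])
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [r for r in ordered if r["id"] > after]
    # Look one row past the page to learn whether another page exists.
    window = ordered[: limit + 1]
    has_more = len(window) > limit
    items = window[:limit]
    return {
        "items": items,
        "meta": {
            "total": len(rows),
            "next_cursor": encode_cursor(items[-1]["id"]) if has_more else None,
            "has_more": has_more,
        },
    }


rows = [{"id": f"ALT-{i:03d}"} for i in range(1, 6)]
first = cursor_page(rows, limit=2)
second = cursor_page(rows, limit=2, cursor=first["meta"]["next_cursor"])
print([r["id"] for r in second["items"]], second["meta"]["has_more"])
```

Keeping the cursor opaque lets the server change the underlying sort key later without breaking clients, as long as clients only echo the token back.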
Modifications Applied
- Documented hybrid pagination strategy (cursor + offset)
- Added PATCH for partial updates
- Defined bulk operation patterns (sync/async)
- Added ETag-based optimistic concurrency recommendation
- Documented RFC 7807 error format
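The PATCH and ETag items can be illustrated together. This sketch uses an in-memory dict as the store and derives the ETag from a hash of the entity's JSON form. Both choices, and every name in the block, are assumptions. HTTP defines 412 Precondition Failed as the response to a failed If-Match check; a team could instead surface the condition as the 409 conflict type in the error registry.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Would map to HTTP 412 when If-Match does not match the current ETag."""


def compute_etag(entity: dict) -> str:
    """Derive an ETag from the entity's canonical JSON form (assumed scheme)."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode()).hexdigest()[:16] + '"'


def patch_entity(store: dict, entity_id: str, changes: dict, if_match: str) -> str:
    """Apply a partial update only if the caller saw the latest version."""
    current = store[entity_id]
    if if_match != compute_etag(current):
        raise PreconditionFailed(entity_id)
    # PATCH semantics: merge the supplied fields, keep everything else.
    store[entity_id] = {**current, **changes}
    return compute_etag(store[entity_id])


store = {"INC-1": {"status": "Open", "owner": "alice"}}
seen = compute_etag(store["INC-1"])

# First writer succeeds and receives the new ETag.
fresh = patch_entity(store, "INC-1", {"status": "Resolved"}, if_match=seen)

# Second writer still holds the old ETag and is rejected.
try:
    patch_entity(store, "INC-1", {"owner": "bob"}, if_match=seen)
    stale_rejected = False
except PreconditionFailed:
    stale_rejected = True
print(store["INC-1"], stale_rejected)
```

With PUT alone, the second writer would have silently overwritten the status change. The conditional check turns that lost update into an explicit, retryable error.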
Council Ranking
- gpt-5.2: Best Response (pagination/bulk focus)
- gemini-3-pro: Strong (idempotency emphasis)
- claude-opus-4.5: Good (PATCH/PUT distinction)
- grok-4.1: Partial
Operational Guidelines (APPROVED_WITH_MODS)
HATEOAS Links for Navigation
Response Structure with Links:
{
"items": [...],
"total": 150,
"_links": {
"self": { "href": "/api/v1/requirements?skip=20&limit=20" },
"first": { "href": "/api/v1/requirements?skip=0&limit=20" },
"prev": { "href": "/api/v1/requirements?skip=0&limit=20" },
"next": { "href": "/api/v1/requirements?skip=40&limit=20" },
"last": { "href": "/api/v1/requirements?skip=140&limit=20" }
}
}
Entity Response with Related Links:
{
"id": "REQ-000001",
"title": "User Authentication",
"_links": {
"self": { "href": "/api/v1/requirements/REQ-000001" },
"capabilities": { "href": "/api/v1/requirements/REQ-000001/capabilities" },
"test_cases": { "href": "/api/v1/requirements/REQ-000001/test-cases" },
"history": { "href": "/api/v1/requirements/REQ-000001/history" },
"parent": { "href": "/api/v1/capabilities/CAP-000005" }
}
}
Implementation:
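# backend/schemas/base.py
from typing import Generic, TypeVar

from pydantic import BaseModel, ConfigDict, Field

T = TypeVar("T")


class HALLinks(BaseModel):
    """HATEOAS link structure."""

    href: str
    method: str = "GET"
    title: str | None = None


class PaginatedResponse(BaseModel, Generic[T]):
    """Paginated response with HATEOAS links."""

    # Pydantic treats underscore-prefixed attributes as private, so the
    # field is named `links` and exposed as "_links" through an alias.
    # Serialize with model_dump(by_alias=True) to emit "_links".
    model_config = ConfigDict(populate_by_name=True)

    items: list[T]
    total: int
    links: dict[str, HALLinks] | None = Field(default=None, alias="_links")

    @classmethod
    def create(cls, items, total, skip, limit, base_url):
        links = {
            "self": HALLinks(href=f"{base_url}?skip={skip}&limit={limit}"),
            "first": HALLinks(href=f"{base_url}?skip=0&limit={limit}"),
        }
        if skip > 0:
            links["prev"] = HALLinks(href=f"{base_url}?skip={max(0, skip - limit)}&limit={limit}")
        if skip + limit < total:
            links["next"] = HALLinks(href=f"{base_url}?skip={skip + limit}&limit={limit}")
        return cls(items=items, total=total, links=links)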
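The sample list response includes a last link that the create method above does not emit. As a dependency-free sketch, the link arithmetic can be checked against that sample. The page_links function is hypothetical, and placing last at the final page boundary is an assumption that happens to reproduce the sample values.

```python
def page_links(base_url: str, total: int, skip: int, limit: int) -> dict[str, dict[str, str]]:
    """Compute offset-based navigation links for one page."""

    def link(offset: int) -> dict[str, str]:
        return {"href": f"{base_url}?skip={offset}&limit={limit}"}

    links = {"self": link(skip), "first": link(0)}
    if skip > 0:
        # Clamp at zero so a short first page never yields a negative offset.
        links["prev"] = link(max(0, skip - limit))
    if skip + limit < total:
        links["next"] = link(skip + limit)
    # Assumption: "last" starts at the final page boundary.
    links["last"] = link(max(0, (total - 1) // limit * limit))
    return links


links = page_links("/api/v1/requirements", total=150, skip=20, limit=20)
for rel, target in links.items():
    print(rel, target["href"])
```

For total=150, skip=20, limit=20 this yields the same five hrefs as the sample response with links.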
Standardized Error Response Format
RFC 7807 Problem Details:
{
"type": "https://api.ops.example.com/errors/validation-error",
"title": "Validation Error",
"status": 422,
"detail": "Request validation failed",
"instance": "/api/v1/requirements",
"errors": [
{
"field": "title",
"message": "Title is required",
"code": "required"
},
{
"field": "priority",
"message": "Must be one of: low, medium, high, critical",
"code": "invalid_enum"
}
],
"trace_id": "abc123def456"
}
Error Type Registry:
| Type | Status | Description |
|---|---|---|
| validation-error | 422 | Request body validation failed |
| not-found | 404 | Resource not found |
| unauthorized | 401 | Authentication required |
| forbidden | 403 | Insufficient permissions |
| conflict | 409 | Resource conflict (duplicate, version) |
| rate-limited | 429 | Too many requests |
| internal-error | 500 | Server error |
Implementation:
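# backend/core/exceptions.py
from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
from pydantic import BaseModel

# Assumption: in the real codebase, import the existing application instance.
app = FastAPI()


class ProblemDetail(BaseModel):
    """RFC 7807 problem document with `errors` and `trace_id` extensions."""

    type: str = "about:blank"
    title: str
    status: int
    detail: str
    instance: str | None = None
    errors: list[dict] | None = None
    trace_id: str | None = None


@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request: Request, exc: RequestValidationError):
    return JSONResponse(
        status_code=422,
        content=ProblemDetail(
            type="https://api.ops.example.com/errors/validation-error",
            title="Validation Error",
            status=422,
            detail="Request validation failed",
            instance=str(request.url.path),
            # loc entries may be ints (list indexes), so coerce to str.
            errors=[{"field": str(e["loc"][-1]), "message": e["msg"]} for e in exc.errors()],
            # trace_id is only present if middleware has set it.
            trace_id=getattr(request.state, "trace_id", None),
        ).model_dump(exclude_none=True),
        media_type="application/problem+json",
    )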
References
- RESTful API Design Best Practices
- OpenAPI 3.0 Specification
- RFC 7807 - Problem Details
- HAL - Hypertext Application Language
- Industry patterns: GitHub API, Stripe API
ADR-004 | API Layer | Implemented | APPROVED_WITH_MODS Completed