
ADR-024: Structured Logging with structlog

Status

Implemented

Date

2025-01-16 (Retrospective)

Decision Makers

  • SRE Team - Logging requirements
  • Backend Team - Implementation approach

Layer

Observability

Related ADRs

  • ADR-009: OpenTelemetry Instrumentation
  • ADR-043: ClickStack Integration

Supersedes

None

Depends On

None

Context

Production logging needs more than printf-style messages:

  1. Machine Parseable: JSON format for log aggregation
  2. Correlation: Link logs to traces and requests
  3. Context: Automatic context propagation
  4. Filtering: Easy filtering by fields
  5. Security: Sensitive data masking

Key constraints:

  • Must work with Python logging ecosystem
  • Need JSON output for production
  • Human-readable for development
  • Automatic trace ID injection
  • PII/sensitive data protection

Decision

We adopt structlog for structured logging throughout the backend:

Key Design Decisions

  1. structlog Library: Structured logging for Python
  2. JSON in Production: Machine-readable format
  3. Console in Development: Human-readable colored output
  4. Automatic Context: Request ID, trace ID injection
  5. Sensitive Data Masking: Automatic PII protection

Configuration

import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        mask_sensitive_data,
        add_trace_context,
        # Production: JSON
        structlog.processors.JSONRenderer(),
        # Development: Console
        # structlog.dev.ConsoleRenderer(),
    ],
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)
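
The Environment section of this ADR specifies LOG_FORMAT=json|console; one way to honor that is to pick the final renderer from the environment when assembling the chain. A minimal sketch (the build_processors helper is illustrative, not the actual backend/core/logging_config.py, and the custom masking/trace processors are omitted for brevity):

```python
import os

import structlog

def build_processors(log_format: str) -> list:
    """Assemble the processor chain, ending in the renderer for this environment."""
    renderer = (
        structlog.processors.JSONRenderer()
        if log_format == "json"
        else structlog.dev.ConsoleRenderer()
    )
    return [
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        renderer,
    ]

structlog.configure(
    processors=build_processors(os.environ.get("LOG_FORMAT", "console")),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)
```

This keeps one code path for both environments; only the last processor in the chain differs.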

Usage Pattern

import structlog

logger = structlog.get_logger()

# Basic logging
logger.info(
    "Request processed",
    endpoint="/api/v1/requirements",
    method="GET",
    duration_ms=45,
)

# With context binding
log = logger.bind(request_id="req-123", user="user@example.com")
log.info("Created requirement", requirement_id="REQ-000001")
log.error("Validation failed", errors=["Title required"])
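
Conceptually, each keyword argument lands in an event dict that then flows through the processor chain, with the last processor rendering the final line. The mechanics can be sketched in plain Python, independent of structlog itself (run_chain and the toy processors are illustrative, mirroring structlog's (logger, method_name, event_dict) processor signature):

```python
import json
from datetime import datetime, timezone

# Toy processors with structlog's (logger, method_name, event_dict) signature.
def add_log_level(logger, method_name, event_dict):
    event_dict["level"] = method_name
    return event_dict

def add_timestamp(logger, method_name, event_dict):
    event_dict["timestamp"] = datetime.now(timezone.utc).isoformat()
    return event_dict

def json_renderer(logger, method_name, event_dict):
    # The final processor turns the dict into the rendered log line.
    return json.dumps(event_dict)

def run_chain(method_name, event, **fields):
    """Feed an event dict through the chain; the last processor renders it."""
    event_dict = {"event": event, **fields}
    for processor in (add_log_level, add_timestamp, json_renderer):
        event_dict = processor(None, method_name, event_dict)
    return event_dict

line = run_chain("info", "Request processed", endpoint="/api/v1/requirements")
```

Because every processor sees and returns the same dict, adding behavior (masking, trace IDs) is just inserting another function into the chain.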

Output Formats

Production (JSON):

{
  "timestamp": "2025-01-16T10:30:00.000Z",
  "level": "info",
  "event": "Request processed",
  "endpoint": "/api/v1/requirements",
  "method": "GET",
  "duration_ms": 45,
  "request_id": "req-123",
  "trace_id": "abc123def456"
}

Development (Console):

2025-01-16 10:30:00 [info     ] Request processed    endpoint=/api/v1/requirements method=GET duration_ms=45

Sensitive Data Masking

SENSITIVE_FIELDS = ["password", "token", "secret", "api_key", "authorization"]

def mask_sensitive_data(logger, method_name, event_dict):
    # Mask any top-level field whose name contains a sensitive substring.
    for key in list(event_dict.keys()):
        if any(s in key.lower() for s in SENSITIVE_FIELDS):
            event_dict[key] = "***MASKED***"
    return event_dict
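
The processor above only scans top-level keys, which is the shallow-masking gap the council flags below. A recursive variant (a sketch, not the shipped backend/core/logging_processors.py) would also walk nested dicts and lists:

```python
SENSITIVE_FIELDS = ["password", "token", "secret", "api_key", "authorization"]

def _mask(value):
    """Recursively mask sensitive keys in nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "***MASKED***"
            if any(s in key.lower() for s in SENSITIVE_FIELDS)
            else _mask(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [_mask(item) for item in value]
    return value

def mask_sensitive_data_recursive(logger, method_name, event_dict):
    # Same processor signature as above, but traverses the whole payload.
    return _mask(event_dict)
```

Note this rebuilds the dict rather than mutating in place, so a payload shared with application code is not modified as a side effect.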

Consequences

Positive

  • Searchable: Query logs by any field
  • Correlated: Trace IDs connect logs to spans
  • Secure: Automatic PII masking
  • Consistent: Same format everywhere
  • Performant: Lazy evaluation, fast serialization

Negative

  • Learning Curve: Different from print-style logging
  • Verbosity: More code for structured fields
  • Size: JSON logs larger than plain text
  • Tooling: Requires log aggregation for full benefit

Neutral

  • Migration: Can coexist with standard logging
  • Third-Party: Libraries may use standard logging

Alternatives Considered

1. Python Standard Logging

  • Approach: Built-in logging module
  • Rejected: Less structured, harder to add context

2. loguru

  • Approach: Alternative structured logging
  • Rejected: Less flexible processors, smaller ecosystem

3. Custom Logger

  • Approach: Build structured logging from scratch
  • Rejected: Reinventing well-solved problem

Implementation Status

  • Core implementation complete
  • Tests written and passing
  • Documentation updated
  • Migration/upgrade path defined
  • Monitoring/observability in place

Implementation Details

  • Logger Config: backend/core/logging_config.py
  • Middleware: backend/core/middleware/logging.py
  • Processors: backend/core/logging_processors.py
  • Environment: LOG_FORMAT=json|console, LOG_LEVEL

Compliance/Validation

  • Automated checks: Log format validation in tests
  • Manual review: Sensitive data audit
  • Metrics: Log volume, error rate by logger

LLM Council Review

Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: APPROVE WITH CRITICAL MODIFICATIONS

Quality Metrics

  • Consensus Strength Score (CSS): 0.92
  • Deliberation Depth Index (DDI): 0.88

Council Feedback Summary

structlog is the superior choice over loguru or standard logging due to its processor-chain architecture and OpenTelemetry integration. However, the ADR lacks specifications for async safety, performance, and recursive masking.

Key Concerns Identified:

  1. Shallow Masking Will Fail: Simple regex or top-level key matching misses PII in nested payloads
  2. Standard JSON Library Too Slow: High-throughput logging needs faster serialization
  3. Thread-Local Context Breaks Async: FastAPI uses asyncio; old-school threading will leak context
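
The third concern is exactly why structlog's contextvars support matters: Python's contextvars isolate state per asyncio task, where thread-locals would bleed across coroutines sharing one thread. A minimal stdlib-only sketch of that isolation:

```python
import asyncio
import contextvars

# Each task sees its own value; no leakage between concurrent requests.
request_id = contextvars.ContextVar("request_id", default="unset")

async def handle(rid: str) -> str:
    request_id.set(rid)
    await asyncio.sleep(0)  # yield so the tasks interleave on one thread
    return request_id.get()

async def main():
    # Three concurrent "requests" on the same event loop thread.
    return await asyncio.gather(*(handle(f"req-{i}") for i in range(3)))

results = asyncio.run(main())
```

Each task created by gather runs in its own copied context, so the set() in one handler never overwrites another's request ID, which a thread-local would.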

Required Modifications:

  1. Mandate orjson: Use orjson or ujson as JSON serializer for performance
  2. Specify contextvars: Context propagation must use Python's contextvars for async safety
  3. Recursive Masking:
    • Traverse nested dictionaries and JSON bodies
    • Apply strict scrubbing to HTTP headers and exception locals
    • Consider allowlisting (log known safe fields only)
  4. Standard Library Interop: Configure structlog.stdlib.LoggerFactory to capture third-party library logs
  5. Type-Based Redaction: Use types like SecretStr that auto-redact in logs
  6. Default Deny for Bodies: Don't log raw HTTP request/response bodies by default
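
Modification 4 (standard-library interop) typically means routing stdlib logging through structlog's formatter so third-party logs share the same renderer. A sketch of that wiring, assuming the handler setup shown here rather than the project's actual config:

```python
import logging

import structlog

# Render stdlib LogRecords with the same final renderer as structlog events.
formatter = structlog.stdlib.ProcessorFormatter(
    processor=structlog.processors.JSONRenderer(),
    # Processors applied to "foreign" (non-structlog) records before rendering.
    foreign_pre_chain=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
    ],
)

handler = logging.StreamHandler()
handler.setFormatter(formatter)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
```

With this in place, a third-party library's `logging.getLogger(__name__).info(...)` call comes out as the same JSON shape as the application's structlog events.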

Modifications Applied

  1. Documented orjson serializer requirement
  2. Added contextvars for async safety
  3. Documented recursive masking strategy
  4. Added stdlib interop for third-party logs
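
The orjson modification usually amounts to swapping the serializer passed to JSONRenderer. One caveat worth documenting: orjson.dumps returns bytes, so it needs to be paired with a bytes-aware logger factory (or decoded). A sketch, assuming orjson is installed:

```python
import orjson
import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        # orjson.dumps returns bytes, hence BytesLoggerFactory below.
        structlog.processors.JSONRenderer(serializer=orjson.dumps),
    ],
    logger_factory=structlog.BytesLoggerFactory(),
)
```
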

Council Ranking

  • gpt-5.2: Best Response (async safety)
  • gemini-3-pro: Strong (masking depth)
  • grok-4.1: Good (performance focus)

ADR-024 | Observability Layer | Implemented