ADR-009: Spec/Ledger Reconciliation System

Status: Accepted
Date: 2026-01-16
Decision Makers: Development Team
Owners: @christopherjoseph
Version: 1.0

Decision Summary

Implement a bidirectional requirement traceability system using a YAML-based ledger that maps ADR requirements to test files, with CI enforcement to ensure coverage thresholds are maintained.

| Aspect | Decision |
|---|---|
| Format | YAML ledger + Markdown requirement docs |
| Validation | Python script with bidirectional checks |
| CI Mode | Warning mode initially, blocking after 90% coverage |
| Exit Criteria | 90%+ requirements have mapped tests |

Context

The Problem: Untraceable Requirements

ADRs contain testable requirements expressed in prose, but there's no machine-readable mapping between requirements and their implementing tests:

  1. No traceability: Can't verify which tests cover which requirements
  2. Coverage gaps: No visibility into untested requirements
  3. Orphaned tests: Tests without corresponding requirements
  4. Regression risk: ADR changes may invalidate existing tests
  5. Onboarding friction: New contributors can't assess test coverage

Requirements Inventory

Analysis of existing ADRs reveals ~110 testable requirements:

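| ADR | Domain | Estimated Requirements |
|---|---|---|
| ADR-001 | MCP/Version Guard | 8+ |
| ADR-002 | Workflow | 6+ |
| ADR-003 | Memory | 40+ |
| ADR-004 | Monetization | 5+ |
| ADR-005 | Extensions | 10+ |
| ADR-006 | Chatbot | 25+ |
| ADR-007 | Integration | 15+ |
| Total | | ~110+ |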

Industry Precedent

| Approach | Used By | Pros | Cons |
|---|---|---|---|
| YAML Ledger | Internal tools | Simple, versionable | Manual maintenance |
| Requirements.txt + Tags | pytest-bdd | Native pytest | Scattered across files |
| Traceability Matrix | Enterprise | Complete visibility | Heavy tooling |
| Doctest-style | Python stdlib | Inline with docs | Limited scope |

Decision: YAML ledger provides the best balance of simplicity, visibility, and CI integration.


Decision

Ledger Domain Structure

Requirements are organized by domain with standardized prefixes:

| Domain | Prefix | Description | Primary Source |
|---|---|---|---|
| MCP | REQ-MCP-NNN | MCP server behavior, version guard | ADR-001, ADR-003 |
| MEM | REQ-MEM-NNN | Memory storage, retrieval, lifecycle | ADR-003 |
| EXT | REQ-EXT-NNN | Extension protocols, registry | ADR-005 |
| BOT | REQ-BOT-NNN | Chatbot adapters, gateway | ADR-006 |
| SEC | REQ-SEC-NNN | Security, ACLs, rate limiting | ADR-006, ADR-007 |
| WKF | REQ-WKF-NNN | Workflow integration | ADR-002 |
| INT | REQ-INT-NNN | Cross-ADR integration | ADR-007 |

Negative obligations (things the system MUST NOT do) use the NEG- prefix in place of REQ-: NEG-MCP-NNN, NEG-SEC-NNN, etc.
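
The prefix scheme above lends itself to a mechanical check. The following is a minimal sketch, not part of the ADR's tooling: the names `parse_requirement_id` and `RequirementId` are hypothetical, and the pattern assumes that `NNN` always means exactly three digits, as in `REQ-MCP-001`.

```python
import re
from typing import NamedTuple

# Domain prefixes taken from the domain table in this ADR.
DOMAINS = {"MCP", "MEM", "EXT", "BOT", "SEC", "WKF", "INT"}

# Assumption: NNN is exactly three digits, as in REQ-MCP-001.
ID_PATTERN = re.compile(r"^(REQ|NEG)-([A-Z]{3})-(\d{3})$")


class RequirementId(NamedTuple):
    kind: str    # "REQ" for a requirement, "NEG" for a negative obligation
    domain: str  # one of DOMAINS
    number: int  # numeric part of the ID


def parse_requirement_id(raw: str) -> RequirementId:
    """Parse an ID such as 'REQ-MCP-001'; raise ValueError if malformed."""
    match = ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"malformed requirement ID: {raw!r}")
    kind, domain, number = match.groups()
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain {domain!r} in {raw!r}")
    return RequirementId(kind, domain, int(number))
```

A check like this could run over the ledger keys before any other validation, so that a typo in an ID is reported as a configuration error rather than as a missing requirement.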

File Structure

spec/
├── requirements/
│   ├── mcp.md            # MCP server requirements
│   ├── memory.md         # Memory system requirements
│   ├── extensions.md     # Extension system requirements
│   ├── chatbot.md        # Chatbot integration requirements
│   ├── security.md       # Security requirements
│   ├── workflow.md       # Workflow requirements
│   └── integration.md    # Cross-ADR requirements
├── ledger.yml            # Obligation-to-test mapping
└── validation/
    └── reconcile.py      # Bidirectional validation script

Requirement Document Format

# spec/requirements/mcp.md

## MCP Server Requirements

### REQ-MCP-001: Version Mismatch Exit Code
**Source**: ADR-001 Layer 3
**Status**: Active
**Priority**: Critical

The version guard MUST exit with code 78 (EX_CONFIG) when a Python version
mismatch is detected between the current interpreter and the stored marker.

**Rationale**: Exit code 78 follows sysexits.h convention for configuration
errors, enabling proper error handling by supervisors.

**Test Mapping**:
- `tests/test_version_guard.py::test_version_mismatch_exit_code`
- `tests/test_version_guard.py::test_version_mismatch_message`

---

### NEG-MCP-001: No Warning-Only Mode
**Source**: ADR-001 Layer 3
**Status**: Active
**Priority**: Critical

The version guard MUST NOT operate in warning-only mode. The silent segfault
failure mode requires hard exit.

**Test Mapping**:
- `tests/test_version_guard.py::test_no_warning_only_mode`
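
Because the requirement documents follow a fixed layout, the IDs and test mappings can be extracted mechanically, for example to cross-check the Markdown against the ledger. The sketch below is illustrative only: `parse_requirement_doc` is a hypothetical helper, and it assumes headings of the form `### REQ-XXX-NNN: Title` and backtick-quoted bullets under `**Test Mapping**:`, exactly as in the example above.

```python
import re

# Assumed heading and bullet shapes, copied from the example document.
HEADING = re.compile(r"^### ((?:REQ|NEG)-[A-Z]{3}-\d{3}): (.+)$")
TEST_LINE = re.compile(r"^- `([^`]+)`$")


def parse_requirement_doc(text: str) -> dict[str, dict]:
    """Return {requirement_id: {"title": str, "tests": [node_id, ...]}}."""
    requirements: dict[str, dict] = {}
    current = None
    in_mapping = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        heading = HEADING.match(line)
        if heading:
            # A new requirement section starts here.
            current = {"title": heading.group(2), "tests": []}
            requirements[heading.group(1)] = current
            in_mapping = False
        elif line.startswith("**Test Mapping**"):
            in_mapping = True
        elif in_mapping and current is not None:
            test = TEST_LINE.match(line)
            if test:
                current["tests"].append(test.group(1))
            elif line:
                # Any other non-blank line ends the mapping list.
                in_mapping = False
    return requirements
```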

Ledger Schema

# spec/ledger.yml
version: "1.0"

domains:
  - mcp
  - memory
  - extensions
  - chatbot
  - security
  - workflow
  - integration

requirements:
  REQ-MCP-001:
    title: "Version Mismatch Exit Code"
    source: "ADR-001 Layer 3"
    status: active      # active | deprecated | proposed
    priority: critical  # critical | high | medium | low
    tests:
      - tests/test_version_guard.py::test_version_mismatch_exit_code
      - tests/test_version_guard.py::test_version_mismatch_message

  NEG-MCP-001:
    title: "No Warning-Only Mode"
    source: "ADR-001 Layer 3"
    status: active
    priority: critical
    tests:
      - tests/test_version_guard.py::test_no_warning_only_mode

  # ... additional requirements
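
The schema comments above imply a small set of allowed values, which can be checked before reconciliation runs. This is a rough sketch under stated assumptions: `validate_ledger` is a hypothetical function that operates on the already-parsed ledger (a plain dict), it treats all four metadata fields as required, and it assumes every test mapping is a pytest node ID of the form `file::function`, as in the examples.

```python
# Allowed values taken from the comments in the ledger schema.
ALLOWED_STATUS = {"active", "deprecated", "proposed"}
ALLOWED_PRIORITY = {"critical", "high", "medium", "low"}
# Assumption: all four metadata fields are mandatory.
REQUIRED_FIELDS = ("title", "source", "status", "priority")


def validate_ledger(ledger: dict) -> list[str]:
    """Return human-readable schema problems; an empty list means valid."""
    errors: list[str] = []
    for req_id, entry in ledger.get("requirements", {}).items():
        for field in REQUIRED_FIELDS:
            if field not in entry:
                errors.append(f"{req_id}: missing field '{field}'")
        status = entry.get("status")
        if status is not None and status not in ALLOWED_STATUS:
            errors.append(f"{req_id}: unknown status '{status}'")
        priority = entry.get("priority")
        if priority is not None and priority not in ALLOWED_PRIORITY:
            errors.append(f"{req_id}: unknown priority '{priority}'")
        for node_id in entry.get("tests", []):
            # Assumption: mappings are pytest node IDs (file::function).
            if "::" not in node_id:
                errors.append(f"{req_id}: '{node_id}' is not a file::function node ID")
    return errors
```

Schema problems found this way would map naturally onto the script's "configuration error" exit code rather than a reconciliation failure.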

Validation Script

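# spec/validation/reconcile.py
"""
Bidirectional reconciliation between spec/ledger.yml and actual tests.

Checks:
1. All active requirements have mapped tests
2. All mapped test files exist
3. Coverage meets threshold (90%)

Exit Codes:
0: Reconciliation passed
1: Reconciliation failed
2: Configuration error
"""
from pathlib import Path

import yaml  # PyYAML

COVERAGE_THRESHOLD = 0.90


def report_missing(req_id, file_path):
    """Minimal stand-in reporter for a mapped test file that does not exist."""
    print(f"{req_id}: mapped test file not found: {file_path}")


def reconcile(ledger_path, project_root, verbose=False):
    project_root = Path(project_root)

    # Load ledger
    ledger = yaml.safe_load(Path(ledger_path).read_text())

    # Find existing test files as root-relative POSIX strings, so they
    # compare equal to the paths written in the ledger
    test_files = {
        path.relative_to(project_root).as_posix()
        for path in project_root.glob("tests/**/test_*.py")
    }

    active_requirements = 0
    with_tests = 0

    # Validate each requirement
    for req_id, req_data in ledger["requirements"].items():
        if req_data["status"] != "active":
            continue
        active_requirements += 1

        # Check tests exist
        tests = req_data.get("tests") or []
        all_found = True
        for test_path in tests:
            file_path = test_path.split("::")[0]
            if file_path not in test_files:
                report_missing(req_id, file_path)
                all_found = False

        # Count the requirement as covered only if it maps to at least
        # one test and every mapped file exists
        if tests and all_found:
            with_tests += 1

    # Calculate coverage (no active requirements counts as fully covered)
    coverage = with_tests / active_requirements if active_requirements else 1.0
    if verbose:
        print(f"Coverage: {coverage:.1%} ({with_tests}/{active_requirements})")
    return coverage >= COVERAGE_THRESHOLD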
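
The excerpt above walks from requirements to tests. The reverse direction, which addresses the "orphaned tests" problem from the Context section, could look roughly like the sketch below. `find_orphaned_test_files` is a hypothetical helper, and it works at file granularity only: a file counts as referenced if any requirement maps to any test inside it.

```python
def find_orphaned_test_files(ledger: dict, test_files: set[str]) -> list[str]:
    """Return test files that no ledger entry references, sorted by path."""
    referenced = {
        node_id.split("::")[0]  # keep the file part of file::function
        for entry in ledger.get("requirements", {}).values()
        for node_id in entry.get("tests", [])
    }
    return sorted(test_files - referenced)


# Usage with an in-memory ledger; test_scratch.py is a made-up file name.
ledger = {
    "requirements": {
        "REQ-MCP-001": {
            "tests": ["tests/test_version_guard.py::test_version_mismatch_exit_code"]
        }
    }
}
files = {"tests/test_version_guard.py", "tests/test_scratch.py"}
orphans = find_orphaned_test_files(ledger, files)
```

Whether an orphaned file should fail reconciliation or only produce a warning is a policy choice this sketch leaves open.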

CI Integration

# .github/workflows/reconcile.yml
name: Spec/Ledger Reconciliation

on:
  push:
    paths: ['spec/**', 'tests/**']
  pull_request:
    paths: ['spec/**', 'tests/**']

jobs:
  reconcile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install pyyaml

      # Warning mode (always passes)
      - run: python spec/validation/reconcile.py --warn --verbose

      # Strict mode (informational for now)
      - run: python spec/validation/reconcile.py --verbose
        continue-on-error: true
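
The workflow relies on `--warn` turning a failed reconciliation into a passing step. One way the script's command line might implement that is sketched below. The flag names `--warn` and `--verbose` and the exit codes come from this ADR; `build_parser` and `exit_code` are hypothetical names, and how configuration errors (exit code 2) behave in warning mode is not specified here, so the sketch leaves that case out.

```python
import argparse

# Exit codes as documented in the reconcile.py docstring.
EXIT_OK = 0
EXIT_FAILED = 1


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with the two flags used by the workflow."""
    parser = argparse.ArgumentParser(prog="reconcile.py")
    parser.add_argument("--warn", action="store_true",
                        help="report problems but always exit 0")
    parser.add_argument("--verbose", action="store_true",
                        help="print the full reconciliation report")
    return parser


def exit_code(passed: bool, warn: bool) -> int:
    """Map a reconciliation result to a process exit code."""
    if passed or warn:
        return EXIT_OK
    return EXIT_FAILED


# Usage: the warning-mode step from the workflow.
args = build_parser().parse_args(["--warn", "--verbose"])
code = exit_code(passed=False, warn=args.warn)
```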

Implementation Phases

Phase 1: Scaffolding (Complete)

  • Create spec/requirements/ structure
  • Create domain requirement files with initial requirements
  • Create spec/ledger.yml with 89 requirements
  • Create spec/validation/reconcile.py
  • Create .github/workflows/reconcile.yml

Phase 2: Coverage Improvement (Complete)

  • Map existing tests to requirements (reduced from 77 to 0 missing)
  • Fix path mismatches in ledger.yml (~70 corrections)
  • Create missing test files (tests/test_workflow.py, tests/test_integration.py)
  • Achieve 100% coverage (exceeds 90% threshold)
  • Add pytest markers infrastructure (tests/conftest.py)
  • Register custom markers in pyproject.toml

Phase 3: Enforcement (Complete)

  • Implement priority-aware coverage thresholds in reconcile.py
  • Implement baseline ratchet mechanism (check_ratchet(), BaselineSchema)
  • Add domain coverage tracking per Council recommendation
  • Create GitHub issue templates for test gaps
  • Enable blocking mode in CI (deferred - optional)
  • Add pre-commit hook for local validation (deferred - optional)
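
The baseline ratchet listed in Phase 3 can be illustrated with a short sketch. The name `check_ratchet()` and the baseline file `.spec-baseline.json` come from this ADR; the signature, the `load_baseline` and `update_baseline` helpers, and the JSON layout (`{"coverage": ...}`) are assumptions made for illustration and may differ from the real implementation.

```python
import json
from pathlib import Path


def load_baseline(path: Path) -> float:
    """Read the stored coverage; a missing file means no baseline yet."""
    if not path.exists():
        return 0.0
    return float(json.loads(path.read_text())["coverage"])


def check_ratchet(current: float, path: Path, override: bool = False) -> bool:
    """Pass if coverage did not decrease, or the decrease is explicitly overridden."""
    return override or current >= load_baseline(path)


def update_baseline(current: float, path: Path) -> None:
    """Raise the baseline after a passing run; never lower it."""
    if current > load_baseline(path):
        path.write_text(json.dumps({"coverage": current}))
```

Keeping the check separate from the update means a pull request can be evaluated without modifying the baseline, and the baseline only moves when a run on the main branch chooses to record it.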

Consequences

Positive

  • Traceability: Every requirement maps to tests (and vice versa)
  • Gap Detection: CI surfaces untested requirements immediately
  • Maintenance: ADR changes prompt requirement updates
  • Onboarding: New contributors understand what's tested
  • Compliance: Audit trail for requirement coverage

Negative

  • Initial Effort: Extracting ~110 requirements requires review
  • Maintenance Overhead: New ADRs must include requirements
  • CI Time: Reconciliation adds ~10s to build time
  • Discipline: Team must maintain ledger alongside code

Mitigations

  • Start in warning-only mode to build coverage gradually
  • Add requirement template to ADR template
  • Cache reconciliation results
  • Provide clear contribution guidelines

Verification

# Run reconciliation
python spec/validation/reconcile.py --verbose

# Expected output (Phase 2/3 complete):
# ============================================================
# SPEC/LEDGER RECONCILIATION REPORT
# ============================================================
#
# Requirements:
# Total: 89
# Active: 87
# With tests: 87
# Without tests: 0
# Coverage: 100.0%
#
# Coverage by Priority:
# Critical 100.0% (threshold: 100%) [OK]
# High 100.0% (threshold: 95%) [OK]
# Medium 100.0% (threshold: 85%) [OK]
# Low 100.0% (threshold: 75%) [OK]
#
# Coverage by Domain:
# BOT 100.0%
# EXT 100.0%
# INT 80.0%
# MCP 100.0%
# MEM 100.0%
# SEC 100.0%
# WKF 100.0%
#
# ------------------------------------------------------------
# Reconciliation PASSED!
# ------------------------------------------------------------

# Run reconcile tests
pytest tests/spec/test_reconcile.py -v
# Expected: 26 passed
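
The per-priority lines in the report could be produced by a check along these lines. The threshold values are the ones shown in the expected output; `coverage_by_priority` and `failing_priorities` are hypothetical names, "covered" is simplified to "has at least one mapped test", and the Council's separate rule that the security domain always requires 100% is not modelled.

```python
from collections import defaultdict

# Thresholds as shown in the reconciliation report.
THRESHOLDS = {"critical": 1.00, "high": 0.95, "medium": 0.85, "low": 0.75}


def coverage_by_priority(requirements: dict) -> dict[str, float]:
    """Fraction of active requirements with a mapped test, per priority."""
    totals: dict[str, int] = defaultdict(int)
    covered: dict[str, int] = defaultdict(int)
    for entry in requirements.values():
        if entry.get("status") != "active":
            continue  # deprecated and proposed entries are not counted
        priority = entry["priority"]
        totals[priority] += 1
        if entry.get("tests"):
            covered[priority] += 1
    return {priority: covered[priority] / totals[priority] for priority in totals}


def failing_priorities(requirements: dict) -> list[str]:
    """Priorities whose coverage falls below the configured threshold."""
    coverage = coverage_by_priority(requirements)
    return sorted(p for p, value in coverage.items() if value < THRESHOLDS[p])
```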

Related ADRs

  • ADR-008: MkDocs Documentation Site (companion ADR)
  • ADR-001: Python Version Requirement (source of REQ-MCP-*)
  • ADR-002: Workflow Integration (source of REQ-WKF-*)
  • ADR-003: Project Intent & Memory (source of REQ-MEM-*)
  • ADR-005: Repository Organization (source of REQ-EXT-*)
  • ADR-006: Chatbot Integrations (source of REQ-BOT-*)
  • ADR-007: Cross-ADR Integration (source of REQ-INT-*)

Council Review

Review Date: 2026-01-16
Verdict: Approved with amendments
Consensus Strength: 0.82

Key Recommendations

  1. Hybrid Approach: YAML + pytest Markers (High Priority)

    • YAML ledger for metadata (title, source, priority, status)
    • pytest markers for actual test linkage: @pytest.mark.requirement("REQ-MCP-001")
    • Benefits: IDE autocomplete, refactoring support, co-location
    • Action: Extend reconcile.py to scan for markers
  2. Priority-Aware Coverage Thresholds (High Priority)

    • Current 90% threshold treats all requirements equally
    • Recommended thresholds:
      • Critical: 100% (must have tests)
      • High: 95%
      • Medium: 85%
      • Low: 75%
    • Security domain (SEC-*): Always 100%
    • Action: Update reconcile.py with tiered thresholds
  3. Split Ledger by Domain (Medium Priority)

    • Single ledger.yml will cause merge conflicts at scale
    • Recommended structure:
      spec/
      ├── ledger/
      │   ├── mcp.yml
      │   ├── memory.yml
      │   ├── chatbot.yml
      │   └── ...
      └── validation/
          └── reconcile.py # Aggregates all domain ledgers
    • Action: Defer until >150 requirements
  4. Baseline Ratchet Mechanism (High Priority)

    • CI should block coverage decreases, not just threshold failures
    • Store baseline in .spec-baseline.json
    • Any PR that reduces coverage requires explicit override
    • Action: Add to Phase 3 enforcement
  5. Clarify INT Domain (Documentation)

    • Integration requirements overlap with other domains
    • Define INT as: "Cross-ADR flows requiring multiple systems"
    • Action: Update spec/requirements/integration.md
  6. Distinguish Mapping vs Execution (Critical)

    • "Has mapping" ≠ "Test exists and passes"
    • Reconciliation checks:
      • Level 1: Requirement has test path (current)
      • Level 2: Test file exists (current)
      • Level 3: Test actually passes (future)
    • Action: Document levels in ADR, implement Level 3 in Phase 3

Accepted Amendments

| Amendment | Status | Target |
|---|---|---|
| pytest markers | Accepted | Phase 2 |
| Priority thresholds | Accepted | Phase 2 |
| Split ledger | Deferred | >150 requirements |
| Baseline ratchet | Accepted | Phase 3 |
| INT clarification | Accepted | Immediate |
| Level 3 verification | Accepted | Phase 3 |

Dissenting Opinion

One council member advocated for pure pytest markers without YAML metadata, arguing the ledger adds maintenance overhead. Counter-argument: YAML provides a single source of truth for requirement inventory, independent of test implementation status.


Changelog

| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-01-16 | Initial ADR split from ADR-008 |
| 1.1 | 2026-01-16 | Phase 2/3 complete: priority thresholds, baseline ratchet, domain coverage, 100% test coverage achieved |