ADR-006: Cross-Project Documentation Aggregation

Status: Accepted 2026-01-03
Date: 2026-01-03
Decision Makers: @amiable-dev/maintainers
Depends On: ADR-003 (Configuration), ADR-004 (CI/CD)
Council Review: 2026-01-03 (Tier: High, Models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)

Context

The amiable-templates site aggregates documentation from multiple template repositories. Each template repository contains its own documentation that should be pulled into the unified site. We need a system that:

Fetches documentation from configured repositories at build time
Transforms content for the unified site context
Handles caching efficiently for fast builds
Gracefully handles missing or changed content

This ADR adapts patterns from amiable-docusaurus ADR-003: Cross-Project ADR Aggregation.

Current State

No aggregation system exists. Template documentation is manually referenced via links.

Goals

Automated fetching of template documentation at build time
Consistent presentation across all template docs
Source attribution for aggregated content
Fast incremental builds via caching
Resilient to upstream changes

Non-Goals

Real-time synchronization (build-time is sufficient)
Editing aggregated content in this repository
Version history of aggregated content
Webhook-triggered updates

Decision

Implement build-time documentation aggregation using Python scripts:

1. Aggregation Script: `scripts/aggregate_templates.py`

#!/usr/bin/env python3
"""
Fetch documentation from template repositories at build time.
Reads configuration from templates.yaml.
"""

import asyncio
import json
import os
import re
from pathlib import Path
from datetime import datetime

import aiohttp
import yaml

GITHUB_RAW_BASE = "https://raw.githubusercontent.com"
GITHUB_API_BASE = "https://api.github.com"

class TemplateAggregator:
    def __init__(self, config_path: str = "templates.yaml"):
        self.config = self._load_config(config_path)
        self.cache_dir = Path(".cache/templates")
        self.output_dir = Path("docs/templates")
        self.token = os.environ.get("GITHUB_TOKEN")

    async def aggregate_all(self):
        """Fetch docs from all configured templates."""
        async with aiohttp.ClientSession() as session:
            for template in self.config.get("templates", []):
                await self.aggregate_template(session, template)

    async def aggregate_template(self, session, template):
        """Fetch and transform docs for a single template."""
        template_id = template["id"]
        owner = template["repo"]["owner"]
        repo = template["repo"]["name"]

        # Get current commit SHA
        sha = await self._get_commit_sha(session, owner, repo)
        if not sha:
            print(f"  Warning: Could not fetch SHA for {owner}/{repo}")
            return

        # Check cache
        if self._is_cached(template_id, sha):
            print(f"  Using cached content for {template_id}")
            return

        # Fetch each doc
        output_path = self.output_dir / template_id
        output_path.mkdir(parents=True, exist_ok=True)

        for doc in template["directories"]["docs"]:
            content = await self._fetch_file(
                session, owner, repo, sha, doc["path"]
            )
            if content:
                transformed = self._transform_content(
                    content, owner, repo, sha, doc["path"]
                )
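                target = output_path / doc["target"]
                # Targets may include subdirectories; create them before writing
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_text(transformed, encoding="utf-8")
                print(f"  Wrote {doc['target']}")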

        self._update_cache(template_id, sha)

    def _transform_content(self, content, owner, repo, sha, path):
        """Transform content for unified site."""
        # Rewrite relative links to absolute GitHub URLs
        content = self._rewrite_links(content, owner, repo, sha, path)

        # Inject source attribution
        source_url = f"https://github.com/{owner}/{repo}/blob/{sha}/{path}"
        attribution = f"""
!!! info "Source"
    This documentation is from [{owner}/{repo}]({source_url}).
    Last synced: {datetime.utcnow().strftime('%Y-%m-%d')}

"""
        return attribution + content

    def _rewrite_links(self, content, owner, repo, sha, path):
        """Rewrite relative links to absolute GitHub URLs."""
        base_path = Path(path).parent

        # Rewrite images
        def replace_image(match):
            alt, src = match.groups()
            if src.startswith(('http://', 'https://')):
                return match.group(0)
            resolved = (base_path / src).as_posix()
            return f"![{alt}]({GITHUB_RAW_BASE}/{owner}/{repo}/{sha}/{resolved})"

        content = re.sub(r'!\[([^\]]*)\]\(([^)]+)\)', replace_image, content)
        return content
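
The script reads `templates.yaml`, but this ADR does not show the file. Judging only from the keys that `aggregate_template` accesses (`id`, `repo.owner`, `repo.name`, and `directories.docs[].path` / `.target`), a parsed entry might look like the dictionary below. This is a sketch: the owner, repository name, and paths are placeholders, and `validate_template` is a hypothetical helper, not part of the script above.

```python
from typing import Any

# Illustrative only: the shape is inferred from the keys aggregate_template()
# reads; the values (owner, name, paths) are placeholders, not real config.
EXAMPLE_CONFIG: dict[str, Any] = {
    "templates": [
        {
            "id": "litellm-langfuse-starter",
            "repo": {"owner": "example-owner", "name": "example-repo"},
            "directories": {
                "docs": [
                    {"path": "README.md", "target": "overview.md"},
                    {"path": "docs/setup.md", "target": "setup.md"},
                ]
            },
        }
    ]
}


def validate_template(template: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry is usable."""
    problems: list[str] = []
    if not template.get("id"):
        problems.append("missing 'id'")
    repo = template.get("repo") or {}
    for key in ("owner", "name"):
        if not repo.get(key):
            problems.append(f"missing 'repo.{key}'")
    docs = (template.get("directories") or {}).get("docs") or []
    if not docs:
        problems.append("no entries under 'directories.docs'")
    for i, doc in enumerate(docs):
        for key in ("path", "target"):
            if not doc.get(key):
                problems.append(f"docs[{i}] missing '{key}'")
    return problems
```

Validating each entry before fetching would let a malformed entry be skipped with a warning rather than raising a `KeyError` partway through the build.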

2. Caching Strategy

Cache structure:

.cache/templates/
├── manifest.json     # Tracks commit SHAs
└── raw/              # Cached raw content
    └── {template-id}/

Manifest format:

{
  "litellm-langfuse-starter": {
    "commit_sha": "abc123...",
    "fetched_at": "2026-01-03T10:00:00Z",
    "files": ["overview.md", "setup.md"]
  }
}

Cache invalidation:

Compare the repository's current commit SHA with the SHA stored in the manifest
If they differ, refetch all docs for that template
Daily scheduled builds pick up upstream changes within about 24 hours
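
The invalidation rule above can be sketched as two small functions over `manifest.json`. The script calls `_is_cached` and `_update_cache` without showing them; the standalone versions below are one plausible reading of the manifest format, not the actual implementation. Treating a missing or corrupt manifest as an empty cache is an assumption made here so that a bad cache never breaks the build.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_manifest(cache_dir: Path) -> dict:
    """Read manifest.json; a missing or corrupt file counts as an empty cache."""
    try:
        return json.loads((cache_dir / "manifest.json").read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def is_cached(cache_dir: Path, template_id: str, sha: str) -> bool:
    """True when the manifest records the same commit SHA for this template."""
    entry = load_manifest(cache_dir).get(template_id, {})
    return entry.get("commit_sha") == sha


def update_cache(cache_dir: Path, template_id: str, sha: str, files: list[str]) -> None:
    """Record the SHA, fetch time, and file list after a successful fetch."""
    manifest = load_manifest(cache_dir)
    manifest[template_id] = {
        "commit_sha": sha,
        "fetched_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "files": files,
    }
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )


# Usage: a changed SHA invalidates the whole template entry.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "templates"
    update_cache(cache, "litellm-langfuse-starter", "abc123", ["overview.md"])
    hit = is_cached(cache, "litellm-langfuse-starter", "abc123")
    miss = is_cached(cache, "litellm-langfuse-starter", "def456")
```

A cache hit in the script skips writing entirely, so it only helps if the previously generated files (or the copies under `raw/`) are still present when the next build runs.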

3. Content Transformation

Transformation	Purpose
Link rewriting	Relative → absolute GitHub URLs
Image rewriting	Point to raw.githubusercontent.com
Source attribution	Add info box with source link
Front matter	Inject MkDocs metadata
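
The script above only shows image rewriting. The link rewriting and front matter rows of the table could be sketched as follows. Both functions are hypothetical: the choice to point non-image links at `github.com/.../blob/{sha}/...`, the handling of root-relative links, and the `title` key are assumptions, not behavior confirmed by this ADR.

```python
import json
import posixpath
import re

GITHUB_BLOB_BASE = "https://github.com"

# Matches [text](target) but not images, which are preceded by "!".
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_links(content: str, owner: str, repo: str, sha: str, path: str) -> str:
    """Rewrite relative Markdown links to absolute GitHub blob URLs."""
    base_dir = posixpath.dirname(path)

    def replace(match: re.Match) -> str:
        text, href = match.groups()
        # Leave absolute URLs, mailto links, and in-page anchors untouched.
        if href.startswith(("http://", "https://", "mailto:", "#")):
            return match.group(0)
        if href.startswith("/"):
            # Assumption: a leading slash means "relative to the repo root".
            resolved = href.lstrip("/")
        else:
            resolved = posixpath.normpath(posixpath.join(base_dir, href))
        return f"[{text}]({GITHUB_BLOB_BASE}/{owner}/{repo}/blob/{sha}/{resolved})"

    return LINK_RE.sub(replace, content)


def inject_front_matter(content: str, meta: dict[str, str]) -> str:
    """Prepend YAML front matter; JSON-quoted strings are valid YAML scalars."""
    lines = [f"{key}: {json.dumps(value)}" for key, value in meta.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + content
```

Front matter has to be the first thing in the file, so in this sketch it would be injected after the attribution box is prepended. The sketch does not merge with front matter that the upstream file already contains.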

4. Error Handling

Scenario	Behavior
Template repo not found	Skip, log warning
Doc file not found	Skip file, continue others
Rate limit hit	Use cached content, warn
API error	Use cached content if available

Never fail the build due to upstream issues.
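
One way to apply the table above uniformly is to route every fetch through a single fallback wrapper. The sketch below is synchronous for brevity (the script itself is async), and `fetch_with_fallback` and its callable parameters are hypothetical names introduced for illustration.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("aggregate")


def fetch_with_fallback(
    fetch: Callable[[], str],
    read_cached: Callable[[], Optional[str]],
    label: str,
) -> Optional[str]:
    """Try the live fetch; on failure fall back to the cache, then to skipping."""
    try:
        return fetch()
    except Exception as exc:  # upstream problems must never abort the build
        log.warning("Fetch failed for %s: %s", label, exc)
    cached = read_cached()
    if cached is None:
        log.warning("No cached copy of %s; skipping", label)
    return cached
```

Returning `None` instead of raising keeps the caller's existing `if content:` check as the only place that decides whether a file is written.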

5. GitHub API Usage

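Authenticate with GITHUB_TOKEN (authenticated requests get up to 5,000/hour; the built-in GitHub Actions token is capped at 1,000/hour per repository)
Fetch file content from raw.githubusercontent.com, which does not count against the REST API rate limit
Use the Trees API to list a directory tree in a single request
With caching, an unchanged template costs only the commit SHA lookup on each build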
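
The points above can be sketched as three helpers: request headers, the Trees API URL, and a filter over the response. The helper names and the `.md`-only filter are assumptions. The size check reflects the 1MB limit recorded in the council findings, read here as 1,000,000 bytes.

```python
from typing import Optional

GITHUB_API_BASE = "https://api.github.com"
MAX_FILE_BYTES = 1_000_000  # assumption: "1MB" read as 10**6 bytes


def api_headers(token: Optional[str]) -> dict[str, str]:
    """Build REST API headers; the token is optional so local runs still work."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def tree_url(owner: str, repo: str, sha: str) -> str:
    """URL for a recursive tree listing at the given SHA."""
    return f"{GITHUB_API_BASE}/repos/{owner}/{repo}/git/trees/{sha}?recursive=1"


def select_docs(tree: dict, prefix: str) -> list[str]:
    """Pick Markdown blobs under prefix from a Trees API response."""
    selected = []
    for entry in tree.get("tree", []):
        if entry.get("type") != "blob":
            continue  # skip subdirectories ("tree") and submodules ("commit")
        path = entry["path"]
        if not (path.startswith(prefix) and path.endswith(".md")):
            continue
        if entry.get("size", 0) > MAX_FILE_BYTES:
            print(f"  Skipping {path}: {entry['size']} bytes exceeds limit")
            continue
        selected.append(path)
    return selected
```

The Trees API response also carries a `truncated` flag for very large repositories; a fuller version would log a warning when it is set.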

6. Integration with CI

In .github/workflows/deploy.yml:

- name: Aggregate template docs
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: python scripts/aggregate_templates.py

- name: Build MkDocs
  run: mkdocs build --strict

7. ADR Aggregation

ADRs are fetched similarly:

Read from docs/adr/ in each template repo
Write to docs/adrs/aggregated/{template-id}/
Include in navigation under ADRs section
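
The source-to-target mapping above can be sketched as a pure function. `adr_target` is a hypothetical helper; it assumes ADRs are Markdown files and that any subdirectory structure under `docs/adr/` is preserved in the output.

```python
from pathlib import PurePosixPath
from typing import Optional


def adr_target(
    template_id: str,
    source_path: str,
    source_dir: str = "docs/adr",
    output_root: str = "docs/adrs/aggregated",
) -> Optional[str]:
    """Map an upstream ADR path to its location in this site, or None to skip."""
    src = PurePosixPath(source_path)
    try:
        relative = src.relative_to(source_dir)
    except ValueError:
        return None  # not under the template's ADR directory
    if src.suffix != ".md":
        return None  # assumption: only Markdown ADRs are aggregated
    return (PurePosixPath(output_root) / template_id / relative).as_posix()
```

For example, `adr_target("litellm-langfuse-starter", "docs/adr/ADR-001-foo.md")` yields `docs/adrs/aggregated/litellm-langfuse-starter/ADR-001-foo.md`, while a path outside `docs/adr/` yields `None`.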

Consequences

Positive

Unified experience: All template docs in one place
Always fresh: Daily rebuilds catch upstream changes
Fast builds: Caching minimizes fetch time
Resilient: Cached content used on errors

Negative

Build dependency: Requires GitHub API access
Delayed updates: Changes not instant (daily rebuild)
Complexity: Aggregation script to maintain

Neutral

Content transformation may need updates for edge cases
Large template repos may slow initial builds

Implementation Phases

Phase 1: Core Aggregation

Create scripts/aggregate_templates.py
Implement basic fetch and cache
Test with litellm-langfuse-railway

Phase 2: Transformation

Implement link rewriting
Add source attribution
Handle front matter

Phase 3: Integration

Add to CI workflow
Configure caching in GitHub Actions
Test scheduled builds

Phase 4: ADR Aggregation

Extend script for ADRs (deferred - templates may not have ADRs)
Update navigation
Generate ADR index (deferred)

Compliance / Validation

Aggregation completes without errors
Links in aggregated content work
Images display correctly
Cache speeds up subsequent builds
Errors don't fail the build

LLM Council Review Summary

Reviewed: 2026-01-03
Tier: High (4 models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)

Verdict: Accepted

Robust aggregation design with appropriate caching and error handling strategies.

Key Findings Incorporated

Finding	Resolution
Rate limiting concern (60/hr unauthenticated)	Use `GITHUB_TOKEN` in CI (5000/hr authenticated)
Link validation for aggregated content	Added link checking to weekly schedule
Large file handling	Added size limit check (>1MB files logged and skipped)
Attribution injection placement	Info box at top of content for visibility

Dissenting Views

All models agreed build-time aggregation is appropriate for this use case.
Discussion on caching granularity (file vs. template level); consensus on template-level for simplicity.
