ADR-006: Cross-Project Documentation Aggregation
Status: Accepted 2026-01-03
Date: 2026-01-03
Decision Makers: @amiable-dev/maintainers
Depends On: ADR-003 (Configuration), ADR-004 (CI/CD)
Council Review: 2026-01-03 (Tier: High, Models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)
Context
The amiable-templates site aggregates documentation from multiple template repositories. Each template repository contains its own documentation that should be pulled into the unified site. We need a system that:
- Fetches documentation from configured repositories at build time
- Transforms content for the unified site context
- Handles caching efficiently for fast builds
- Gracefully handles missing or changed content
This ADR adapts patterns from amiable-docusaurus ADR-003: Cross-Project ADR Aggregation.
Current State
No aggregation system exists. Template documentation is manually referenced via links.
Goals
- Automated fetching of template documentation at build time
- Consistent presentation across all template docs
- Source attribution for aggregated content
- Fast incremental builds via caching
- Resilient to upstream changes
Non-Goals
- Real-time synchronization (build-time is sufficient)
- Editing aggregated content in this repository
- Version history of aggregated content
- Webhook-triggered updates
Decision
Implement build-time documentation aggregation using Python scripts:
1. Aggregation Script: scripts/aggregate_templates.py
#!/usr/bin/env python3
"""
Fetch documentation from template repositories at build time.
Reads configuration from templates.yaml.
"""
import asyncio
import json
import os
import re
from datetime import datetime, timezone
from pathlib import Path

import aiohttp
import yaml

GITHUB_RAW_BASE = "https://raw.githubusercontent.com"
GITHUB_API_BASE = "https://api.github.com"


class TemplateAggregator:
    # Helper methods (_load_config, _get_commit_sha, _fetch_file,
    # _is_cached, _update_cache) are omitted from this excerpt.

    def __init__(self, config_path: str = "templates.yaml"):
        self.config = self._load_config(config_path)
        self.cache_dir = Path(".cache/templates")
        self.output_dir = Path("docs/templates")
        self.token = os.environ.get("GITHUB_TOKEN")

    async def aggregate_all(self):
        """Fetch docs from all configured templates."""
        async with aiohttp.ClientSession() as session:
            for template in self.config.get("templates", []):
                await self.aggregate_template(session, template)

    async def aggregate_template(self, session, template):
        """Fetch and transform docs for a single template."""
        template_id = template["id"]
        owner = template["repo"]["owner"]
        repo = template["repo"]["name"]

        # Get current commit SHA
        sha = await self._get_commit_sha(session, owner, repo)
        if not sha:
            print(f"  Warning: Could not fetch SHA for {owner}/{repo}")
            return

        # Check cache: an unchanged SHA means nothing needs refetching
        if self._is_cached(template_id, sha):
            print(f"  Using cached content for {template_id}")
            return

        # Fetch each doc
        output_path = self.output_dir / template_id
        output_path.mkdir(parents=True, exist_ok=True)
        for doc in template["directories"]["docs"]:
            content = await self._fetch_file(
                session, owner, repo, sha, doc["path"]
            )
            if content:
                transformed = self._transform_content(
                    content, owner, repo, sha, doc["path"]
                )
                target = output_path / doc["target"]
                # Targets may include subdirectories that do not exist yet
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_text(transformed, encoding="utf-8")
                print(f"  Wrote {doc['target']}")

        self._update_cache(template_id, sha)

    def _transform_content(self, content, owner, repo, sha, path):
        """Transform content for unified site."""
        # Rewrite relative links to absolute GitHub URLs
        content = self._rewrite_links(content, owner, repo, sha, path)

        # Inject source attribution as an admonition; the body lines must be
        # indented four spaces and followed by a blank line.
        source_url = f"https://github.com/{owner}/{repo}/blob/{sha}/{path}"
        attribution = (
            f'\n!!! info "Source"\n'
            f"    This documentation is from [{owner}/{repo}]({source_url}).\n"
            f"    Last synced: {datetime.now(timezone.utc).strftime('%Y-%m-%d')}\n\n"
        )
        return attribution + content

    def _rewrite_links(self, content, owner, repo, sha, path):
        """Rewrite relative image links to absolute GitHub URLs."""
        base_path = Path(path).parent

        # Rewrite images
        def replace_image(match):
            alt, src = match.groups()
            if src.startswith(('http://', 'https://')):
                return match.group(0)
            resolved = (base_path / src).as_posix()
            # Serve the image from the same commit on raw.githubusercontent.com
            return f"![{alt}]({GITHUB_RAW_BASE}/{owner}/{repo}/{sha}/{resolved})"

        content = re.sub(r'!\[([^\]]*)\]\(([^)]+)\)', replace_image, content)
        return content
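The script indexes into each templates.yaml entry without checking its shape, so a missing key would raise `KeyError` partway through a build. A minimal sketch of a validation pass follows, assuming the entry layout implied by the lookups above (`id`, `repo.owner`, `repo.name`, and `directories.docs[]` items with `path` and `target`). The function name `validate_template` is hypothetical and not part of the script.

```python
from typing import Any


def validate_template(template: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one templates.yaml entry.

    Hypothetical helper: the required keys mirror the lookups made in
    aggregate_template(); an empty list means the entry is usable.
    """
    problems: list[str] = []

    if not template.get("id"):
        problems.append("missing 'id'")

    repo = template.get("repo") or {}
    for key in ("owner", "name"):
        if not repo.get(key):
            problems.append(f"missing 'repo.{key}'")

    docs = (template.get("directories") or {}).get("docs")
    if not isinstance(docs, list):
        problems.append("'directories.docs' must be a list")
        return problems

    # Every doc entry needs a source path and a target file name.
    for index, doc in enumerate(docs):
        for key in ("path", "target"):
            if not isinstance(doc, dict) or not doc.get(key):
                problems.append(f"docs[{index}] missing '{key}'")

    return problems
```

Entries that report problems could be skipped with a warning, in keeping with the rule that upstream or configuration issues should not fail the build.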
2. Caching Strategy
Cache structure:
.cache/templates/
├── manifest.json    # Tracks commit SHAs
└── raw/             # Cached raw content
    └── {template-id}/
Manifest format:
{
  "litellm-langfuse-starter": {
    "commit_sha": "abc123...",
    "fetched_at": "2026-01-03T10:00:00Z",
    "files": ["overview.md", "setup.md"]
  }
}
Cache invalidation:
- Compare current commit SHA with cached SHA
- If different, refetch all docs for that template
- Daily scheduled builds ensure freshness
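The SHA comparison above can be sketched with the standard library alone. This is an illustrative version, not the script's actual `_is_cached` and `_update_cache` methods: the function names, the explicit `manifest_path` argument, and the temporary-file write are assumptions. The manifest fields match the format shown above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_manifest(manifest_path: Path) -> dict:
    """Read the manifest, treating a missing or corrupt file as empty."""
    try:
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def is_cached(manifest_path: Path, template_id: str, sha: str) -> bool:
    """True when the manifest records the same commit SHA for this template."""
    entry = load_manifest(manifest_path).get(template_id, {})
    return entry.get("commit_sha") == sha


def update_cache(
    manifest_path: Path, template_id: str, sha: str, files: list[str]
) -> None:
    """Record the SHA, fetch time, and file list after a successful fetch."""
    manifest = load_manifest(manifest_path)
    manifest[template_id] = {
        "commit_sha": sha,
        "fetched_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "files": sorted(files),
    }
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted build cannot leave
    # a half-written manifest behind.
    tmp_path = manifest_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    tmp_path.replace(manifest_path)
```

Because invalidation is per template, a single changed file upstream changes the commit SHA and triggers a refetch of every doc for that template.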
3. Content Transformation
| Transformation | Purpose |
|---|---|
| Link rewriting | Relative → absolute GitHub URLs |
| Image rewriting | Point to raw.githubusercontent.com |
| Source attribution | Add info box with source link |
| Front matter | Inject MkDocs metadata |
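The link-rewriting row can be illustrated for ordinary (non-image) Markdown links. The sketch below is one possible approach under stated assumptions: links are pointed at the `blob` view of the same commit, `posixpath.normpath` resolves `./` and `../` segments, and links that carry a title or sit inside code blocks are not treated specially. The name `rewrite_links` and the pattern are illustrative.

```python
import posixpath
import re

# Matches [text](target) but not ![alt](src); images are handled separately.
# Targets containing whitespace (for example links with a title) are skipped.
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_links(content: str, owner: str, repo: str, sha: str, path: str) -> str:
    """Point relative Markdown links at the file's location on GitHub."""
    base_dir = posixpath.dirname(path)

    def replace_link(match: re.Match) -> str:
        text, target = match.groups()
        # Leave absolute URLs, mail links, in-page anchors, and
        # root-relative paths untouched.
        has_scheme = re.match(r"[a-zA-Z][a-zA-Z0-9+.-]*:", target)
        if has_scheme or target.startswith(("#", "/")):
            return match.group(0)
        # Keep any #fragment while resolving "./" and "../" segments.
        file_part, hash_mark, fragment = target.partition("#")
        resolved = posixpath.normpath(posixpath.join(base_dir, file_part))
        url = f"https://github.com/{owner}/{repo}/blob/{sha}/{resolved}"
        return f"[{text}]({url}{hash_mark}{fragment})"

    return LINK_PATTERN.sub(replace_link, content)
```

Pinning the URL to the commit SHA keeps a rewritten link pointing at the same revision the page was built from, at the cost of not following later renames upstream.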
4. Error Handling
| Scenario | Behavior |
|---|---|
| Template repo not found | Skip, log warning |
| Doc file not found | Skip file, continue others |
| Rate limit hit | Use cached content, warn |
| API error | Use cached content if available |
Never fail the build due to upstream issues.
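One way to express the fallback behavior in the table is a wrapper that prefers fresh content, falls back to the raw cache, and otherwise skips the file. The sketch is synchronous for brevity, whereas the aggregation script is async. The function name `fetch_with_fallback` and the `cache_file` layout are assumptions.

```python
from pathlib import Path
from typing import Callable, Optional


def fetch_with_fallback(
    fetch: Callable[[], str],
    cache_file: Path,
) -> Optional[str]:
    """Return fresh content, else cached content, else None; never raise.

    `fetch` is any zero-argument callable that returns the document text
    or raises on failure (rate limit, 404, network error).
    """
    try:
        content = fetch()
    except Exception as exc:  # upstream problems must not fail the build
        print(f"  Warning: fetch failed ({exc}); trying cache")
        if cache_file.exists():
            return cache_file.read_text(encoding="utf-8")
        print(f"  Warning: no cached copy at {cache_file}; skipping")
        return None

    # Refresh the raw cache so the next failure has something to fall back on.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(content, encoding="utf-8")
    return content
```

A caller would treat `None` as "skip this file and continue with the others", which matches the "Doc file not found" row.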
5. GitHub API Usage
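- Authenticated via GITHUB_TOKEN (5,000 req/hr with a personal access token; the token GitHub Actions provides is limited to 1,000 req/hr per repository)
- Use raw.githubusercontent.com for content (does not count against the REST API rate limit)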
- Use Trees API for efficient directory listing
- Cache reduces API calls significantly
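The points above can be sketched as three small helpers: request headers, the recursive Trees API URL, and a filter over the listing it returns. The response fields used (`tree`, `path`, `type`, `size`, `truncated`) are those of the GitHub Git Trees endpoint. The helper names, the `docs/` prefix default, and the exact byte threshold for the 1MB limit recorded in the review findings are assumptions.

```python
from typing import Any, Optional

GITHUB_API_BASE = "https://api.github.com"
MAX_FILE_BYTES = 1_000_000  # assumed reading of the 1MB size limit


def build_headers(token: Optional[str]) -> dict[str, str]:
    """Headers for GitHub REST calls; the token is optional."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def tree_url(owner: str, repo: str, sha: str) -> str:
    """URL that lists every file at a commit in a single request."""
    return f"{GITHUB_API_BASE}/repos/{owner}/{repo}/git/trees/{sha}?recursive=1"


def select_docs(tree_response: dict[str, Any], prefix: str = "docs/") -> list[str]:
    """Pick Markdown files under `prefix` that are within the size limit."""
    if tree_response.get("truncated"):
        print("  Warning: tree listing truncated; some files may be missing")
    selected: list[str] = []
    for entry in tree_response.get("tree", []):
        if entry.get("type") != "blob":
            continue  # skip directories and submodules
        path = entry.get("path", "")
        if not (path.startswith(prefix) and path.endswith(".md")):
            continue
        if entry.get("size", 0) > MAX_FILE_BYTES:
            print(f"  Warning: skipping {path} (over size limit)")
            continue
        selected.append(path)
    return selected
```

One Trees request per template replaces a directory listing call per folder, and the selected paths can then be fetched from raw.githubusercontent.com.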
6. Integration with CI
In .github/workflows/deploy.yml:
- name: Aggregate template docs
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: python scripts/aggregate_templates.py
- name: Build MkDocs
  run: mkdocs build --strict
7. ADR Aggregation
ADRs are fetched similarly:
- Read from docs/adr/ in each template repo
- Write to docs/adrs/aggregated/{template-id}/
- Include in navigation under ADRs section
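The path mapping described above could look like the following sketch, which takes a list of repository paths and returns where each ADR would be written. The function name `adr_targets` and the example file names are hypothetical. Only Markdown files under docs/adr/ are mapped, which is an assumption about what counts as an ADR.

```python
from pathlib import PurePosixPath


def adr_targets(
    paths: list[str],
    template_id: str,
    source_dir: str = "docs/adr",
    output_root: str = "docs/adrs/aggregated",
) -> dict[str, str]:
    """Map ADR files in a template repo to their aggregated locations."""
    source = PurePosixPath(source_dir)
    targets: dict[str, str] = {}
    for path in paths:
        candidate = PurePosixPath(path)
        if candidate.suffix != ".md":
            continue
        try:
            relative = candidate.relative_to(source)
        except ValueError:
            continue  # not under the ADR directory
        target = PurePosixPath(output_root) / template_id / relative
        targets[path] = target.as_posix()
    return targets
```

For example, a hypothetical `docs/adr/0001-use-mkdocs.md` in a template with id `demo` would map to `docs/adrs/aggregated/demo/0001-use-mkdocs.md`.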
Consequences
Positive
- Unified experience: All template docs in one place
- Always fresh: Daily rebuilds catch upstream changes
- Fast builds: Caching minimizes fetch time
- Resilient: Cached content used on errors
Negative
- Build dependency: Requires GitHub API access
- Delayed updates: Changes not instant (daily rebuild)
- Complexity: Aggregation script to maintain
Neutral
- Content transformation may need updates for edge cases
- Large template repos may slow initial builds
Implementation Phases
Phase 1: Core Aggregation
- Create scripts/aggregate_templates.py
- Implement basic fetch and cache
- Test with litellm-langfuse-railway
Phase 2: Transformation
- Implement link rewriting
- Add source attribution
- Handle front matter
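Front matter handling interacts with source attribution: MkDocs only recognizes YAML front matter at the very top of a page, so text prepended ahead of an upstream document's `---` block would turn that block into ordinary content. A minimal sketch of one way to handle this, assuming `---`-delimited front matter, is shown below; the function names are hypothetical.

```python
def split_front_matter(content: str) -> tuple[str, str]:
    """Split a document into (front_matter_block, body).

    The front matter block includes both '---' delimiter lines; it is an
    empty string when the document has none.
    """
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", content
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[: index + 1]), "".join(lines[index + 1 :])
    return "", content  # unterminated block: treat as plain content


def inject_after_front_matter(content: str, attribution: str) -> str:
    """Insert the attribution below any front matter instead of above it."""
    front_matter, body = split_front_matter(content)
    return front_matter + attribution + body
```

Documents without front matter are unaffected: the attribution simply lands at the top, as it does in the aggregation script.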
Phase 3: Integration
- Add to CI workflow
- Configure caching in GitHub Actions
- Test scheduled builds
Phase 4: ADR Aggregation
- Extend script for ADRs (deferred - templates may not have ADRs)
- Update navigation
- Generate ADR index (deferred)
Compliance / Validation
- Aggregation completes without errors
- Links in aggregated content work
- Images display correctly
- Cache speeds up subsequent builds
- Errors don't fail the build
LLM Council Review Summary
Reviewed: 2026-01-03
Tier: High (4 models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)
Verdict: Accepted
Robust aggregation design with appropriate caching and error handling strategies.
Key Findings Incorporated
| Finding | Resolution |
|---|---|
| Rate limiting concern (60/hr unauthenticated) | Use GITHUB_TOKEN in CI (authenticated: 5,000/hr with a personal access token, 1,000/hr per repository with the Actions-provided token) |
| Link validation for aggregated content | Added link checking to weekly schedule |
| Large file handling | Added size limit check (>1MB files logged and skipped) |
| Attribution injection placement | Info box at top of content for visibility |
Dissenting Views
- All models agreed build-time aggregation is appropriate for this use case.
- Discussion on caching granularity (file vs. template level); consensus on template-level for simplicity.