Building a Cross-Project ADR Aggregator with TDD
Architecture Decision Records (ADRs) are invaluable for documenting why technical decisions were made. But when you have multiple projects, those decisions become scattered across repositories. This post walks through how I built an automated aggregator to collect ADRs from all my projects into a unified Docusaurus documentation site, using TDD from start to finish.
The Problem
My portfolio includes several projects: Stentorosaur (a status monitoring plugin), LLM Council (multi-model AI deliberation), Luminescent Cluster (context management), and more. Each has its own ADRs documenting architectural decisions. The challenge: visitors had to navigate to each repository separately to understand my decision-making patterns.
I wanted a single place where recruiters could see consistent decision-making across projects, and engineers could discover architectural patterns that span multiple systems.
The Solution: Build-Time Aggregation
Following the pattern established in ADR-002 for the projects showcase, I created a prebuild script that:
- Reads the list of projects from `projects.json`
- Discovers ADR files using GitHub's Git Trees API
- Fetches content via `raw.githubusercontent.com`
- Transforms and writes to `docs/adrs/projects/`
The key insight was using the Git Trees API for discovery. A single API call returns the entire file tree for a repository, which we can filter locally:
```javascript
const { data } = await octokit.git.getTree({
  owner,
  repo,
  tree_sha: defaultBranch,
  recursive: 'true',
});

const adrFiles = data.tree.filter((file) =>
  file.type === 'blob' &&
  file.path.match(/^(docs\/adrs?|adr)\/ADR-\d+.*\.md$/i)
);
```
This is far more efficient than making multiple API calls to check if different ADR directories exist.
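As a quick sanity check, here's how that path filter behaves against a few representative tree paths (the sample paths are invented for illustration):

```javascript
// The discovery regex from above, tried against illustrative tree paths
const ADR_PATH = /^(docs\/adrs?|adr)\/ADR-\d+.*\.md$/i;

const samplePaths = [
  'docs/adrs/ADR-001-initial.md', // matches: docs/adrs/ layout
  'docs/adr/ADR-002-caching.md',  // matches: singular docs/adr/ layout
  'adr/adr-0003-logging.md',      // matches: top-level adr/, case-insensitive
  'docs/adrs/README.md',          // skipped: not an ADR file
  'src/ADR-001-fake.md',          // skipped: outside the ADR directories
];

const matched = samplePaths.filter((p) => ADR_PATH.test(p));
console.log(matched.length); // 3
```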
Important caveats:
- Set `GITHUB_TOKEN` for authenticated requests (5,000/hour) vs. unauthenticated (60/hour)
- The `raw.githubusercontent.com` approach only works for public repositories; private repos require using `octokit.repos.getContent` with auth tokens
- For very large repos with >100k objects, check `data.truncated` to ensure you got the full tree
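To make the token caveat concrete, here's a small helper sketch (not from the actual script) that builds request headers depending on whether `GITHUB_TOKEN` is set:

```javascript
// Hypothetical helper: choose GitHub request headers based on token availability
function githubHeaders(token = process.env.GITHUB_TOKEN) {
  const headers = { Accept: 'application/vnd.github+json' };
  if (token) {
    headers.Authorization = `Bearer ${token}`; // authenticated: 5,000 req/hour
  }
  // Without an Authorization header you fall back to 60 req/hour per IP
  return headers;
}

console.log(githubHeaders('ghp_example').Authorization); // Bearer ghp_example
```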
TDD Approach
I followed TDD with Vitest and MSW (Mock Service Worker) for API mocking. The key was making the external API calls testable through comprehensive mocks.
MSW Handlers
The MSW handlers intercept GitHub API calls and return controlled mock data:
```typescript
// __tests__/mocks/github-api.ts
import { http, HttpResponse } from 'msw';
import { mockTreeResponses, mockADRContent } from './adr-data';

export const handlers = [
  // Git Trees handler
  http.get(
    'https://api.github.com/repos/:owner/:repo/git/trees/:sha',
    ({ params, request }) => {
      const repoKey = `${params.owner}/${params.repo}`;
      // Support conditional requests (ETag caching)
      const ifNoneMatch = request.headers.get('If-None-Match');
      if (ifNoneMatch && mockTreeResponses[repoKey]?.etag === ifNoneMatch) {
        return new HttpResponse(null, { status: 304 });
      }
      return HttpResponse.json(mockTreeResponses[repoKey]?.data, {
        headers: { ETag: mockTreeResponses[repoKey]?.etag || '' },
      });
    }
  ),

  // Raw content handler
  http.get(
    'https://raw.githubusercontent.com/:owner/:repo/:sha/*',
    ({ params, request }) => {
      const repoKey = `${params.owner}/${params.repo}`;
      const filePath = new URL(request.url).pathname.split('/').slice(4).join('/');
      const content = mockADRContent[repoKey]?.[filePath];
      if (!content) return new HttpResponse(null, { status: 404 });
      return new HttpResponse(content, {
        headers: { 'Content-Type': 'text/plain' },
      });
    }
  ),
];
```
Write Tests First (RED)
Each function got its tests before implementation:
```javascript
it('should inject aggregator metadata while preserving source fields', async () => {
  const { injectFrontMatter } = await import('../../scripts/fetch-adrs');

  // Minimal fixtures (shapes are illustrative)
  const project = { repo: 'amiable-dev/stentorosaur' };
  const sourceInfo = { filename: 'ADR-001.md' };
  const parsedADR = {
    frontMatter: { title: 'ADR-001: Test', date: '2025-01-01' },
    body: '# Content',
  };

  const result = injectFrontMatter(parsedADR, project, sourceInfo);

  expect(result.frontMatter.slug).toBe('/docs/adrs/projects/stentorosaur/adr-001');
  expect(result.frontMatter.source_repo).toBe('amiable-dev/stentorosaur');
  expect(result.frontMatter.title).toBe('ADR-001: Test'); // Preserved from source
});
```
Running this test immediately failed because `fetch-adrs.js` didn't exist yet.
Implement to Pass (GREEN)
Then I wrote the minimal implementation to make tests pass, followed by refactoring as needed.
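For the front-matter test above, the minimal implementation looks roughly like this (field names and helper shapes are reconstructed from the test, not copied from the real `fetch-adrs.js`):

```javascript
// Sketch of injectFrontMatter: add aggregator metadata, preserve source fields
function injectFrontMatter(parsedADR, project, sourceInfo) {
  const repoName = project.repo.split('/')[1]; // 'amiable-dev/stentorosaur' → 'stentorosaur'
  const slugPart = sourceInfo.filename.replace(/\.md$/i, '').toLowerCase();
  return {
    ...parsedADR,
    frontMatter: {
      ...parsedADR.frontMatter, // spread source fields (title, date, …) first, then add aggregator keys
      slug: `/docs/adrs/projects/${repoName}/${slugPart}`,
      source_repo: project.repo,
    },
  };
}

const result = injectFrontMatter(
  { frontMatter: { title: 'ADR-001: Test', date: '2025-01-01' }, body: '# Content' },
  { repo: 'amiable-dev/stentorosaur' },
  { filename: 'ADR-001.md' }
);
console.log(result.frontMatter.slug); // /docs/adrs/projects/stentorosaur/adr-001
```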
Key Implementation Details
Link Rewriting
ADRs often contain relative links that break when aggregated. I used regex-based rewriting, which is simple and sufficient for markdown:
Before:

```md
[See ADR-002](./ADR-002-notifications.md)
```

After:

```md
[See ADR-002](/docs/adrs/projects/stentorosaur/adr-002-notifications)
```
The implementation:

```javascript
// Images: relative → absolute GitHub URLs
rewritten = content.replace(
  /!\[([^\]]*)\]\(\.\/([^)]+)\)/g,
  (match, alt, relativePath) => {
    const absoluteUrl = `https://raw.githubusercontent.com/${repo}/${sha}/${baseDir}/${relativePath}`;
    return `![${alt}](${absoluteUrl})`;
  }
);

// ADR references: relative → local Docusaurus paths
rewritten = rewritten.replace(
  /\[([^\]]+)\]\(\.\/([^)]+\.md)\)/g,
  (match, text, relativePath) => {
    const filename = path.basename(relativePath, '.md').toLowerCase();
    return `[${text}](/docs/adrs/projects/${repoName}/${filename})`;
  }
);
```
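Pulled out of context, the same two rewrites can be wrapped in a self-contained function to show the end-to-end effect (the sample repo values are illustrative):

```javascript
// Self-contained version of the two rewrites, for demonstration
function rewriteRelativeLinks(content, { repo, sha, baseDir, repoName }) {
  // Images: relative → absolute GitHub URLs
  let rewritten = content.replace(
    /!\[([^\]]*)\]\(\.\/([^)]+)\)/g,
    (m, alt, rel) =>
      `![${alt}](https://raw.githubusercontent.com/${repo}/${sha}/${baseDir}/${rel})`
  );
  // ADR references: relative → local Docusaurus paths
  rewritten = rewritten.replace(
    /\[([^\]]+)\]\(\.\/([^)]+\.md)\)/g,
    (m, text, rel) => {
      const filename = rel.split('/').pop().replace(/\.md$/i, '').toLowerCase();
      return `[${text}](/docs/adrs/projects/${repoName}/${filename})`;
    }
  );
  return rewritten;
}

const out = rewriteRelativeLinks('[See ADR-002](./ADR-002-notifications.md)', {
  repo: 'amiable-dev/stentorosaur',
  sha: 'abc123',
  baseDir: 'docs/adrs',
  repoName: 'stentorosaur',
});
console.log(out); // [See ADR-002](/docs/adrs/projects/stentorosaur/adr-002-notifications)
```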
Cache Invalidation
To avoid redundant fetches, the script maintains a cache manifest tracking commit SHAs:
```json
{
  "amiable-dev/stentorosaur": {
    "commitSha": "abc123",
    "fetchedAt": "2026-01-03T12:00:00Z",
    "files": ["adr-001.md", "adr-002.md"]
  }
}
```
If the commit SHA hasn't changed, we skip fetching that repo entirely.
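The skip check itself is trivial once the manifest exists. A sketch (the real script's internals may differ):

```javascript
// Hypothetical cache check: skip a repo when its HEAD commit matches the manifest
function shouldSkipRepo(manifest, repoKey, currentSha) {
  const entry = manifest[repoKey];
  return Boolean(entry && entry.commitSha === currentSha);
}

const manifest = {
  'amiable-dev/stentorosaur': {
    commitSha: 'abc123',
    fetchedAt: '2026-01-03T12:00:00Z',
    files: ['adr-001.md', 'adr-002.md'],
  },
};

console.log(shouldSkipRepo(manifest, 'amiable-dev/stentorosaur', 'abc123')); // true  → skip fetch
console.log(shouldSkipRepo(manifest, 'amiable-dev/stentorosaur', 'def456')); // false → re-fetch
```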
MDX Compatibility
Aggregated markdown often contains JSX-like syntax (e.g., `<4 developers`) that breaks MDX parsing. Since this is a Docusaurus site, I add `format: md` to the front matter to tell Docusaurus to treat these files as plain markdown:
```javascript
const newFrontMatter = {
  format: 'md', // Docusaurus: treat as plain markdown, not MDX
  slug: generateSlug(project.repo, filename),
  // ...
};
```
Graceful Degradation
The build should never fail due to GitHub API issues:
```javascript
async function main() {
  try {
    // ... fetch and transform ADRs
  } catch (error) {
    console.error('Error during ADR aggregation:', error.message);
    console.log('Continuing with partial data...');
    // Don't throw - build continues
  }
}
```
Results
The final implementation:
- 165 tests all passing (expanded with v2 enhancements)
- 44 ADRs aggregated from 3 projects
- ~2 seconds for cached builds
- Zero build failures due to API issues
The ADRs are now browsable at /docs/adrs/projects/ with automatic tagging by project and technology. Cross-project patterns are finally visible in one place, and project cards link directly to their architecture documentation.
Lessons Learned
- **Git Trees API is your friend:** A single API call beats multiple directory existence checks.
- **Raw fetches avoid REST API quota drain:** Fetching via `raw.githubusercontent.com` doesn't consume your REST API rate limit, though it has its own bandwidth throttling.
- **`format: md` prevents MDX headaches (Docusaurus-specific):** Aggregated markdown often contains characters that break MDX parsing.
- **Cache by commit SHA:** ETags work, but commit SHAs are more reliable for content invalidation since they directly represent content state.
- **TDD with MSW scales well:** Even for complex API interactions, well-structured mocks make testing straightforward. The key is mirroring real API response structures.
v2 Enhancements
After the initial implementation, I added three enhancements based on real-world usage:
1. Template Exclusion
ADR-000 files and templates were cluttering the aggregated output. Now they're excluded by default:
```javascript
const DEFAULT_EXCLUSION_PATTERNS = [
  /^ADR-000/i, // Standard template number
  /template/i, // Files with "template" in name
  /^0000-/,    // Alternative template prefix
];

function shouldExcludeFile(filename, adrConfig = {}) {
  if (adrConfig.includeTemplates) return false;
  const patterns = adrConfig.excludePatterns || DEFAULT_EXCLUSION_PATTERNS;
  return patterns.some((p) => p.test(filename));
}
```
Per-project overrides are supported via `adrConfig` in `projects.json`.
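A plausible shape for such an override in a projects.json entry (hypothetical; string patterns would need to be compiled to RegExp when the config is loaded):

```json
{
  "repo": "amiable-dev/stentorosaur",
  "adrConfig": {
    "includeTemplates": false,
    "excludePatterns": ["^ADR-000", "experimental"]
  }
}
```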
2. Cross-Linking with Projects Page
Projects now show ADR count badges linking directly to their architecture docs:
```jsx
{project.adrCount > 0 && (
  <a href={`/docs/category/${slugifyTitle(project.title)}`} className={styles.adrBadge}>
    {project.adrCount} {project.adrCount === 1 ? 'ADR' : 'ADRs'}
  </a>
)}
```
The data flows through `adr-summary.json`, generated by `fetch-adrs.js` and consumed by `fetch-projects.js`.
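I haven't shown adr-summary.json's exact schema; a minimal illustrative shape (field names and counts invented here) that would support the badge above could be:

```json
{
  "amiable-dev/stentorosaur": {
    "adrCount": 12,
    "categorySlug": "stentorosaur"
  }
}
```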
3. Status Parsing
ADR status is now extracted and normalized for future dashboard features:
```javascript
const STATUS_MAPPINGS = {
  proposed: ['proposed', 'draft', 'review'],
  accepted: ['accepted', 'approved', 'active'],
  deprecated: ['deprecated', 'superseded', 'obsolete'],
};

function extractStatusFromContent(content) {
  // Try ## Status section first
  const sectionMatch = content.match(/^##\s*Status\s*\n+([^\n#]+)/mi);
  if (sectionMatch) return sectionMatch[1].trim();
  // Fall back to front matter
  // ...
}
```
This enables status distribution analytics across the portfolio.
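The mapping table suggests a normalizer along these lines (a sketch; `normalizeStatus` is my name for it, not necessarily the script's):

```javascript
// Sketch: collapse raw status strings into the canonical buckets above
const STATUS_MAPPINGS = {
  proposed: ['proposed', 'draft', 'review'],
  accepted: ['accepted', 'approved', 'active'],
  deprecated: ['deprecated', 'superseded', 'obsolete'],
};

function normalizeStatus(raw) {
  const value = (raw || '').trim().toLowerCase();
  for (const [canonical, aliases] of Object.entries(STATUS_MAPPINGS)) {
    // startsWith handles suffixes like "Superseded by ADR-007"
    if (aliases.some((alias) => value.startsWith(alias))) return canonical;
  }
  return 'unknown';
}

console.log(normalizeStatus('Accepted'));              // accepted
console.log(normalizeStatus('Superseded by ADR-007')); // deprecated
console.log(normalizeStatus(''));                      // unknown
```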
What's Next
The aggregator is live with v2 enhancements. Future improvements might include:
- Dashboard index with status distribution charts
- Full-text search across all ADRs
- Webhook-triggered rebuilds when source repos change
For now, I'm happy with a unified, cross-linked view of architectural decisions across my entire portfolio.
The complete implementation is in ADR-003 and the `fetch-adrs.js` source.

