Building a Cross-Project ADR Aggregator with TDD
Architecture Decision Records (ADRs) are invaluable for documenting why technical decisions were made. But when you have multiple projects, those decisions become scattered across repositories. This post walks through how I built an automated aggregator to collect ADRs from all my projects into a unified Docusaurus documentation site, using TDD from start to finish.
The Problem
My portfolio includes several projects: Stentorosaur (a status monitoring plugin), LLM Council (multi-model AI deliberation), Luminescent Cluster (context management), and more. Each has its own ADRs documenting architectural decisions. The challenge: visitors had to navigate to each repository separately to understand my decision-making patterns.
I wanted a single place where recruiters could see consistent decision-making across projects, and engineers could discover architectural patterns that span multiple systems.
The Solution: Build-Time Aggregation
Following the pattern established in ADR-002 for the projects showcase, I created a prebuild script that:
- Reads the list of projects from `projects.json`
- Discovers ADR files using GitHub's Git Trees API
- Fetches content via `raw.githubusercontent.com`
- Transforms and writes to `docs/adrs/projects/`
The key insight was using the Git Trees API for discovery. A single API call returns the entire file tree for a repository, which we can filter locally:
```javascript
const { data } = await octokit.git.getTree({
  owner,
  repo,
  tree_sha: defaultBranch,
  recursive: 'true',
});

const adrFiles = data.tree.filter((file) =>
  file.type === 'blob' &&
  file.path.match(/^(docs\/adrs?|adr)\/ADR-\d+.*\.md$/i)
);
```
This is far more efficient than making multiple API calls to check if different ADR directories exist.
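As a quick sanity check, here's how that path filter behaves against a few representative tree paths (the sample paths are invented for illustration):

```javascript
// The discovery regex from above, tried against illustrative tree paths
const ADR_PATH = /^(docs\/adrs?|adr)\/ADR-\d+.*\.md$/i;

const samplePaths = [
  'docs/adrs/ADR-001-initial.md', // matches: docs/adrs/ layout
  'docs/adr/ADR-002-caching.md',  // matches: singular docs/adr/ layout
  'adr/adr-0003-logging.md',      // matches: top-level adr/, case-insensitive
  'docs/adrs/README.md',          // skipped: not an ADR file
  'src/ADR-001-fake.md',          // skipped: outside the ADR directories
];

const matched = samplePaths.filter((p) => ADR_PATH.test(p));
console.log(matched.length); // 3
```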
Important caveats:
- Set `GITHUB_TOKEN` for authenticated requests (5,000/hour) vs. unauthenticated (60/hour)
- The `raw.githubusercontent.com` approach only works for public repositories; private repos require using `octokit.repos.getContent` with auth tokens
- For very large repos with >100k objects, check `data.truncated` to ensure you got the full tree
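To make the token caveat concrete, here's a small helper sketch (not from the actual script) that builds request headers depending on whether `GITHUB_TOKEN` is set:

```javascript
// Hypothetical helper: choose GitHub request headers based on token availability
function githubHeaders(token = process.env.GITHUB_TOKEN) {
  const headers = { Accept: 'application/vnd.github+json' };
  if (token) {
    headers.Authorization = `Bearer ${token}`; // authenticated: 5,000 req/hour
  }
  // Without an Authorization header you fall back to 60 req/hour per IP
  return headers;
}

console.log(githubHeaders('ghp_example').Authorization); // Bearer ghp_example
```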
TDD Approach
I followed TDD with Vitest and MSW (Mock Service Worker) for API mocking. The key was making the external API calls testable through comprehensive mocks.
MSW Handlers
The MSW handlers intercept GitHub API calls and return controlled mock data:
```typescript
// __tests__/mocks/github-api.ts
import { http, HttpResponse } from 'msw';
import { mockTreeResponses, mockADRContent } from './adr-data';

export const handlers = [
  // Git Trees handler
  http.get(
    'https://api.github.com/repos/:owner/:repo/git/trees/:sha',
    ({ params, request }) => {
      const repoKey = `${params.owner}/${params.repo}`;
      // Support conditional requests (ETag caching)
      const ifNoneMatch = request.headers.get('If-None-Match');
      if (ifNoneMatch && mockTreeResponses[repoKey]?.etag === ifNoneMatch) {
        return new HttpResponse(null, { status: 304 });
      }
      return HttpResponse.json(mockTreeResponses[repoKey]?.data, {
        headers: { ETag: mockTreeResponses[repoKey]?.etag || '' },
      });
    }
  ),

  // Raw content handler
  http.get(
    'https://raw.githubusercontent.com/:owner/:repo/:sha/*',
    ({ params, request }) => {
      const repoKey = `${params.owner}/${params.repo}`;
      const filePath = new URL(request.url).pathname.split('/').slice(4).join('/');
      const content = mockADRContent[repoKey]?.[filePath];
      if (!content) return new HttpResponse(null, { status: 404 });
      return new HttpResponse(content, {
        headers: { 'Content-Type': 'text/plain' },
      });
    }
  ),
];
```
Write Tests First (RED)
Each function got its tests before implementation:
```javascript
it('should inject aggregator metadata while preserving source fields', async () => {
  const { injectFrontMatter } = await import('../../scripts/fetch-adrs');

  // Minimal fixtures (shapes are illustrative)
  const project = { repo: 'amiable-dev/stentorosaur' };
  const sourceInfo = { filename: 'ADR-001.md' };
  const parsedADR = {
    frontMatter: { title: 'ADR-001: Test', date: '2025-01-01' },
    body: '# Content',
  };

  const result = injectFrontMatter(parsedADR, project, sourceInfo);

  expect(result.frontMatter.slug).toBe('/docs/adrs/projects/stentorosaur/adr-001');
  expect(result.frontMatter.source_repo).toBe('amiable-dev/stentorosaur');
  expect(result.frontMatter.title).toBe('ADR-001: Test'); // Preserved from source
});
```
Running this test immediately failed because `fetch-adrs.js` didn't exist yet.
Implement to Pass (GREEN)
Then I wrote the minimal implementation to make tests pass, followed by refactoring as needed.
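For the front-matter test above, the minimal implementation looks roughly like this (field names and helper shapes are reconstructed from the test, not copied from the real `fetch-adrs.js`):

```javascript
// Sketch of injectFrontMatter: add aggregator metadata, preserve source fields
function injectFrontMatter(parsedADR, project, sourceInfo) {
  const repoName = project.repo.split('/')[1]; // 'amiable-dev/stentorosaur' → 'stentorosaur'
  const slugPart = sourceInfo.filename.replace(/\.md$/i, '').toLowerCase();
  return {
    ...parsedADR,
    frontMatter: {
      ...parsedADR.frontMatter, // spread source fields (title, date, …) first, then add aggregator keys
      slug: `/docs/adrs/projects/${repoName}/${slugPart}`,
      source_repo: project.repo,
    },
  };
}

const result = injectFrontMatter(
  { frontMatter: { title: 'ADR-001: Test', date: '2025-01-01' }, body: '# Content' },
  { repo: 'amiable-dev/stentorosaur' },
  { filename: 'ADR-001.md' }
);
console.log(result.frontMatter.slug); // /docs/adrs/projects/stentorosaur/adr-001
```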
Key Implementation Details
Link Rewriting
ADRs often contain relative links that break when aggregated. I used regex-based rewriting, which is simple and sufficient for markdown:
Before:

```md
[See ADR-002](./ADR-002-notifications.md)
```

After:

```md
[See ADR-002](/docs/adrs/projects/stentorosaur/adr-002-notifications)
```
The implementation:

```javascript
// Images: relative → absolute GitHub URLs
rewritten = content.replace(
  /!\[([^\]]*)\]\(\.\/([^)]+)\)/g,
  (match, alt, relativePath) => {
    const absoluteUrl = `https://raw.githubusercontent.com/${repo}/${sha}/${baseDir}/${relativePath}`;
    return `![${alt}](${absoluteUrl})`;
  }
);

// ADR references: relative → local Docusaurus paths
rewritten = rewritten.replace(
  /\[([^\]]+)\]\(\.\/([^)]+\.md)\)/g,
  (match, text, relativePath) => {
    const filename = path.basename(relativePath, '.md').toLowerCase();
    return `[${text}](/docs/adrs/projects/${repoName}/${filename})`;
  }
);
```
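Pulled out of context, the same two rewrites can be wrapped in a self-contained function to show the end-to-end effect (the sample repo values are illustrative):

```javascript
// Self-contained version of the two rewrites, for demonstration
function rewriteRelativeLinks(content, { repo, sha, baseDir, repoName }) {
  // Images: relative → absolute GitHub URLs
  let rewritten = content.replace(
    /!\[([^\]]*)\]\(\.\/([^)]+)\)/g,
    (m, alt, rel) =>
      `![${alt}](https://raw.githubusercontent.com/${repo}/${sha}/${baseDir}/${rel})`
  );
  // ADR references: relative → local Docusaurus paths
  rewritten = rewritten.replace(
    /\[([^\]]+)\]\(\.\/([^)]+\.md)\)/g,
    (m, text, rel) => {
      const filename = rel.split('/').pop().replace(/\.md$/i, '').toLowerCase();
      return `[${text}](/docs/adrs/projects/${repoName}/${filename})`;
    }
  );
  return rewritten;
}

const out = rewriteRelativeLinks('[See ADR-002](./ADR-002-notifications.md)', {
  repo: 'amiable-dev/stentorosaur',
  sha: 'abc123',
  baseDir: 'docs/adrs',
  repoName: 'stentorosaur',
});
console.log(out); // [See ADR-002](/docs/adrs/projects/stentorosaur/adr-002-notifications)
```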
Cache Invalidation
To avoid redundant fetches, the script maintains a cache manifest tracking commit SHAs:
```json
{
  "amiable-dev/stentorosaur": {
    "commitSha": "abc123",
    "fetchedAt": "2026-01-03T12:00:00Z",
    "files": ["adr-001.md", "adr-002.md"]
  }
}
```
If the commit SHA hasn't changed, we skip fetching that repo entirely.
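The skip check itself is trivial once the manifest exists. A sketch (the real script's internals may differ):

```javascript
// Hypothetical cache check: skip a repo when its HEAD commit matches the manifest
function shouldSkipRepo(manifest, repoKey, currentSha) {
  const entry = manifest[repoKey];
  return Boolean(entry && entry.commitSha === currentSha);
}

const manifest = {
  'amiable-dev/stentorosaur': {
    commitSha: 'abc123',
    fetchedAt: '2026-01-03T12:00:00Z',
    files: ['adr-001.md', 'adr-002.md'],
  },
};

console.log(shouldSkipRepo(manifest, 'amiable-dev/stentorosaur', 'abc123')); // true  → skip fetch
console.log(shouldSkipRepo(manifest, 'amiable-dev/stentorosaur', 'def456')); // false → re-fetch
```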
MDX Compatibility
Aggregated markdown often contains JSX-like syntax (e.g., `<4 developers`) that breaks MDX parsing. Since this is a Docusaurus site, I add `format: md` to the front matter to tell Docusaurus to treat these files as plain markdown:
```javascript
const newFrontMatter = {
  format: 'md', // Docusaurus: treat as plain markdown, not MDX
  slug: generateSlug(project.repo, filename),
  // ...
};
```
Graceful Degradation
The build should never fail due to GitHub API issues:
```javascript
async function main() {
  try {
    // ... fetch and transform ADRs
  } catch (error) {
    console.error('Error during ADR aggregation:', error.message);
    console.log('Continuing with partial data...');
    // Don't throw - build continues
  }
}
```
Results
The final implementation:
- 165 tests all passing (expanded with v2 enhancements)
- 44 ADRs aggregated from 3 projects
- ~2 seconds for cached builds
- Zero build failures due to API issues
The ADRs are now browsable at /docs/adrs/projects/ with automatic tagging by project and technology. Cross-project patterns are finally visible in one place, and project cards link directly to their architecture documentation.
Lessons Learned
- **Git Trees API is your friend:** A single API call beats multiple directory existence checks.
- **Raw fetches avoid REST API quota drain:** Fetching via `raw.githubusercontent.com` doesn't consume your REST API rate limit, though it has its own bandwidth throttling.
- **`format: md` prevents MDX headaches (Docusaurus-specific):** Aggregated markdown often contains characters that break MDX parsing.
- **Cache by commit SHA:** ETags work, but commit SHAs are more reliable for content invalidation since they directly represent content state.
- **TDD with MSW scales well:** Even for complex API interactions, well-structured mocks make testing straightforward. The key is mirroring real API response structures.
v2 Enhancements
After the initial implementation, I added three enhancements based on real-world usage:
1. Template Exclusion
ADR-000 files and templates were cluttering the aggregated output. Now they're excluded by default:
```javascript
const DEFAULT_EXCLUSION_PATTERNS = [
  /^ADR-000/i, // Standard template number
  /template/i, // Files with "template" in name
  /^0000-/,    // Alternative template prefix
];

function shouldExcludeFile(filename, adrConfig = {}) {
  if (adrConfig.includeTemplates) return false;
  const patterns = adrConfig.excludePatterns || DEFAULT_EXCLUSION_PATTERNS;
  return patterns.some((p) => p.test(filename));
}
```
Per-project overrides are supported via `adrConfig` in `projects.json`.
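A plausible shape for such an override in a projects.json entry (hypothetical; string patterns would need to be compiled to RegExp when the config is loaded):

```json
{
  "repo": "amiable-dev/stentorosaur",
  "adrConfig": {
    "includeTemplates": false,
    "excludePatterns": ["^ADR-000", "experimental"]
  }
}
```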
2. Cross-Linking with Projects Page
Projects now show ADR count badges linking directly to their architecture docs:
```jsx
{project.adrCount > 0 && (
  <a href={`/docs/category/${slugifyTitle(project.title)}`} className={styles.adrBadge}>
    {project.adrCount} {project.adrCount === 1 ? 'ADR' : 'ADRs'}
  </a>
)}
```
The data flows through `adr-summary.json`, generated by `fetch-adrs.js` and consumed by `fetch-projects.js`.
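I haven't shown adr-summary.json's exact schema; a minimal illustrative shape (field names and counts invented here) that would support the badge above could be:

```json
{
  "amiable-dev/stentorosaur": {
    "adrCount": 12,
    "categorySlug": "stentorosaur"
  }
}
```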
3. Status Parsing
ADR status is now extracted and normalized for future dashboard features:
```javascript
const STATUS_MAPPINGS = {
  proposed: ['proposed', 'draft', 'review'],
  accepted: ['accepted', 'approved', 'active'],
  deprecated: ['deprecated', 'superseded', 'obsolete'],
};

function extractStatusFromContent(content) {
  // Try ## Status section first
  const sectionMatch = content.match(/^##\s*Status\s*\n+([^\n#]+)/mi);
  if (sectionMatch) return sectionMatch[1].trim();
  // Fall back to front matter
  // ...
}
```
This enables status distribution analytics across the portfolio.
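The mapping table suggests a normalizer along these lines (a sketch; `normalizeStatus` is my name for it, not necessarily the script's):

```javascript
// Sketch: collapse raw status strings into the canonical buckets above
const STATUS_MAPPINGS = {
  proposed: ['proposed', 'draft', 'review'],
  accepted: ['accepted', 'approved', 'active'],
  deprecated: ['deprecated', 'superseded', 'obsolete'],
};

function normalizeStatus(raw) {
  const value = (raw || '').trim().toLowerCase();
  for (const [canonical, aliases] of Object.entries(STATUS_MAPPINGS)) {
    // startsWith handles suffixes like "Superseded by ADR-007"
    if (aliases.some((alias) => value.startsWith(alias))) return canonical;
  }
  return 'unknown';
}

console.log(normalizeStatus('Accepted'));              // accepted
console.log(normalizeStatus('Superseded by ADR-007')); // deprecated
console.log(normalizeStatus(''));                      // unknown
```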
What's Next
The aggregator is live with v2 enhancements. Future improvements might include:
- Dashboard index with status distribution charts
- Full-text search across all ADRs
- Webhook-triggered rebuilds when source repos change
For now, I'm happy with a unified, cross-linked view of architectural decisions across my entire portfolio.
The complete implementation is in ADR-003 and the `fetch-adrs.js` source.

