ADR-004: Remote Blog Post Aggregation
Status
Accepted
Date
2026-01-03
Context
With the successful implementation of ADR-003 (Cross-Project ADR Aggregation), amiable.dev now aggregates Architecture Decision Records from all portfolio projects. This establishes a pattern for pulling content from tracked GitHub repositories at build time.
A new requirement has emerged: aggregate blog posts from tracked projects to provide a unified content experience. Users should be able to discover not just ADRs but also blog content from the projects showcased on the /projects page.
Prior Art
- ADR-001: Enabled docs plugin at /docs for architecture documentation
- ADR-002: Established projects.json as the configuration source for tracked repositories
- ADR-003: Implemented build-time ADR aggregation with:
- GitHub Git Trees API for discovery
- Raw GitHub content fetching (not counted against the REST API rate limit)
- Front matter injection with source attribution
- Relative link rewriting
- Cache with commit SHA invalidation
- Template/boilerplate exclusion
- Status parsing and normalization
Available Solutions
Option A: docusaurus-plugin-remote-content
The @rdilweb/docusaurus-plugin-remote-content plugin provides:
Capabilities:
- Two sync modes: Constant (auto during build) or CLI (manual)
- Configurable source URLs and output directories
- Content transformation via modifyContent callback
- Separate instances for docs vs images
- Axios-based HTTP fetching
Limitations:
- Requires explicit document list (no auto-discovery)
- No built-in caching or SHA-based invalidation
- No relative link rewriting
- Would require multiple plugin instances (one per project)
- Less control over front matter injection
- Active but minimal maintenance (last release 2023)
When to reconsider: If requirements evolve to explicit file lists (no discovery needed) and link rewriting becomes unnecessary, the plugin could reduce maintenance burden.
Option B: Extend Existing fetch-adrs.js Pattern
Leverage the proven architecture from ADR-003:
Advantages:
- Consistent codebase and patterns
- Full control over content transformation
- Existing cache infrastructure
- Single source of project configuration (projects.json)
- Unified testing approach with Vitest/MSW
Considerations:
- Additional custom code to maintain
- Blog posts have different structure than ADRs
Option C: Create Unified Content Aggregation System
Refactor into a generalized scripts/fetch-remote-content.js that handles both ADRs and blog posts:
Advantages:
- Single, well-tested content aggregation pipeline
- Shared utilities (caching, link rewriting, front matter)
- Easier to add future content types (e.g., docs, tutorials)
- DRY principle
Considerations:
- Larger refactoring effort
- Risk of breaking existing ADR functionality
Decision
Option B: Extend the existing fetch-adrs.js pattern to create a parallel scripts/fetch-blog-posts.js script.
Rationale
- Proven Pattern: ADR-003 established a reliable, well-tested approach with 165+ tests
- Content Discovery: Blog posts need auto-discovery (like ADRs), not explicit document lists
- Link Rewriting: Critical for cross-references and images, not supported by the plugin
- Caching: SHA-based invalidation prevents redundant fetches
- Front Matter: Full control over attribution, tags, and metadata injection
- Independence: Keeps blog aggregation separate, reducing risk to ADR functionality
- Plugin Overhead: Adding the plugin introduces a dependency for functionality we already have
Code Organization
To prevent logic drift between fetch-adrs.js and fetch-blog-posts.js, share core modules:
lib/
  githubFetch.js     # Fetch + retries + rate limit handling
  cacheManifest.js   # SHA-based caching utilities
  rewriteLinks.js    # Link rewriting for images/cross-refs
  frontMatter.js     # Front matter parsing and injection
Separate entrypoints allow independent iteration while shared core logic ensures consistency.
Future Consideration (ADR-005 Triggers)
Evaluate Option C (unified system) when:
- A third content type is needed (e.g., docs, tutorials)
- More than 50% of code is duplicated between scripts
- Common bugs appear in both scripts simultaneously
Implementation
Blog Post Discovery
Directory Scanning (in order of precedence):
- blog/ - Standard Docusaurus blog location
- content/blog/ - Alternative content directory
- posts/ - Common blog directory
First match wins; subsequent directories are not scanned if earlier ones exist.
Supported File Extensions:
- .md (Markdown)
- .mdx (MDX with React components)
File Patterns:
- YYYY-MM-DD-*.md(x) - Date-prefixed posts
- */index.md(x) - Folder-based posts (date from front matter or Git)
- *.md(x) - Non-dated posts (date from front matter or Git)
Depth: Scan nested directories recursively (e.g., blog/2024/january/post.md).
Draft Handling: Posts with draft: true in front matter are excluded from aggregation.
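The discovery rules above can be sketched as a pure function over the entries of a recursive Git Trees API response (each entry has a `path` and a `type` of `blob` or `tree`). This is a minimal illustration, not the actual script: `discoverPosts`, `isDraft`, and `BLOG_ROOTS` are hypothetical names, and draft filtering is assumed to happen later, once front matter has been parsed.

```javascript
// Hypothetical sketch of the discovery rules; names are assumptions.
const BLOG_ROOTS = ["blog", "content/blog", "posts"]; // precedence order
const POST_EXTENSION = /\.mdx?$/i; // .md and .mdx

// tree: array of { path, type } entries from a recursive tree listing.
function discoverPosts(tree) {
  // First root directory that exists wins; later roots are never scanned.
  const root = BLOG_ROOTS.find((dir) =>
    tree.some((e) => e.path === dir || e.path.startsWith(dir + "/"))
  );
  if (!root) return { root: null, posts: [] };

  // Nested directories are included because the listing is recursive.
  const posts = tree
    .filter((e) => e.type === "blob")
    .map((e) => e.path)
    .filter((p) => p.startsWith(root + "/") && POST_EXTENSION.test(p))
    .sort();
  return { root, posts };
}

// Applied after front matter parsing: only an explicit `draft: true` excludes.
function isDraft(frontMatter) {
  return Boolean(frontMatter) && frontMatter.draft === true;
}
```

Keeping discovery free of network calls makes the precedence and extension rules easy to unit test against fixture trees.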
Date Extraction Strategy
Date is critical for Docusaurus blog ordering. Extract in this order:
- Front matter date field - Highest priority
- Filename pattern - Extract from YYYY-MM-DD- prefix
- Git commit date - Fallback using GitHub API to get file's first commit date
- Fail gracefully - Log warning and skip file if no date can be determined
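function extractDate(filename, frontMatter, gitCommitDate) {
  // 1. An explicit front matter date always wins.
  if (frontMatter.date) return frontMatter.date;
  // 2. Otherwise use a YYYY-MM-DD prefix on the filename.
  const match = filename.match(/^(\d{4}-\d{2}-\d{2})/);
  if (match) return match[1];
  // 3. Fall back to the Git commit date, if one was fetched.
  if (gitCommitDate) return gitCommitDate;
  return null; // Caller logs a warning and skips the file
}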
Collision Handling
Slug Collisions: Two repos may each contain a post named 2024-01-01-update.md, which would otherwise resolve to the same slug.
Strategy:
- Output path blog/projects/{repo-name}/ provides natural namespacing
- If source post has explicit slug front matter, prefix with repo name: {repo-name}-{slug}
- Log warning when collision is detected and resolved
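A rough sketch of the slug strategy above, under stated assumptions: `namespaceSlug` and `findSlugCollisions` are hypothetical helper names, and stripping a leading slash from the source slug before prefixing is an assumption rather than something this ADR specifies.

```javascript
// Hypothetical helpers illustrating the collision strategy.

// Prefix an explicit front matter slug with the repo name.
// Posts without an explicit slug rely on path-based namespacing alone.
function namespaceSlug(repoName, frontMatter) {
  if (!frontMatter.slug) return frontMatter;
  // Assumption: drop leading slashes so the prefix joins cleanly.
  const bare = String(frontMatter.slug).replace(/^\/+/, "");
  return { ...frontMatter, slug: `${repoName}-${bare}` };
}

// Report slugs claimed by more than one repo so a warning can be logged.
// posts: array of { repo, slug }.
function findSlugCollisions(posts) {
  const firstOwner = new Map();
  const collisions = [];
  for (const { repo, slug } of posts) {
    const owner = firstOwner.get(slug);
    if (owner === undefined) {
      firstOwner.set(slug, repo);
    } else if (owner !== repo) {
      collisions.push({ slug, repos: [owner, repo] });
    }
  }
  return collisions;
}
```

Running the collision check on the original slugs before prefixing gives the script something concrete to log when a collision is detected and resolved.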
Configuration Enhancement
Add optional blog configuration to project entries:
{
  "repo": "amiable-dev/llm-council",
  "includeBlogPosts": true,
  "blogPath": "blog/",
  "blogConfig": {
    "includeDrafts": false,
    "tagPrefix": "llm-council"
  }
}
Projects without includeBlogPosts: true will not have their blog posts aggregated.
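As an illustration of the opt-in behaviour, the sketch below filters project entries and fills in defaults. It assumes the entries are available as an array; `blogSources` is a hypothetical name, and the defaults (directory discovery when `blogPath` is absent, repo name as the tag prefix) are assumptions, not decisions recorded in this ADR.

```javascript
// Hypothetical sketch: select projects that opted in to blog aggregation.
function blogSources(projects) {
  return projects
    .filter((p) => p.includeBlogPosts === true) // strict opt-in
    .map((p) => ({
      repo: p.repo,
      // Assumption: null means "use directory discovery".
      blogPath: p.blogPath ?? null,
      includeDrafts: p.blogConfig?.includeDrafts ?? false,
      // Assumption: default the tag prefix to the repo name without owner.
      tagPrefix: p.blogConfig?.tagPrefix ?? p.repo.split("/").pop(),
    }));
}
```

Comparing against `true` explicitly means a missing or misspelled flag never enables aggregation by accident.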
Output Structure
blog/
  projects/
    {repo-name}/
      YYYY-MM-DD-post-slug.md
      ...
Front Matter Injection
Inject full author object inline (avoids modifying global authors.yml):
---
title: Original Title
date: 2026-01-01
source_repo: amiable-dev/llm-council
source_url: https://github.com/amiable-dev/llm-council/blob/main/blog/2026-01-01-post.md
aggregated_at: 2026-01-03T12:00:00Z
tags: [llm-council, aggregated]
authors:
  - name: LLM Council Project
    url: https://github.com/amiable-dev/llm-council
    image_url: https://github.com/amiable-dev.png
---
Tag Handling:
- Preserve original tags from source post
- Add project-prefixed tag: {repo-name}
- Add aggregated tag for filtering
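The tag rules can be expressed as a small merge function. This is only a sketch: `mergeTags` is a hypothetical name, and accepting a single string as well as an array is an assumption about how source posts may declare tags.

```javascript
// Hypothetical sketch of tag handling.
function mergeTags(originalTags, projectTag) {
  // Assumption: tolerate a missing tag list or a single string.
  const original = Array.isArray(originalTags)
    ? originalTags
    : originalTags
      ? [originalTags]
      : [];
  // A Set keeps first-seen order and drops duplicates.
  return [...new Set([...original, projectTag, "aggregated"])];
}
```

De-duplicating matters when a source post already carries the project tag, as in the front matter example where the post is tagged llm-council.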
Link Rewriting
Same patterns as ADR-003:
- Relative images → absolute GitHub raw URLs (raw.githubusercontent.com)
- Internal blog links within same repo → local paths if both posts aggregated
- Other internal links → GitHub blob URLs
- External links → preserved as-is
- javascript: or other suspicious protocols → stripped with warning
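One possible shape for these rules is a single function applied to each link target. This is a sketch rather than the ADR-003 implementation: `rewriteLink` and its context fields are hypothetical, `localPosts` is assumed to be a caller-supplied Map from source path to local path, and the placeholder host is used only so the built-in URL class can resolve relative paths.

```javascript
// Hypothetical sketch of the link rewriting rules.
const BLOCKED_PROTOCOLS = /^\s*(javascript|data|vbscript):/i;
const EXTERNAL_OR_ANCHOR = /^([a-z][a-z0-9+.-]*:|\/\/|#)/i;

// ctx: { repo: "owner/name", ref, sourcePath, isImage?, localPosts? }
function rewriteLink(href, ctx) {
  const { repo, ref, sourcePath, isImage = false, localPosts = new Map() } = ctx;

  // Suspicious protocols are stripped; the caller drops the link.
  if (BLOCKED_PROTOCOLS.test(href)) {
    console.warn(`Stripped suspicious link in ${repo}/${sourcePath}: ${href}`);
    return null;
  }
  // External URLs and in-page anchors are preserved as-is.
  if (EXTERNAL_OR_ANCHOR.test(href)) return href;

  // Resolve the relative path against the source file's location in the repo.
  const resolved = new URL(href, `https://placeholder.invalid/${sourcePath}`);
  const repoPath = resolved.pathname.slice(1);
  const suffix = resolved.search + resolved.hash;

  if (isImage) {
    return `https://raw.githubusercontent.com/${repo}/${ref}/${repoPath}${suffix}`;
  }
  // Links between two aggregated posts stay local.
  if (localPosts.has(repoPath)) return localPosts.get(repoPath) + suffix;
  return `https://github.com/${repo}/blob/${ref}/${repoPath}${suffix}`;
}
```

Resolving through URL also clamps any leading ../ segments at the repository root, so a rewritten link cannot point outside the source repo.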
Cache Strategy
Create parallel .cache/blog/manifest.json with script versioning:
{
  "schemaVersion": 1,
  "scriptVersion": "1.0.0",
  "repos": {
    "amiable-dev/llm-council": {
      "sha": "abc123",
      "lastFetch": "2026-01-03T12:00:00Z",
      "posts": ["2026-01-01-intro.md", "2026-01-15-update.md"]
    }
  }
}
Cache Invalidation:
- SHA change in repo → re-fetch all posts
- scriptVersion change → bust entire cache (handles logic changes)
- Manual: --no-cache flag to force re-fetch
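The three invalidation rules reduce to one predicate over the manifest. The sketch below assumes the manifest shape shown above; `needsRefetch` and `SCRIPT_VERSION` are hypothetical names, and treating a missing manifest or an unknown repo as a cache miss is an assumption.

```javascript
// Hypothetical sketch of cache invalidation.
const SCRIPT_VERSION = "1.0.0"; // bump when aggregation logic changes

function needsRefetch(manifest, repo, currentSha, { noCache = false } = {}) {
  if (noCache) return true; // manual override (--no-cache)
  // A missing manifest or a script version mismatch busts the whole cache.
  if (!manifest || manifest.scriptVersion !== SCRIPT_VERSION) return true;
  const entry = manifest.repos?.[repo];
  // Unknown repo or changed commit SHA: re-fetch all posts for that repo.
  return !entry || entry.sha !== currentSha;
}
```

Checking the script version before the per-repo SHA means a logic change invalidates every repo, even those whose content has not changed.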
Prebuild Integration
{
  "prebuild": "node scripts/fetch-adrs.js && node scripts/fetch-blog-posts.js && node scripts/fetch-projects.js"
}
Security Model
Repo Allowlist: Only aggregate from repos explicitly listed in projects.json with includeBlogPosts: true.
Content Validation:
- Parse front matter with js-yaml's safe loader (safeLoad in v3; in v4+ safeLoad was removed and load is safe by default)
- Strip suspicious link protocols (javascript:, data:, vbscript:)
- Validate paths to prevent directory traversal (../)
- No server-side code execution from fetched content
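For the path validation item, a conservative check might look like the sketch below. `isSafeRelativePath` is a hypothetical helper; rejecting backslashes and drive-letter prefixes goes slightly beyond what this ADR lists and is an assumption about what a safe output path should look like.

```javascript
// Hypothetical sketch: accept only plain relative paths with no traversal.
function isSafeRelativePath(p) {
  if (typeof p !== "string" || p.length === 0) return false;
  // Assumption: NUL bytes and backslashes never appear in legitimate paths.
  if (p.includes("\0") || p.includes("\\")) return false;
  // Reject absolute paths and Windows drive prefixes.
  if (p.startsWith("/") || /^[a-zA-Z]:/.test(p)) return false;
  // Reject any ".." segment, wherever it appears.
  return !p.split("/").some((segment) => segment === "..");
}
```

The check works on whole segments, so a filename such as ..notes.md is still allowed while blog/../../x is not.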
Rate Limiting:
- Use GITHUB_TOKEN for authenticated requests (5000/hr vs 60/hr)
- Use Git Trees API (single call per repo) for discovery
- Raw content via raw.githubusercontent.com (not counted against the REST API rate limit)
- Implement retry with exponential backoff
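Retry with exponential backoff could be sketched as follows. The names `backoffDelays` and `withRetry`, the 500 ms base delay, and the 8 s cap are all illustrative assumptions; the sketch omits jitter and does not inspect GitHub's rate limit headers, which a production version might want to do.

```javascript
// Hypothetical sketch of retry with exponential backoff.

// Delay before each retry: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelays(retries, baseMs = 500, maxMs = 8000) {
  return Array.from({ length: retries }, (_, i) =>
    Math.min(maxMs, baseMs * 2 ** i)
  );
}

// Call fn until it resolves or the retries are exhausted.
// `sleep` is injectable so tests can avoid real timers.
async function withRetry(fn, options = {}) {
  const {
    retries = 3,
    baseMs = 500,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const delays = backoffDelays(retries, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= delays.length) throw err; // out of retries
      await sleep(delays[attempt]);
    }
  }
}
```

With the default settings a failing request is attempted four times in total, waiting 500 ms, 1 s, and 2 s between attempts.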
Performance Expectations
| Metric | Expected Value |
|---|---|
| Repos to scan | 5-10 (portfolio projects) |
| Posts per repo | 0-20 (most projects have few/no blog posts) |
| API calls per build | ~2-3 per repo (tree + content fetches) |
| Build time impact | Less than 30 seconds added (with caching) |
| Initial fetch (cold) | ~2-3 minutes for all repos |
Failure Handling: Non-blocking. Log warnings and continue build if individual repos fail. Never break the build due to GitHub API issues.
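Non-blocking failure handling maps naturally onto Promise.allSettled, which never rejects even when individual repo fetches do. In this sketch `fetchAllRepos` and `partitionResults` are hypothetical names, and `fetchRepo` stands in for whatever per-repo fetch function the script ends up with.

```javascript
// Hypothetical sketch: one failing repo must not fail the build.

// Split settled results into successes and logged failures.
function partitionResults(repos, settled) {
  const ok = [];
  const failed = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      ok.push({ repo: repos[i], posts: result.value });
    } else {
      const reason = String(result.reason?.message ?? result.reason);
      console.warn(`Skipping ${repos[i]}: ${reason}`);
      failed.push({ repo: repos[i], reason });
    }
  });
  return { ok, failed };
}

// fetchRepo(repo) is assumed to return a promise of that repo's posts.
async function fetchAllRepos(repos, fetchRepo) {
  const settled = await Promise.allSettled(repos.map((repo) => fetchRepo(repo)));
  return partitionResults(repos, settled);
}
```

The entrypoint can then write whatever succeeded and exit with status 0, leaving the warnings in the build log.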
Consequences
Positive
- Unified content experience across portfolio projects
- Consistent patterns with ADR aggregation
- Full control over content transformation
- Efficient caching reduces API calls
- No new external dependencies
- Inline authors avoid global file modifications
Negative
- Additional custom code to maintain
- Blog posts from projects may have inconsistent formatting
- Potential for content conflicts (duplicate slugs)
- Temporary code duplication with fetch-adrs.js until potential ADR-005 unification
- We own edge cases: MDX parsing, date normalization, collision handling
Neutral
- docusaurus-plugin-remote-content remains available for future use cases
- Separate scripts allow independent iteration on ADRs vs blog posts
Alternatives Considered
Use docusaurus-plugin-remote-content
Rejected because:
- No auto-discovery of blog posts (requires explicit file lists)
- No relative link rewriting (breaks images and cross-references)
- Would require significant configuration per project
- Less control over front matter transformation
- Introduces external dependency for functionality we already have
Refactor Everything into Unified System
Deferred because:
- Higher risk to stable ADR functionality
- Premature optimization without knowing blog-specific requirements
- Can be revisited in ADR-005 after blog aggregation is proven
- Allows commonalities to emerge naturally before abstraction
Modify Global authors.yml
Rejected because:
- Modifying source-controlled files during builds creates dirty git states
- Causes merge conflicts and CI complications
- Inline author injection in front matter is cleaner and self-contained
LLM Council Review
This ADR was reviewed by the LLM Council (Reasoning tier) on 2026-01-03. Key feedback incorporated:
- Author Strategy - Changed from dynamic authors.yml modification to inline front matter injection
- Date Extraction - Added explicit fallback strategy with Git commit date
- Discovery Rules - Added precedence, MDX support, draft handling, depth specification
- Cache Versioning - Added scriptVersion to manifest to handle logic changes
- Collision Handling - Added explicit slug collision resolution strategy
- Shared Modules - Added recommendation to share core logic via lib/ modules
- Security Model - Added explicit allowlist, validation, and rate limiting details
- ADR-005 Triggers - Added specific conditions for evaluating unified system