ADR-004: Remote Blog Post Aggregation
Status
Accepted
Date
2026-01-03
Context
With the successful implementation of ADR-003 (Cross-Project ADR Aggregation), amiable.dev now aggregates Architecture Decision Records from all portfolio projects. This establishes a pattern for pulling content from tracked GitHub repositories at build time.
A new requirement has emerged: aggregate blog posts from tracked projects to provide a unified content experience. Users should be able to discover not just ADRs but also blog content from the projects showcased on the /projects page.
Prior Art
- ADR-001: Enabled docs plugin at /docs for architecture documentation
- ADR-002: Established projects.json as the configuration source for tracked repositories
- ADR-003: Implemented build-time ADR aggregation with:
  - GitHub Git Trees API for discovery
  - Raw GitHub content fetching (no rate limits)
  - Front matter injection with source attribution
  - Relative link rewriting
  - Cache with commit SHA invalidation
  - Template/boilerplate exclusion
  - Status parsing and normalization
Available Solutions
Option A: docusaurus-plugin-remote-content
The @rdilweb/docusaurus-plugin-remote-content plugin provides:
Capabilities:
- Two sync modes: Constant (auto during build) or CLI (manual)
- Configurable source URLs and output directories
- Content transformation via modifyContent callback
- Separate instances for docs vs images
- Axios-based HTTP fetching
Limitations:
- Requires explicit document list (no auto-discovery)
- No built-in caching or SHA-based invalidation
- No relative link rewriting
- Would require multiple plugin instances (one per project)
- Less control over front matter injection
- Active but minimal maintenance (last release 2023)
When to reconsider: If requirements evolve to explicit file lists (no discovery needed) and link rewriting becomes unnecessary, the plugin could reduce maintenance burden.
Option B: Extend Existing fetch-adrs.js Pattern
Leverage the proven architecture from ADR-003:
Advantages:
- Consistent codebase and patterns
- Full control over content transformation
- Existing cache infrastructure
- Single source of project configuration (projects.json)
- Unified testing approach with Vitest/MSW
Considerations:
- Additional custom code to maintain
- Blog posts have different structure than ADRs
Option C: Create Unified Content Aggregation System
Refactor into a generalized scripts/fetch-remote-content.js that handles both ADRs and blog posts:
Advantages:
- Single, well-tested content aggregation pipeline
- Shared utilities (caching, link rewriting, front matter)
- Easier to add future content types (e.g., docs, tutorials)
- DRY principle
Considerations:
- Larger refactoring effort
- Risk of breaking existing ADR functionality
Decision
Option B: Extend the existing fetch-adrs.js pattern to create a parallel scripts/fetch-blog-posts.js script.
Rationale
- Proven Pattern: ADR-003 established a reliable, well-tested approach with 165+ tests
- Content Discovery: Blog posts need auto-discovery (like ADRs), not explicit document lists
- Link Rewriting: Critical for cross-references and images, not supported by the plugin
- Caching: SHA-based invalidation prevents redundant fetches
- Front Matter: Full control over attribution, tags, and metadata injection
- Independence: Keeps blog aggregation separate, reducing risk to ADR functionality
- Plugin Overhead: Adding the plugin introduces a dependency for functionality we already have
Code Organization
To prevent logic drift between fetch-adrs.js and fetch-blog-posts.js, share core modules:
lib/
  githubFetch.js     # Fetch + retries + rate limit handling
  cacheManifest.js   # SHA-based caching utilities
  rewriteLinks.js    # Link rewriting for images/cross-refs
  frontMatter.js     # Front matter parsing and injection
Separate entrypoints allow independent iteration while shared core logic ensures consistency.
Future Consideration (ADR-005 Triggers)
Evaluate Option C (unified system) when:
- A third content type is needed (e.g., docs, tutorials)
- More than 50% of code is duplicated between scripts
- Common bugs appear in both scripts simultaneously
Implementation
Blog Post Discovery
Directory Scanning (in order of precedence):
- blog/ - Standard Docusaurus blog location
- content/blog/ - Alternative content directory
- posts/ - Common blog directory
First match wins; subsequent directories are not scanned if earlier ones exist.
Supported File Extensions:
- .md (Markdown)
- .mdx (MDX with React components)
File Patterns:
- YYYY-MM-DD-*.md(x) - Date-prefixed posts
- */index.md(x) - Folder-based posts (date from front matter or Git)
- *.md(x) - Non-dated posts (date from front matter or Git)
Depth: Scan nested directories recursively (e.g., blog/2024/january/post.md).
Draft Handling: Posts with draft: true in front matter are excluded from aggregation.
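The discovery rules above can be sketched as a pure function over the file paths returned by a repository tree listing. This is a minimal sketch, not the script's actual implementation: `discoverPosts` and `CANDIDATE_DIRS` are hypothetical names, and the optional `configuredDir` argument assumes that `blogConfig.directory` overrides the default precedence list. Draft filtering is left out because it needs parsed front matter.

```javascript
// Hypothetical sketch: pick the blog directory and list post files from repo tree paths.
const CANDIDATE_DIRS = ['blog/', 'content/blog/', 'posts/']; // precedence order

function discoverPosts(treePaths, configuredDir) {
  // Assumption: an explicit blogConfig.directory replaces the default candidates.
  const candidates = (configuredDir ? [configuredDir] : CANDIDATE_DIRS)
    .map((d) => (d.endsWith('/') ? d : `${d}/`));

  // First match wins: stop at the first directory that contains any file.
  const dir = candidates.find((d) => treePaths.some((p) => p.startsWith(d)));
  if (!dir) return [];

  // Any depth below the directory qualifies, as long as the file is .md or .mdx.
  return treePaths.filter((p) => p.startsWith(dir) && /\.mdx?$/i.test(p));
}
```

Because the function only looks at path strings, it can be unit-tested without network access or mocks.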
Date Extraction Strategy
Date is critical for Docusaurus blog ordering. Extract in this order:
- Front matter date field - Highest priority
- Filename pattern - Extract from YYYY-MM-DD- prefix
- Fetch timestamp - Use current date with warning logged
- Future enhancement - Git commit date via GitHub API (deferred due to API cost: 1 call per dateless file)
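function extractDate(filename, frontMatter) {
  // 1. An explicit front matter date always wins
  if (frontMatter.date) return frontMatter.date;
  // 2. Date prefix on the basename, so nested paths such as blog/2024/2024-01-05-post.md also match
  const match = filename.split('/').pop().match(/^(\d{4}-\d{2}-\d{2})/);
  if (match) return match[1];
  // 3. Last resort: the fetch date, logged so the source post can be fixed
  console.warn(`Warning: No date found for ${filename}, using fetch date`);
  return new Date().toISOString().split('T')[0]; // YYYY-MM-DD
}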
Collision Handling
Slug Collisions: Two repos may have 2024-01-01-update.md.
Strategy:
- Output path blog/projects/{repo-name}/ provides natural namespacing
- If source post has explicit slug front matter, prefix with repo name: {repo-name}-{slug}
- Log warning when collision is detected and resolved
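A minimal sketch of this strategy follows. `namespacePost` and the `seenSlugs` set are hypothetical; the sketch assumes every explicit slug is prefixed and that a warning is logged only when the same bare slug has already been seen in another post.

```javascript
// Hypothetical sketch: namespace output paths and explicit slugs by repository name.
function namespacePost(repo, fileName, frontMatter, seenSlugs) {
  const repoName = repo.split('/').pop(); // "amiable-dev/llm-council" -> "llm-council"
  const result = {
    outputPath: `blog/projects/${repoName}/${fileName}`,
    frontMatter: { ...frontMatter }, // never mutate the caller's object
  };

  if (frontMatter.slug) {
    const bare = String(frontMatter.slug).replace(/^\/+/, '');
    const prefixed = `${repoName}-${bare}`;
    if (seenSlugs.has(bare)) {
      console.warn(`Slug collision on "${bare}", resolved as "${prefixed}"`);
    }
    seenSlugs.add(bare);
    result.frontMatter.slug = prefixed;
  }
  return result;
}
```

Posts without an explicit slug rely on the per-repo output directory alone, so two repos can both ship 2024-01-01-update.md without conflict.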
Configuration Enhancement
Add optional blog configuration to project entries (consistent with adrConfig pattern):
{
  "repo": "amiable-dev/llm-council",
  "blogConfig": {
    "enabled": true,
    "directory": "blog/",
    "includeDrafts": false,
    "tagPrefix": "llm-council"
  }
}
Projects without blogConfig.enabled: true will not have their blog posts aggregated. The enabled flag follows the same pattern as adrConfig.enabled for consistency.
Output Structure
blog/
  projects/
    {repo-name}/
      YYYY-MM-DD-post-slug.md
      ...
Front Matter Schema
A standardized front matter contract enables consistent aggregation while allowing graceful degradation when fields are missing or malformed.
Docusaurus Standard Fields (Preserved)
These fields are defined by Docusaurus and passed through unchanged:
| Field | Type | Default | Description |
|---|---|---|---|
| title | string | Markdown H1 or filename | Blog post heading |
| title_meta | string | title | SEO title for <head> metadata |
| description | string | First paragraph | Meta description for SEO |
| slug | string | File path | Custom URL path |
| date | datetime | Filename or fetch time | Publication date (YAML format) |
| authors | array | Project default | Author references or inline objects |
| tags | array | [] | Post categorization tags |
| image | string | none | Cover image for social cards |
| draft | boolean | false | Exclude from production |
| unlisted | boolean | false | Hide from listings, allow direct access |
| hide_table_of_contents | boolean | false | Suppress right-side TOC |
| toc_min_heading_level | number | 2 | Minimum heading level in TOC |
| toc_max_heading_level | number | 3 | Maximum heading level in TOC |
| keywords | array | [] | SEO keywords meta tag |
| last_update | object | none | Override last update metadata |
| pagination_prev | string | auto | Custom previous link (ignored in aggregated context) |
| pagination_next | string | auto | Custom next link (ignored in aggregated context) |
Content Markers: The <!--truncate--> marker is critical for blog excerpts. This marker MUST be preserved during aggregation to ensure homepage listings display excerpts rather than full posts.
Extended Fields for Aggregation
These custom fields enhance the aggregation experience:
| Field | Type | Default | Description |
|---|---|---|---|
| sidebar_label | string | title | Shorter title for sidebar navigation |
| sidebar_title | string | title | Alias for sidebar_label (source convenience, takes precedence) |
| project_name | string | Inferred from repo | Override displayed project name |
| category | string | none | Content type: tutorial, announcement, deep-dive, case-study, release |
| series | object | none | Multi-part series metadata (see below) |
| featured | boolean | false | Highlight in aggregated listings |
| reading_time_override | number | Calculated | Manual reading time in minutes |
Series Metadata Schema
For multi-part content:
series:
  name: "Building Arbiter Bot"   # Series title
  part: 2                        # This post's position
  total: 5                       # Total parts (optional)
  slug: "arbiter-bot-series"     # URL-safe identifier (optional)
Note: Series are scoped per-repository. Cross-project series (posts spanning multiple repos) are not currently supported; each repo's series operates independently.
Aggregation Output Fields (Injected)
These fields are added by the aggregation script:
| Field | Type | Description |
|---|---|---|
| format | string | Always md to disable MDX parsing |
| source_repo | string | Original owner/repo |
| source_url | string | GitHub blob URL to source file |
| source_commit | string | Commit SHA at fetch time |
| fetched_at | datetime | ISO timestamp of aggregation |
Front Matter Transformation Rules
- Sidebar Label Generation (precedence order):
  sidebar_label = "{project_name}: {sidebar_title || sidebar_label || title}"
  Where sidebar_title takes precedence over sidebar_label if both are present.
- Tag Merging (with deduplication):
  - Preserve all source tags
  - Add repo name as tag (normalized to lowercase, hyphens)
  - Add skill tags from projects.json
  - Deduplicate after normalization (e.g., "TypeScript" and "typescript" become single "typescript")
- Author Injection (when source has no authors):
  authors:
    - name: "{project.title}"
      url: "https://github.com/{repo}"
- Date Extraction Priority:
  - Front matter date field
  - Filename pattern YYYY-MM-DD-*
  - Current date (with warning logged)
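The sidebar label and tag merging rules could look roughly like the sketch below. The function names are hypothetical, and the tag order (repo name first, then source tags, then skill tags) is an assumption inferred from the aggregated output example in this ADR.

```javascript
// Hypothetical sketch: sidebar label generation and tag merging with deduplication.
function normalizeTag(tag) {
  // Lowercase, and turn runs of whitespace or underscores into single hyphens.
  return String(tag).trim().toLowerCase().replace(/[\s_]+/g, '-');
}

function mergeTags(sourceTags, repoName, skillTags) {
  const all = [repoName, ...(sourceTags || []), ...(skillTags || [])]
    .map(normalizeTag)
    .filter(Boolean);
  // A Set keeps the first occurrence of each tag, so insertion order is preserved.
  return [...new Set(all)];
}

function buildSidebarLabel(projectName, frontMatter) {
  const label = frontMatter.sidebar_title || frontMatter.sidebar_label || frontMatter.title;
  return `${projectName}: ${label}`;
}
```

Normalizing before deduplicating is what collapses "TypeScript" and "typescript" into one tag; deduplicating first would keep both.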
Graceful Degradation
The aggregation script MUST succeed even with malformed or incomplete front matter:
| Scenario | Behavior |
|---|---|
| Missing title | Use filename (without extension/date prefix) |
| Missing date | Extract from filename, else use fetch date with warning |
| Invalid date format | Parse attempt with Date.parse(), fallback to fetch date |
| Missing authors | Inject project-based default author |
| Invalid series object | Ignore series metadata, log warning |
| Unknown fields | Pass through unchanged |
| YAML parse error | Skip file, log error, continue aggregation |
| Empty front matter | Use all defaults, proceed with aggregation |
| BOM (Byte Order Mark) | Strip BOM before parsing (content.replace(/^\uFEFF/, '')) |
| Binary/non-text .md file | Detect via content inspection, skip with warning |
| Circular series references | Skip series metadata, log warning |
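Three of the fallbacks in the table are simple enough to sketch as standalone helpers. The names `stripBom`, `titleFromFilename`, and `safeDate` are hypothetical, and returning a YYYY-MM-DD string from `safeDate` is an assumption about the desired output format.

```javascript
// Hypothetical sketch of three fallbacks: BOM, missing title, invalid date.
function stripBom(content) {
  return content.replace(/^\uFEFF/, '');
}

function titleFromFilename(fileName) {
  return fileName
    .replace(/\.mdx?$/i, '')              // drop the extension
    .replace(/^\d{4}-\d{2}-\d{2}-/, '');  // drop a leading date prefix
}

function safeDate(value, fetchDate) {
  // YAML parsers may hand back a Date object for unquoted dates; accept both forms.
  const ms = value instanceof Date ? value.getTime() : Date.parse(String(value));
  if (Number.isNaN(ms)) {
    console.warn(`Invalid date "${value}", using fetch date ${fetchDate}`);
    return fetchDate;
  }
  return new Date(ms).toISOString().split('T')[0];
}
```

Each helper returns a usable value instead of throwing, which matches the requirement that aggregation succeeds on malformed input.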
Example: Source Post
---
title: "Implementing the Actor Model"
sidebar_title: "Actor Model"
description: "How we built concurrent execution with message passing"
date: 2026-01-21
authors: [antigravity]
tags: [rust, concurrency, architecture]
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---
Example: Aggregated Output
---
format: md
slug: /blog/arbiter-bot/2026-01-21-implementing-the-actor-model
source_repo: amiable-dev/arbiter-bot
source_url: https://github.com/amiable-dev/arbiter-bot/blob/main/docs/blog/2026-01-21-implementing-the-actor-model.md
source_commit: abc123def456
fetched_at: '2026-01-23T12:00:00Z'
sidebar_label: 'Arbiter Bot: Actor Model'
title: "Implementing the Actor Model"
description: "How we built concurrent execution with message passing"
date: '2026-01-21'
authors: [antigravity]
tags:
- arbiter-bot
- rust
- concurrency
- architecture
- actor-model
- saga-pattern
- grpc
- telegram
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---
Project CLAUDE.md Guidance
Projects should include front matter guidance in their CLAUDE.md:
## Blog Post Front Matter
Blog posts support these front matter fields for aggregation to amiable.dev:
### Required
- `title`: Post title (displayed in listings and page header)
- `date`: Publication date (YYYY-MM-DD format)
### Recommended
- `description`: 1-2 sentence summary for SEO and previews
- `sidebar_title`: Shorter title for navigation (defaults to `title`)
- `authors`: Author key(s) from authors.yml or inline objects
- `tags`: Categorization tags (lowercase, hyphenated)
### Optional
- `category`: Content type (tutorial|announcement|deep-dive|case-study|release)
- `series`: For multi-part posts: `{name, part, total}`
- `featured`: Set `true` to highlight in aggregated listings
- `image`: Path to cover image for social sharing
- `draft`: Set `true` to exclude from production
### Example
```yaml
---
title: "Building the Authentication System"
sidebar_title: "Auth System"
description: "Implementing OAuth2 with PKCE for secure API access"
date: 2026-01-15
authors: [your-author-key]
tags: [security, oauth, api]
category: deep-dive
---
Opening paragraph with hook.
<!--truncate-->
Rest of the content...
```
### Content Guidelines
- Place `<!--truncate-->` marker after the first paragraph for homepage excerpts
- Use relative paths for images (`./images/diagram.png`) - they'll be converted to GitHub raw URLs
- Tags should be lowercase and hyphenated (e.g., `clean-architecture`, not `Clean Architecture`)
- Stick to standard Markdown - custom MDX components won't be available after aggregation
- Avoid `import` statements - they'll cause build failures in the aggregated context
### Authors
- Reference existing author keys from authors.yml: `authors: [antigravity]`
- Or use inline objects: `authors: [{name: "Guest Author", url: "https://..."}]`
### Common Pitfalls
- **YAML quoting**: Values with colons need quoting: `title: "Fix: Authentication Bug"`
- **Date format**: Use ISO format `YYYY-MM-DD`, not `January 1st, 2026`
- **Image paths**: Don't use absolute paths starting with `/` - use relative paths
- **MDX syntax**: Avoid JSX-like syntax (`<Component />`) as it won't render correctly
Tag Handling:
- Preserve original tags from source post
- Add project-prefixed tag: {repo-name}
- Add skill tags from project configuration
Link Rewriting
Same patterns as ADR-003:
- Relative images → absolute GitHub raw URLs (raw.githubusercontent.com)
- Internal blog links within same directory → local paths (same-directory heuristic)
- Other internal links → GitHub blob URLs
- External links → preserved as-is
- Suspicious protocols (javascript:, data:, vbscript:, file:) → stripped with warning
Internal Link Strategy: Use the same-directory heuristic for simplicity. Links to files in the same blog directory are assumed to be aggregated together and rewritten to local paths. Links to files outside the blog directory are converted to GitHub blob URLs. This avoids the complexity of two-pass aggregation while handling the common case correctly.
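The rewriting rules and the same-directory heuristic can be sketched for a single link target as follows. This is an illustration under stated assumptions, not the code from ADR-003: `rewriteLink` and its option names (`repo`, `ref`, `fileDir`, `isImage`) are hypothetical, `ref` stands for the commit SHA or branch used in URLs, and a stripped link is represented by an empty string.

```javascript
const path = require('node:path');

// Hypothetical sketch: rewrite one link target found in a fetched blog post.
const SUSPICIOUS_PROTOCOL = /^\s*(javascript|data|vbscript|file):/i;

function rewriteLink(href, { repo, ref, fileDir, isImage = false }) {
  if (SUSPICIOUS_PROTOCOL.test(href)) {
    console.warn(`Stripped suspicious link: ${href}`);
    return '';
  }
  // Absolute URLs, protocol-relative URLs, and in-page anchors are preserved as-is.
  if (/^([a-z][a-z0-9+.-]*:|\/\/|#)/i.test(href)) return href;

  const [target, fragment] = href.split('#');
  const suffix = fragment ? `#${fragment}` : '';
  const dir = path.posix.normalize(fileDir).replace(/\/+$/, '');
  // Root-relative targets resolve from the repo root, others from the post's directory.
  const resolved = target.startsWith('/')
    ? path.posix.normalize(target.slice(1))
    : path.posix.normalize(path.posix.join(dir, target));

  if (isImage) return `https://raw.githubusercontent.com/${repo}/${ref}/${resolved}`;
  // Same-directory heuristic: sibling posts are assumed to be aggregated together.
  if (path.posix.dirname(resolved) === dir && /\.mdx?$/i.test(resolved)) {
    return `./${path.posix.basename(resolved)}${suffix}`;
  }
  return `https://github.com/${repo}/blob/${ref}/${resolved}${suffix}`;
}
```

The suspicious-protocol check runs before the external-link check on purpose: `javascript:` would otherwise look like any other URL scheme and be preserved.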
Cache Strategy
Create parallel .cache/blog/manifest.json with script versioning:
{
  "schemaVersion": 1,
  "scriptVersion": "1.0.0",
  "repos": {
    "amiable-dev/llm-council": {
      "sha": "abc123",
      "lastFetch": "2026-01-03T12:00:00Z",
      "posts": ["2026-01-01-intro.md", "2026-01-15-update.md"]
    }
  }
}
Cache Invalidation:
- SHA change in repo → re-fetch all posts
- scriptVersion change → bust entire cache (handles logic changes)
- Manual: --no-cache flag to force re-fetch
scriptVersion Bump Triggers (increment when any of these change):
- Front matter transformation logic
- Date extraction algorithm
- Link rewriting rules
- Tag merging/normalization logic
- Output file path generation
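The invalidation rules reduce to a small decision function over the manifest. The sketch below is one possible shape; `needsRefetch` and `SCRIPT_VERSION` are hypothetical names, and treating a missing manifest or missing repo entry as "must fetch" is an assumption.

```javascript
// Hypothetical sketch: decide whether a repo's posts must be re-fetched.
const SCRIPT_VERSION = '1.0.0'; // bump on any of the triggers listed above

function needsRefetch(manifest, repo, currentSha, { noCache = false } = {}) {
  if (noCache) return true;                                   // --no-cache flag
  if (!manifest || manifest.scriptVersion !== SCRIPT_VERSION) {
    return true;                                              // logic changed: bust everything
  }
  const entry = manifest.repos && manifest.repos[repo];
  if (!entry) return true;                                    // never fetched before
  return entry.sha !== currentSha;                            // repo content changed
}
```

Checking `scriptVersion` before the per-repo SHA means a logic change invalidates every repo, even those whose content has not moved.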
Prebuild Integration
{
"prebuild": "node scripts/fetch-adrs.js && node scripts/fetch-blog-posts.js && node scripts/fetch-projects.js"
}
Security Model
Repo Allowlist: Only aggregate from repos explicitly listed in projects.json with blogConfig.enabled: true.
Private Repo Support: For feature parity with ADR-003's ADR aggregation:
- Detect private repos via GitHub API response (repoData.private)
- Use GitHub Contents API as fallback when Git Trees API is unavailable
- Requires GITHUB_TOKEN with appropriate repo access permissions
Content Validation:
- Parse front matter with js-yaml's safe loader (safeLoad in js-yaml 3.x; load, which is safe by default, in 4.x)
- Strip suspicious link protocols (javascript:, data:, vbscript:, file:)
- Validate paths to prevent directory traversal (../)
- No server-side code execution from fetched content
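Path validation is the item most easily gotten wrong, so here is a minimal sketch of one common approach: resolve the candidate path and confirm it still sits under the output root. `resolveInside` is a hypothetical helper name, and throwing (rather than skipping with a warning) is an assumption the caller can adapt.

```javascript
const path = require('node:path');

// Hypothetical sketch: refuse any output path that escapes the output root.
function resolveInside(outputRoot, relativePath) {
  const root = path.resolve(outputRoot);
  const target = path.resolve(root, relativePath);
  // Compare against root + separator so "blog/projects-evil" does not pass as "blog/projects".
  if (target !== root && !target.startsWith(root + path.sep)) {
    throw new Error(`Refusing to write outside ${outputRoot}: ${relativePath}`);
  }
  return target;
}
```

Resolving first and comparing afterwards also catches traversal hidden in the middle of a path, such as `a/../../x`, which a simple check for a leading `../` would miss.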
Rate Limiting:
- Use GITHUB_TOKEN for authenticated requests (5000/hr vs 60/hr)
- Use Git Trees API (single call per repo) for discovery
- Raw content via raw.githubusercontent.com (no rate limit)
- Implement retry with exponential backoff
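Retry with exponential backoff can be sketched independently of the HTTP client. `withRetry`, `backoffDelay`, and the default values (3 retries, 500 ms base delay) are assumptions for illustration; a production version would probably also inspect rate-limit response headers, which this sketch does not do.

```javascript
// Hypothetical sketch: retry an async operation with exponentially growing delays.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelay(attempt, baseDelayMs = 500) {
  return baseDelayMs * 2 ** attempt;
}

async function withRetry(operation, { retries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is coming.
      if (attempt < retries) await sleep(backoffDelay(attempt, baseDelayMs));
    }
  }
  throw lastError; // all attempts failed
}
```

The `sleep` option exists so tests can substitute a no-op and assert on the requested delays without actually waiting.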
Performance Expectations
| Metric | Expected Value |
|---|---|
| Repos to scan | 5-10 (portfolio projects) |
| Posts per repo | 0-20 (most projects have few/no blog posts) |
| API calls per build | ~2-3 per repo (tree + content fetches) |
| Build time impact | Less than 30 seconds added (with caching) |
| Initial fetch (cold) | ~2-3 minutes for all repos |
Failure Handling: Non-blocking. Log warnings and continue build if individual repos fail. Never break the build due to GitHub API issues.
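One way to get this non-blocking behavior is to settle every repo independently and keep only the successes. In the sketch below, `aggregateAll` and the injected `fetchRepo` callback are hypothetical names; the real script may structure this differently.

```javascript
// Hypothetical sketch: aggregate all repos, logging failures instead of failing the build.
async function aggregateAll(repos, fetchRepo) {
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws from fetchRepo.
    repos.map((repo) => Promise.resolve().then(() => fetchRepo(repo)))
  );

  const succeeded = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      succeeded.push({ repo: repos[i], posts: outcome.value });
    } else {
      const reason = outcome.reason && outcome.reason.message ? outcome.reason.message : outcome.reason;
      console.warn(`Skipping ${repos[i]}: ${reason}`);
    }
  });
  return succeeded; // never rejects because of a single repo
}
```

Since the returned promise resolves even when every repo fails, the prebuild step exits cleanly and the site builds with whatever content is already on disk.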
Consequences
Positive
- Unified content experience across portfolio projects
- Consistent patterns with ADR aggregation
- Full control over content transformation
- Efficient caching reduces API calls
- No new external dependencies
- Inline authors avoid global file modifications
Negative
- Additional custom code to maintain
- Blog posts from projects may have inconsistent formatting
- Potential for content conflicts (duplicate slugs)
- Temporary code duplication with fetch-adrs.js until potential ADR-005 unification
- We own edge cases: MDX parsing, date normalization, collision handling
Neutral
- docusaurus-plugin-remote-content remains available for future use cases
- Separate scripts allow independent iteration on ADRs vs blog posts
Alternatives Considered
Use docusaurus-plugin-remote-content
Rejected because:
- No auto-discovery of blog posts (requires explicit file lists)
- No relative link rewriting (breaks images and cross-references)
- Would require significant configuration per project
- Less control over front matter transformation
- Introduces external dependency for functionality we already have
Refactor Everything into Unified System
Deferred because:
- Higher risk to stable ADR functionality
- Premature optimization without knowing blog-specific requirements
- Can be revisited in ADR-005 after blog aggregation is proven
- Allows commonalities to emerge naturally before abstraction
Modify Global authors.yml
Rejected because:
- Modifying source-controlled files during builds creates dirty git states
- Causes merge conflicts and CI complications
- Inline author injection in front matter is cleaner and self-contained
LLM Council Review
Initial Review (2026-01-03)
This ADR was reviewed by the LLM Council (Reasoning tier). Key feedback incorporated:
- Author Strategy - Changed from dynamic authors.yml modification to inline front matter injection
- Date Extraction - Added explicit fallback strategy with Git commit date
- Discovery Rules - Added precedence, MDX support, draft handling, depth specification
- Cache Versioning - Added scriptVersion to manifest to handle logic changes
- Collision Handling - Added explicit slug collision resolution strategy
- Shared Modules - Added recommendation to share core logic via lib/ modules
- Security Model - Added explicit allowlist, validation, and rate limiting details
- ADR-005 Triggers - Added specific conditions for evaluating unified system
Front Matter Schema Review (2026-01-23)
Reviewed by Claude Opus 4.5 (Architecture Specialist) at 92% confidence. Recommendation: Approve with Changes
Critical changes incorporated:
- Added <!--truncate--> marker preservation documentation
- Clarified internal blog link rewriting strategy (same-directory heuristic)
- Removed redundant includeBlogPosts in favor of blogConfig.enabled for consistency
Important changes incorporated:
- Added private repo support for feature parity with ADR-003
- Added BOM stripping to graceful degradation scenarios
- Added file: protocol to suspicious protocols list
- Documented scriptVersion bump triggers
- Expanded CLAUDE.md template with truncate marker, MDX guidance, and common pitfalls
Minor items tracked for follow-up:
- Cross-project series support (documented as out of scope)
- Filename collision detection within same project (covered by existing slug collision handling)
- Git commit date fallback (documented as future enhancement due to API cost)
- Manifest schema alignment between ADR-003 and ADR-004 (parallel formats acceptable)