
ADR-004: Remote Blog Post Aggregation

Status

Accepted

Date

2026-01-03

Context

With the successful implementation of ADR-003 (Cross-Project ADR Aggregation), amiable.dev now aggregates Architecture Decision Records from all portfolio projects. This establishes a pattern for pulling content from tracked GitHub repositories at build time.

A new requirement has emerged: aggregate blog posts from tracked projects to provide a unified content experience. Users should be able to discover not just ADRs but also blog content from the projects showcased on the /projects page.

Prior Art

  1. ADR-001: Enabled docs plugin at /docs for architecture documentation
  2. ADR-002: Established projects.json as the configuration source for tracked repositories
  3. ADR-003: Implemented build-time ADR aggregation with:
    • GitHub Git Trees API for discovery
    • Raw GitHub content fetching (not counted against the GitHub API rate limit)
    • Front matter injection with source attribution
    • Relative link rewriting
    • Cache with commit SHA invalidation
    • Template/boilerplate exclusion
    • Status parsing and normalization

Available Solutions

Option A: docusaurus-plugin-remote-content

The @rdilweb/docusaurus-plugin-remote-content plugin provides:

Capabilities:

  • Two sync modes: Constant (auto during build) or CLI (manual)
  • Configurable source URLs and output directories
  • Content transformation via modifyContent callback
  • Separate instances for docs vs images
  • Axios-based HTTP fetching

Limitations:

  • Requires explicit document list (no auto-discovery)
  • No built-in caching or SHA-based invalidation
  • No relative link rewriting
  • Would require multiple plugin instances (one per project)
  • Less control over front matter injection
  • Active but minimal maintenance (last release 2023)

When to reconsider: If requirements evolve to explicit file lists (no discovery needed) and link rewriting becomes unnecessary, the plugin could reduce maintenance burden.

Option B: Extend Existing fetch-adrs.js Pattern

Leverage the proven architecture from ADR-003:

Advantages:

  • Consistent codebase and patterns
  • Full control over content transformation
  • Existing cache infrastructure
  • Single source of project configuration (projects.json)
  • Unified testing approach with Vitest/MSW

Considerations:

  • Additional custom code to maintain
  • Blog posts have a different structure from ADRs

Option C: Create Unified Content Aggregation System

Refactor into a generalized scripts/fetch-remote-content.js that handles both ADRs and blog posts:

Advantages:

  • Single, well-tested content aggregation pipeline
  • Shared utilities (caching, link rewriting, front matter)
  • Easier to add future content types (e.g., docs, tutorials)
  • DRY principle

Considerations:

  • Larger refactoring effort
  • Risk of breaking existing ADR functionality

Decision

Option B: Extend the existing fetch-adrs.js pattern to create a parallel scripts/fetch-blog-posts.js script.

Rationale

  1. Proven Pattern: ADR-003 established a reliable, well-tested approach with 165+ tests
  2. Content Discovery: Blog posts need auto-discovery (like ADRs), not explicit document lists
  3. Link Rewriting: Critical for cross-references and images, not supported by the plugin
  4. Caching: SHA-based invalidation prevents redundant fetches
  5. Front Matter: Full control over attribution, tags, and metadata injection
  6. Independence: Keeps blog aggregation separate, reducing risk to ADR functionality
  7. Plugin Overhead: Adding the plugin introduces a dependency for functionality we already have

Code Organization

To prevent logic drift between fetch-adrs.js and fetch-blog-posts.js, share core modules:

lib/
  githubFetch.js      # Fetch + retries + rate limit handling
  cacheManifest.js    # SHA-based caching utilities
  rewriteLinks.js     # Link rewriting for images/cross-refs
  frontMatter.js      # Front matter parsing and injection

Separate entrypoints allow independent iteration while shared core logic ensures consistency.

Future Consideration (ADR-005 Triggers)

Evaluate Option C (unified system) when:

  • A third content type is needed (e.g., docs, tutorials)
  • More than 50% of code is duplicated between scripts
  • Common bugs appear in both scripts simultaneously

Implementation

Blog Post Discovery

Directory Scanning (in order of precedence):

  1. blog/ - Standard Docusaurus blog location
  2. content/blog/ - Alternative content directory
  3. posts/ - Common blog directory

First match wins; subsequent directories are not scanned if earlier ones exist.

Supported File Extensions:

  • .md (Markdown)
  • .mdx (MDX with React components)

File Patterns:

  • YYYY-MM-DD-*.md(x) - Date-prefixed posts
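  • */index.md(x) - Folder-based posts (date from front matter, else fetch date; Git commit date is a deferred enhancement)
  • *.md(x) - Non-dated posts (date from front matter, else fetch date; Git commit date is a deferred enhancement)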

Depth: Scan nested directories recursively (e.g., blog/2024/january/post.md).

Draft Handling: Posts with draft: true in front matter are excluded from aggregation.
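The discovery rules above can be sketched as a pure function over the `tree` array returned by GitHub's Git Trees API (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`). Each entry in that array has a `path` and a `type` of `blob` or `tree`. Several parts of the sketch are assumptions for illustration:

- the names `discoverBlogFiles` and `classifyPost`;
- treating `blogConfig.directory` as an override of the precedence list.

Draft filtering is left out because it needs parsed front matter. A real implementation should also check the response's `truncated` flag.

```javascript
// Candidate blog directories, in order of precedence (first match wins).
const BLOG_DIRS = ['blog/', 'content/blog/', 'posts/'];
const BLOG_EXTENSIONS = ['.md', '.mdx'];

// Ensure a directory prefix ends with "/" so "blog" does not match "blogroll/".
const asDir = (d) => (d.endsWith('/') ? d : `${d}/`);

/**
 * Hypothetical helper: choose the blog directory and list candidate post files.
 * @param {Array<{path: string, type: string}>} tree - entries from a recursive Git Trees API response
 * @param {string} [configuredDir] - optional override, assumed to come from blogConfig.directory
 * @returns {{dir: string|null, files: string[]}}
 */
function discoverBlogFiles(tree, configuredDir) {
  const blobs = tree.filter((e) => e.type === 'blob');
  const candidates = configuredDir ? [asDir(configuredDir)] : BLOG_DIRS;
  // A directory counts as present if at least one file lives under it.
  const dir = candidates.find((d) => blobs.some((e) => e.path.startsWith(d)));
  if (!dir) return { dir: null, files: [] };

  // Prefix matching is recursive, so nested paths like blog/2024/january/post.md are included.
  const files = blobs
    .map((e) => e.path)
    .filter((p) => p.startsWith(dir) && BLOG_EXTENSIONS.some((ext) => p.endsWith(ext)));
  return { dir, files };
}

/** Classify a post path into one of the supported file patterns. */
function classifyPost(filePath) {
  const name = filePath.split('/').pop();
  if (/^\d{4}-\d{2}-\d{2}-.+\.mdx?$/.test(name)) return 'date-prefixed';
  if (/^index\.mdx?$/.test(name)) return 'folder';
  return 'non-dated';
}
```

For example, a repo containing both `blog/` and `posts/` yields only files under `blog/`, because scanning stops at the first directory that exists.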

Date Extraction Strategy

Date is critical for Docusaurus blog ordering. Extract in this order:

  1. Front matter date field - Highest priority
  2. Filename pattern - Extract from YYYY-MM-DD- prefix
  3. Fetch timestamp - Use current date with warning logged
  4. Future enhancement - Git commit date via GitHub API (deferred due to API cost: 1 call per dateless file)
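
function extractDate(filename, frontMatter) {
  const { date } = frontMatter;
  // 1. Front matter date (js-yaml parses unquoted YYYY-MM-DD values into Date objects)
  if (date instanceof Date && !Number.isNaN(date.getTime())) return date.toISOString().split('T')[0];
  if (typeof date === 'string' && !Number.isNaN(Date.parse(date))) return date;
  // 2. YYYY-MM-DD prefix on the file or folder name (filename may include directories)
  const match = filename.match(/(?:^|\/)(\d{4}-\d{2}-\d{2})/);
  if (match) return match[1];
  // 3. Fallback: fetch date, with a warning
  console.warn(`Warning: No valid date found for ${filename}, using fetch date`);
  return new Date().toISOString().split('T')[0]; // YYYY-MM-DD
}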

Collision Handling

Slug Collisions: Two repos may have 2024-01-01-update.md.

Strategy:

  1. Output path blog/projects/{repo-name}/ provides natural namespacing
  2. If the source post has an explicit slug in its front matter, prefix it with the repo name: {repo-name}-{slug}
  3. Log warning when collision is detected and resolved
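The collision strategy above can be sketched as follows. The sketch makes these assumptions:

- `resolveOutput` is a hypothetical helper.
- An in-memory `Set` tracks the unprefixed slugs already emitted during the current run.
- Leading slashes on explicit slugs are stripped for simplicity.

```javascript
/**
 * Hypothetical helper applying the three collision rules.
 * @param {string} repoName - e.g. "llm-council"
 * @param {string} fileName - e.g. "2024-01-01-update.md"
 * @param {{slug?: string}} frontMatter - parsed source front matter
 * @param {Set<string>} seenSlugs - unprefixed explicit slugs seen so far in this run
 */
function resolveOutput(repoName, fileName, frontMatter, seenSlugs) {
  // 1. The per-repo output directory namespaces path-derived URLs.
  const outputPath = `blog/projects/${repoName}/${fileName}`;
  if (!frontMatter.slug) return { outputPath, slug: undefined };

  // 2. An explicit slug replaces the path-derived URL, so prefix it with the repo name.
  const bare = String(frontMatter.slug).replace(/^\/+/, '');
  const slug = `${repoName}-${bare}`;

  // 3. Warn when another post already used the same unprefixed slug.
  if (seenSlugs.has(bare)) {
    console.warn(`Warning: slug "${bare}" used by multiple posts; resolved to "${slug}"`);
  }
  seenSlugs.add(bare);
  return { outputPath, slug };
}
```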

Configuration Enhancement

Add optional blog configuration to project entries (consistent with adrConfig pattern):

{
  "repo": "amiable-dev/llm-council",
  "blogConfig": {
    "enabled": true,
    "directory": "blog/",
    "includeDrafts": false,
    "tagPrefix": "llm-council"
  }
}

Projects without blogConfig.enabled: true will not have their blog posts aggregated. The enabled flag follows the same pattern as adrConfig.enabled for consistency.
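Reading this configuration could look like the sketch below. The name `getBlogProjects` is hypothetical, and so are the default values. Two defaults are assumptions:

- an unset `directory` falls back to the precedence list;
- `tagPrefix` defaults to the repo name.

Only a literal `enabled: true` passes the filter.

```javascript
/**
 * Hypothetical helper: select projects that opted in to blog aggregation and
 * fill in defaults (the defaults are assumptions, not part of the ADR).
 * @param {Array<{repo: string, blogConfig?: object}>} projects - entries from projects.json
 */
function getBlogProjects(projects) {
  return projects
    // Strict check: a missing flag or a truthy non-boolean such as "true" does not opt in.
    .filter((p) => p.blogConfig && p.blogConfig.enabled === true)
    .map((p) => ({
      ...p,
      blogConfig: {
        directory: undefined, // undefined: use the blog/, content/blog/, posts/ precedence
        includeDrafts: false,
        tagPrefix: p.repo.split('/')[1], // assumed default: the repo name
        ...p.blogConfig, // explicit values from projects.json win
      },
    }));
}
```

The same filter can double as the repo allowlist: a repo that has not opted in is never fetched.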

Output Structure

blog/
  projects/
    {repo-name}/
      YYYY-MM-DD-post-slug.md
      ...

Front Matter Schema

A standardized front matter contract enables consistent aggregation while allowing graceful degradation when fields are missing or malformed.

Docusaurus Standard Fields (Preserved)

These fields are defined by Docusaurus and passed through unchanged:

| Field | Type | Default | Description |
|---|---|---|---|
| title | string | Markdown H1 or filename | Blog post heading |
| title_meta | string | title | SEO title for <head> metadata |
| description | string | First paragraph | Meta description for SEO |
| slug | string | File path | Custom URL path |
| date | datetime | Filename or fetch time | Publication date (YAML format) |
| authors | array | Project default | Author references or inline objects |
| tags | array | [] | Post categorization tags |
| image | string | none | Cover image for social cards |
| draft | boolean | false | Exclude from production |
| unlisted | boolean | false | Hide from listings, allow direct access |
| hide_table_of_contents | boolean | false | Suppress right-side TOC |
| toc_min_heading_level | number | 2 | Minimum heading level in TOC |
| toc_max_heading_level | number | 3 | Maximum heading level in TOC |
| keywords | array | [] | SEO keywords meta tag |
| last_update | object | none | Override last update metadata |
| pagination_prev | string | auto | Custom previous link (ignored in aggregated context) |
| pagination_next | string | auto | Custom next link (ignored in aggregated context) |

Content Markers: The <!--truncate--> marker is critical for blog excerpts. This marker MUST be preserved during aggregation to ensure homepage listings display excerpts rather than full posts.
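One way to enforce this is a check after transformation that compares marker counts in the source and output bodies. The function names are hypothetical. The choice to warn rather than fail follows the non-blocking failure policy and is also an assumption.

```javascript
const TRUNCATE_MARKER = '<!--truncate-->';

/** Count occurrences of the truncate marker in a Markdown body. */
function countTruncateMarkers(body) {
  return body.split(TRUNCATE_MARKER).length - 1;
}

/**
 * Hypothetical check run after link rewriting and front matter injection.
 * Returns true when the marker count is unchanged; logs a warning otherwise.
 */
function checkTruncatePreserved(fileName, sourceBody, outputBody) {
  const before = countTruncateMarkers(sourceBody);
  const after = countTruncateMarkers(outputBody);
  if (before !== after) {
    console.warn(`Warning: ${fileName} had ${before} truncate marker(s), output has ${after}`);
    return false;
  }
  return true;
}
```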

Extended Fields for Aggregation

These custom fields enhance the aggregation experience:

| Field | Type | Default | Description |
|---|---|---|---|
| sidebar_label | string | title | Shorter title for sidebar navigation |
| sidebar_title | string | title | Alias for sidebar_label (source convenience, takes precedence) |
| project_name | string | Inferred from repo | Override displayed project name |
| category | string | none | Content type: tutorial, announcement, deep-dive, case-study, release |
| series | object | none | Multi-part series metadata (see below) |
| featured | boolean | false | Highlight in aggregated listings |
| reading_time_override | number | Calculated | Manual reading time in minutes |

Series Metadata Schema

For multi-part content:

series:
  name: "Building Arbiter Bot"   # Series title
  part: 2                        # This post's position
  total: 5                       # Total parts (optional)
  slug: "arbiter-bot-series"     # URL-safe identifier (optional)

Note: Series are scoped per-repository. Cross-project series (posts spanning multiple repos) are not currently supported; each repo's series operates independently.
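The "invalid series object" rule from the graceful degradation scenarios could be implemented as a validator. It returns either a normalized object or `null`, meaning the field is dropped with a warning. The specific checks are assumptions about what "invalid" means:

- `part` must be a positive integer;
- `total` must be at least `part`;
- `slug` may contain only lowercase letters, digits, and hyphens.

```javascript
/**
 * Hypothetical validator for the `series` front matter field.
 * Returns a normalized series object, or null when the value should be ignored.
 */
function normalizeSeries(series, fileName) {
  const warn = (why) => {
    console.warn(`Warning: ignoring series metadata in ${fileName}: ${why}`);
    return null;
  };
  if (series == null) return null; // field absent: nothing to validate
  if (typeof series !== 'object' || Array.isArray(series)) return warn('not an object');
  if (typeof series.name !== 'string' || series.name.trim() === '') return warn('missing name');
  if (!Number.isInteger(series.part) || series.part < 1) return warn('part must be a positive integer');

  const result = { name: series.name.trim(), part: series.part };
  if (series.total !== undefined) {
    if (!Number.isInteger(series.total) || series.total < series.part) return warn('invalid total');
    result.total = series.total;
  }
  if (series.slug !== undefined) {
    // URL-safe identifier: lowercase letters, digits, and hyphens only.
    if (typeof series.slug !== 'string' || !/^[a-z0-9-]+$/.test(series.slug)) return warn('invalid slug');
    result.slug = series.slug;
  }
  return result;
}
```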

Aggregation Output Fields (Injected)

These fields are added by the aggregation script:

| Field | Type | Description |
|---|---|---|
| format | string | Always md to disable MDX parsing |
| source_repo | string | Original owner/repo |
| source_url | string | GitHub blob URL to source file |
| source_commit | string | Commit SHA at fetch time |
| fetched_at | datetime | ISO timestamp of aggregation |

Front Matter Transformation Rules

  1. Sidebar Label Generation (precedence order):

    sidebar_label = "{project_name}: {sidebar_title || sidebar_label || title}"

    Where sidebar_title takes precedence over sidebar_label if both are present.

  2. Tag Merging (with deduplication):

    • Preserve all source tags
    • Add repo name as tag (normalized to lowercase, hyphens)
    • Add skill tags from projects.json
    • Deduplicate after normalization (e.g., "TypeScript" and "typescript" become single "typescript")
  3. Author Injection (when source has no authors):

    authors:
      - name: "{project.title}"
        url: "https://github.com/{repo}"
  4. Date Extraction Priority:

    1. Front matter date field
    2. Filename pattern YYYY-MM-DD-*
    3. Current date (with warning logged)
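Rules 1 to 3 can be sketched as one function. The sketch assumes that `project` carries `title`, `repo`, and a `skills` array; the field name used for skill tags in projects.json is an assumption. It also assumes that source tags are plain strings, and `normalizeTag` is a hypothetical helper. The repo tag is placed first to match the aggregated output example.

```javascript
/** Normalize a tag: lowercase, with runs of spaces/underscores turned into a single hyphen. */
function normalizeTag(tag) {
  return String(tag).trim().toLowerCase().replace(/[\s_]+/g, '-').replace(/-+/g, '-');
}

/**
 * Hypothetical transformation applying rules 1-3 (date handling is separate).
 * @param {object} fm - parsed source front matter (tags assumed to be strings)
 * @param {{title: string, repo: string, skills?: string[]}} project - assumed projects.json shape
 * @param {string} projectName - display name (fm.project_name or one inferred from the repo)
 */
function transformFrontMatter(fm, project, projectName) {
  const repoName = project.repo.split('/')[1];
  const out = { ...fm };

  // 1. Sidebar label: sidebar_title, then sidebar_label, then title.
  out.sidebar_label = `${projectName}: ${fm.sidebar_title || fm.sidebar_label || fm.title}`;
  delete out.sidebar_title;

  // 2. Tags: repo tag, source tags, then skill tags; dedupe after normalization.
  const merged = [repoName, ...(fm.tags || []), ...(project.skills || [])].map(normalizeTag);
  out.tags = [...new Set(merged)];

  // 3. Authors: inject an inline project author only when the source has none.
  const hasAuthors = Array.isArray(fm.authors) ? fm.authors.length > 0 : Boolean(fm.authors);
  if (!hasAuthors) {
    out.authors = [{ name: project.title, url: `https://github.com/${project.repo}` }];
  }
  return out;
}
```

The author check also treats a single string value (for example `authors: antigravity`) as present, so the project default never overwrites it.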

Graceful Degradation

The aggregation script MUST succeed even with malformed or incomplete front matter:

| Scenario | Behavior |
|---|---|
| Missing title | Use filename (without extension/date prefix) |
| Missing date | Extract from filename, else use fetch date with warning |
| Invalid date format | Parse attempt with Date.parse(), fallback to fetch date |
| Missing authors | Inject project-based default author |
| Invalid series object | Ignore series metadata, log warning |
| Unknown fields | Pass through unchanged |
| YAML parse error | Skip file, log error, continue aggregation |
| Empty front matter | Use all defaults, proceed with aggregation |
| BOM (Byte Order Mark) | Strip BOM before parsing (content.replace(/^\uFEFF/, '')) |
| Binary/non-text .md file | Detect via content inspection, skip with warning |
| Circular series references | Skip series metadata, log warning |
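The parsing-related rows of this table can be sketched as a step that never throws. To keep the sketch free of dependencies, the YAML parser is passed in as `parseYaml`; in practice this would be something like js-yaml's `load`. The function name is hypothetical, and detecting binary content via a NUL character is an assumed heuristic.

```javascript
/**
 * Hypothetical parsing step: returns {frontMatter, body}, or null to skip the file.
 * @param {string} raw - file contents
 * @param {string} fileName - used only in log messages
 * @param {(yaml: string) => unknown} parseYaml - injected YAML parser (may throw)
 */
function parseFrontMatter(raw, fileName, parseYaml) {
  // Binary/non-text file: a NUL character is a cheap signal of non-text content.
  if (raw.includes('\u0000')) {
    console.warn(`Warning: ${fileName} looks binary, skipping`);
    return null;
  }
  // BOM: strip it so the opening --- delimiter is found.
  const text = raw.replace(/^\uFEFF/, '');

  // Front matter block; the inner group is optional so "---\n---" (empty) also matches.
  const match = text.match(/^---\r?\n(?:([\s\S]*?)\r?\n)?---(?:\r?\n|$)/);
  if (!match) return { frontMatter: {}, body: text }; // no front matter: all defaults

  let data;
  try {
    data = parseYaml(match[1] ?? '');
  } catch (err) {
    // YAML parse error: skip this file, let aggregation continue.
    console.error(`Error: invalid YAML in ${fileName}: ${err.message}`);
    return null;
  }
  // Empty front matter (or a non-object such as a bare string) falls back to defaults.
  const frontMatter = data && typeof data === 'object' && !Array.isArray(data) ? data : {};
  return { frontMatter, body: text.slice(match[0].length) };
}
```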

Example: Source Post

---
title: "Implementing the Actor Model"
sidebar_title: "Actor Model"
description: "How we built concurrent execution with message passing"
date: 2026-01-21
authors: [antigravity]
tags: [rust, concurrency, architecture]
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---

Example: Aggregated Output

---
format: md
slug: /blog/arbiter-bot/2026-01-21-implementing-the-actor-model
source_repo: amiable-dev/arbiter-bot
source_url: https://github.com/amiable-dev/arbiter-bot/blob/main/docs/blog/2026-01-21-implementing-the-actor-model.md
source_commit: abc123def456
fetched_at: '2026-01-23T12:00:00Z'
sidebar_label: 'Arbiter Bot: Actor Model'
title: "Implementing the Actor Model"
description: "How we built concurrent execution with message passing"
date: '2026-01-21'
authors: [antigravity]
tags:
- arbiter-bot
- rust
- concurrency
- architecture
- actor-model
- saga-pattern
- grpc
- telegram
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---

Project CLAUDE.md Guidance

Projects should include front matter guidance in their CLAUDE.md:

## Blog Post Front Matter

Blog posts support these front matter fields for aggregation to amiable.dev:

### Required
- `title`: Post title (displayed in listings and page header)
- `date`: Publication date (YYYY-MM-DD format)

### Recommended
- `description`: 1-2 sentence summary for SEO and previews
- `sidebar_title`: Shorter title for navigation (defaults to `title`)
- `authors`: Author key(s) from authors.yml or inline objects
- `tags`: Categorization tags (lowercase, hyphenated)

### Optional
- `category`: Content type (tutorial|announcement|deep-dive|case-study|release)
- `series`: For multi-part posts: `{name, part, total}`
- `featured`: Set `true` to highlight in aggregated listings
- `image`: Path to cover image for social sharing
- `draft`: Set `true` to exclude from production

### Example

```yaml
---
title: "Building the Authentication System"
sidebar_title: "Auth System"
description: "Implementing OAuth2 with PKCE for secure API access"
date: 2026-01-15
authors: [your-author-key]
tags: [security, oauth, api]
category: deep-dive
---

Opening paragraph with hook.

<!--truncate-->

Rest of the content...
```

### Content Guidelines

- Place `<!--truncate-->` marker after the first paragraph for homepage excerpts
- Use relative paths for images (`./images/diagram.png`) - they'll be converted to GitHub raw URLs
- Tags should be lowercase and hyphenated (e.g., `clean-architecture`, not `Clean Architecture`)
- Stick to standard Markdown - custom MDX components won't be available after aggregation
- Avoid `import` statements - they'll cause build failures in the aggregated context

### Authors

- Reference existing author keys from authors.yml: `authors: [antigravity]`
- Or use inline objects: `authors: [{name: "Guest Author", url: "https://..."}]`

### Common Pitfalls

- **YAML quoting**: Values with colons need quoting: `title: "Fix: Authentication Bug"`
- **Date format**: Use ISO format `YYYY-MM-DD`, not `January 1st, 2026`
- **Image paths**: Don't use absolute paths starting with `/` - use relative paths
- **MDX syntax**: Avoid JSX-like syntax (`<Component />`) as it won't render correctly

Tag Handling:

  • Preserve original tags from source post
  • Add project-prefixed tag: {repo-name}
  • Add skill tags from project configuration

Link Rewriting: Follows the same patterns as ADR-003:

  • Relative images → absolute GitHub raw URLs (raw.githubusercontent.com)
  • Internal blog links within same directory → local paths (same-directory heuristic)
  • Other internal links → GitHub blob URLs
  • External links → preserved as-is
  • Suspicious protocols (javascript:, data:, vbscript:, file:) → stripped with warning

Internal Link Strategy: Use the same-directory heuristic for simplicity. Links to files in the same blog directory are assumed to be aggregated together and rewritten to local paths. Links to files outside the blog directory are converted to GitHub blob URLs. This avoids the complexity of two-pass aggregation while handling the common case correctly.
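The per-link decision can be sketched as below. The sketch assumes that link targets have already been extracted from the Markdown; extraction is not shown. The following are assumptions:

- `rewriteLink` and its context fields (`repo`, `ref`, `sourcePath`, `blogDir`) are hypothetical.
- `ref` is taken to be the fetched commit SHA.
- Applying the heuristic to any Markdown file under the blog directory assumes the output mirrors the source layout.

The URL shapes are GitHub's standard `raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}` and `github.com/{owner}/{repo}/blob/{ref}/{path}` formats.

```javascript
const path = require('node:path');

// Protocols stripped with a warning, per the rules above.
const SUSPICIOUS = /^(javascript|data|vbscript|file):/i;
const IMAGE_EXT = /\.(png|jpe?g|gif|svg|webp|avif)$/i;

/**
 * Hypothetical link rewriter.
 * @param {string} href - link or image target found in the post
 * @param {{repo: string, ref: string, sourcePath: string, blogDir: string}} ctx
 *   repo: "owner/repo"; ref: commit SHA; sourcePath: repo-relative post path; blogDir: e.g. "blog/"
 * @returns {string|null} the rewritten target, or null if the link is stripped
 */
function rewriteLink(href, ctx) {
  const target = href.trim();
  // Browsers drop tabs/newlines inside URLs, so test a copy with whitespace and control chars removed.
  if (SUSPICIOUS.test(target.replace(/[\u0000-\u0020]/g, ''))) {
    console.warn(`Warning: stripped suspicious link "${target}" in ${ctx.sourcePath}`);
    return null;
  }
  // External links (any scheme), protocol-relative URLs, and in-page anchors are preserved.
  if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith('//') || target.startsWith('#')) {
    return target;
  }
  // Resolve the path part relative to the source file; keep any #fragment.
  const hashIndex = target.indexOf('#');
  const rel = hashIndex === -1 ? target : target.slice(0, hashIndex);
  const hash = hashIndex === -1 ? '' : target.slice(hashIndex);
  const resolved = path.posix.normalize(path.posix.join(path.posix.dirname(ctx.sourcePath), rel));

  // Relative images become absolute raw URLs.
  if (IMAGE_EXT.test(resolved)) {
    return `https://raw.githubusercontent.com/${ctx.repo}/${ctx.ref}/${resolved}`;
  }
  // Same-directory heuristic: Markdown under the blog directory is aggregated too, so keep the link.
  const blogDir = ctx.blogDir.endsWith('/') ? ctx.blogDir : `${ctx.blogDir}/`;
  if (resolved.startsWith(blogDir) && /\.mdx?$/.test(resolved)) return target;

  // Everything else points back to the file on GitHub.
  return `https://github.com/${ctx.repo}/blob/${ctx.ref}/${resolved}${hash}`;
}
```

Pinning `ref` to the fetched commit keeps rewritten URLs stable even if the source file later moves on the default branch.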

Cache Strategy

Create parallel .cache/blog/manifest.json with script versioning:

{
  "schemaVersion": 1,
  "scriptVersion": "1.0.0",
  "repos": {
    "amiable-dev/llm-council": {
      "sha": "abc123",
      "lastFetch": "2026-01-03T12:00:00Z",
      "posts": ["2026-01-01-intro.md", "2026-01-15-update.md"]
    }
  }
}

Cache Invalidation:

  • SHA change in repo → re-fetch all posts
  • scriptVersion change → bust entire cache (handles logic changes)
  • Manual: --no-cache flag to force re-fetch

scriptVersion Bump Triggers (increment when any of these change):

  • Front matter transformation logic
  • Date extraction algorithm
  • Link rewriting rules
  • Tag merging/normalization logic
  • Output file path generation
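The invalidation rules reduce to a small decision function over the manifest format shown above. The following are hypothetical:

- `planFetch`, `SCRIPT_VERSION`, and the return shape;
- treating a `schemaVersion` mismatch as a full refetch.

`currentSha` is assumed to come from the branch-head lookup that precedes the Git Trees call.

```javascript
const SCHEMA_VERSION = 1;
const SCRIPT_VERSION = '1.0.0'; // bump on any change listed under "scriptVersion Bump Triggers"

/**
 * Hypothetical cache check for one repo.
 * @param {object|null} manifest - parsed .cache/blog/manifest.json, or null if absent
 * @param {string} repo - "owner/repo"
 * @param {string} currentSha - latest commit SHA for the repo
 * @param {{noCache?: boolean}} [options] - noCache mirrors the --no-cache flag
 * @returns {{refetch: boolean, reason: string}}
 */
function planFetch(manifest, repo, currentSha, options = {}) {
  if (options.noCache) return { refetch: true, reason: 'forced by --no-cache' };
  if (!manifest || manifest.schemaVersion !== SCHEMA_VERSION) {
    return { refetch: true, reason: 'missing or incompatible manifest' };
  }
  // A scriptVersion change invalidates every repo, because transformation logic changed.
  if (manifest.scriptVersion !== SCRIPT_VERSION) {
    return { refetch: true, reason: `scriptVersion ${manifest.scriptVersion} -> ${SCRIPT_VERSION}` };
  }
  const entry = manifest.repos && manifest.repos[repo];
  if (!entry) return { refetch: true, reason: 'repo not in cache' };
  // Any new commit re-fetches all of the repo's posts.
  if (entry.sha !== currentSha) return { refetch: true, reason: 'commit SHA changed' };
  return { refetch: false, reason: 'cache hit' };
}
```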

Prebuild Integration

{
  "prebuild": "node scripts/fetch-adrs.js && node scripts/fetch-blog-posts.js && node scripts/fetch-projects.js"
}

Security Model

Repo Allowlist: Only aggregate from repos explicitly listed in projects.json with blogConfig.enabled: true.

Private Repo Support: For feature parity with ADR-003's ADR aggregation:

  • Detect private repos via GitHub API response (repoData.private)
  • Use GitHub Contents API as fallback when Git Trees API is unavailable
  • Requires GITHUB_TOKEN with appropriate repo access permissions

Content Validation:

  • Parse front matter with js-yaml's safe loader (safeLoad in js-yaml 3.x; in 4.x safeLoad was removed and load is safe by default)
  • Strip suspicious link protocols (javascript:, data:, vbscript:, file:)
  • Validate paths to prevent directory traversal (../)
  • No server-side code execution from fetched content
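Directory traversal protection could be a guard applied to every output path before writing. The helper name `safeOutputPath` and the conservative repo-name character set are assumptions. The guard relies only on Node's built-in `path` module.

```javascript
const path = require('node:path');

/**
 * Hypothetical guard: build the output path for a post and refuse anything
 * that escapes the per-repo output directory (e.g. "../../package.json").
 * @param {string} outputRoot - e.g. "blog/projects"
 * @param {string} repoName - e.g. "llm-council"
 * @param {string} relativePostPath - post path relative to the source blog directory
 * @returns {string|null} absolute path, or null if the path is unsafe
 */
function safeOutputPath(outputRoot, repoName, relativePostPath) {
  // Repo names: letters, digits, ".", "_", "-" only, and never "." or "..".
  if (!/^[A-Za-z0-9._-]+$/.test(repoName) || repoName === '.' || repoName === '..') return null;
  const root = path.resolve(outputRoot, repoName);
  const target = path.resolve(root, relativePostPath);
  // Outside root, path.relative returns "..", a "../"-prefixed path, or (across Windows drives) an absolute path.
  const rel = path.relative(root, target);
  if (rel === '' || rel === '..' || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    console.warn(`Warning: rejected unsafe path "${relativePostPath}" for ${repoName}`);
    return null;
  }
  return target;
}
```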

Rate Limiting:

  • Use GITHUB_TOKEN for authenticated requests (5000/hr vs 60/hr)
  • Use Git Trees API (single call per repo) for discovery
  • Raw content via raw.githubusercontent.com (not counted against the REST API rate limit)
  • Implement retry with exponential backoff
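A minimal retry wrapper with exponential backoff is sketched below. It wraps an arbitrary async request function, so it can wrap Node's built-in `fetch` (Node 18+). Several choices are assumptions:

- the name `fetchWithRetry`;
- the retry count and base delay;
- the absence of jitter;
- treating only 429 and 5xx responses as retryable.

GitHub can also signal rate limiting with a 403 status, which this sketch does not retry.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Hypothetical retry helper with exponential backoff.
 * @param {() => Promise<{status: number}>} doFetch - performs one request (e.g. wraps fetch)
 * @param {{retries?: number, baseDelayMs?: number, wait?: (ms: number) => Promise<void>}} [opts]
 */
async function fetchWithRetry(doFetch, { retries = 3, baseDelayMs = 500, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    let res;
    try {
      res = await doFetch();
    } catch (err) {
      // Network error: rethrow only once all retries are used up.
      if (attempt >= retries) throw err;
    }
    // Success or a non-retryable status (e.g. 404): return immediately.
    if (res && res.status !== 429 && res.status < 500) return res;
    // Out of retries on a retryable status: hand the response back for the caller to log.
    if (res && attempt >= retries) return res;
    // Delay doubles each attempt: 500ms, 1s, 2s, ... with the defaults.
    await wait(baseDelayMs * 2 ** attempt);
  }
}
```

Usage might look like `fetchWithRetry(() => fetch(url, { headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` } }))`. In this call, `url` stands for whichever API or raw content URL is being requested.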

Performance Expectations

| Metric | Expected Value |
|---|---|
| Repos to scan | 5-10 (portfolio projects) |
| Posts per repo | 0-20 (most projects have few/no blog posts) |
| API calls per build | ~2-3 per repo (tree + content fetches) |
| Build time impact | Less than 30 seconds added (with caching) |
| Initial fetch (cold) | ~2-3 minutes for all repos |

Failure Handling: Non-blocking. Log warnings and continue build if individual repos fail. Never break the build due to GitHub API issues.
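Because the prebuild chain uses `&&`, any script that exits with a non-zero status stops the chain. Non-blocking failure handling therefore means catching per-repo errors and still exiting normally. A sketch using `Promise.allSettled` follows; `aggregateAll` and the injected `aggregateRepo` callback are hypothetical names.

```javascript
/**
 * Hypothetical top-level runner: aggregate every repo independently so that
 * one failing repo (or a GitHub outage) never rejects the whole run.
 * @param {Array<{repo: string}>} projects - repos with blogConfig.enabled
 * @param {(project: {repo: string}) => Promise<number>} aggregateRepo - resolves to posts written
 */
async function aggregateAll(projects, aggregateRepo) {
  const results = await Promise.allSettled(projects.map((p) => aggregateRepo(p)));
  let written = 0;
  let failed = 0;
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      written += result.value;
    } else {
      // Log and continue with the remaining repos.
      failed += 1;
      console.warn(`Warning: skipping ${projects[i].repo}: ${result.reason && result.reason.message}`);
    }
  });
  return { written, failed };
}
```

Running repos in parallel keeps cold fetches short. A sequential loop would be gentler on rate limits if the portfolio grows.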

Consequences

Positive

  • Unified content experience across portfolio projects
  • Consistent patterns with ADR aggregation
  • Full control over content transformation
  • Efficient caching reduces API calls
  • No new external dependencies
  • Inline authors avoid global file modifications

Negative

  • Additional custom code to maintain
  • Blog posts from projects may have inconsistent formatting
  • Potential for content conflicts (duplicate slugs)
  • Temporary code duplication with fetch-adrs.js until potential ADR-005 unification
  • We own edge cases: MDX parsing, date normalization, collision handling

Neutral

  • docusaurus-plugin-remote-content remains available for future use cases
  • Separate scripts allow independent iteration on ADRs vs blog posts

Alternatives Considered

Use docusaurus-plugin-remote-content

Rejected because:

  • No auto-discovery of blog posts (requires explicit file lists)
  • No relative link rewriting (breaks images and cross-references)
  • Would require significant configuration per project
  • Less control over front matter transformation
  • Introduces external dependency for functionality we already have

Refactor Everything into Unified System

Deferred because:

  • Higher risk to stable ADR functionality
  • Premature optimization without knowing blog-specific requirements
  • Can be revisited in ADR-005 after blog aggregation is proven
  • Allows commonalities to emerge naturally before abstraction

Modify Global authors.yml

Rejected because:

  • Modifying source-controlled files during builds creates dirty git states
  • Causes merge conflicts and CI complications
  • Inline author injection in front matter is cleaner and self-contained

LLM Council Review

Initial Review (2026-01-03)

This ADR was reviewed by the LLM Council (Reasoning tier). Key feedback incorporated:

  1. Author Strategy - Changed from dynamic authors.yml modification to inline front matter injection
  2. Date Extraction - Added explicit fallback strategy with Git commit date
  3. Discovery Rules - Added precedence, MDX support, draft handling, depth specification
  4. Cache Versioning - Added scriptVersion to manifest to handle logic changes
  5. Collision Handling - Added explicit slug collision resolution strategy
  6. Shared Modules - Added recommendation to share core logic via lib/ modules
  7. Security Model - Added explicit allowlist, validation, and rate limiting details
  8. ADR-005 Triggers - Added specific conditions for evaluating unified system

Front Matter Schema Review (2026-01-23)

Reviewed by Claude Opus 4.5 (Architecture Specialist) at 92% confidence. Recommendation: Approve with Changes.

Critical changes incorporated:

  • Added <!--truncate--> marker preservation documentation
  • Clarified internal blog link rewriting strategy (same-directory heuristic)
  • Removed redundant includeBlogPosts in favor of blogConfig.enabled for consistency

Important changes incorporated:

  • Added private repo support for feature parity with ADR-003
  • Added BOM stripping to graceful degradation scenarios
  • Added file: protocol to suspicious protocols list
  • Documented scriptVersion bump triggers
  • Expanded CLAUDE.md template with truncate marker, MDX guidance, and common pitfalls

Minor items tracked for follow-up:

  • Cross-project series support (documented as out of scope)
  • Filename collision detection within same project (covered by existing slug collision handling)
  • Git commit date fallback (documented as future enhancement due to API cost)
  • Manifest schema alignment between ADR-003 and ADR-004 (parallel formats acceptable)

References