ADR-004: Remote Blog Post Aggregation

Status

Accepted

Date

2026-01-03

Context

With the successful implementation of ADR-003 (Cross-Project ADR Aggregation), amiable.dev now aggregates Architecture Decision Records from all portfolio projects. This establishes a pattern for pulling content from tracked GitHub repositories at build time.

A new requirement has emerged: aggregate blog posts from tracked projects to provide a unified content experience. Users should be able to discover not just ADRs but also blog content from the projects showcased on the /projects page.

Prior Art

ADR-001: Enabled docs plugin at /docs for architecture documentation
ADR-002: Established projects.json as the configuration source for tracked repositories
ADR-003: Implemented build-time ADR aggregation with:
- GitHub Git Trees API for discovery
- Raw GitHub content fetching (not counted against the REST API rate limit)
- Front matter injection with source attribution
- Relative link rewriting
- Cache with commit SHA invalidation
- Template/boilerplate exclusion
- Status parsing and normalization

Available Solutions

Option A: docusaurus-plugin-remote-content

The @rdilweb/docusaurus-plugin-remote-content plugin provides:

Capabilities:

Two sync modes: Constant (auto during build) or CLI (manual)
Configurable source URLs and output directories
Content transformation via modifyContent callback
Separate instances for docs vs images
Axios-based HTTP fetching

Limitations:

Requires explicit document list (no auto-discovery)
No built-in caching or SHA-based invalidation
No relative link rewriting
Would require multiple plugin instances (one per project)
Less control over front matter injection
Active but minimal maintenance (last release 2023)

When to reconsider: If requirements evolve to explicit file lists (no discovery needed) and link rewriting becomes unnecessary, the plugin could reduce maintenance burden.

Option B: Extend Existing fetch-adrs.js Pattern

Leverage the proven architecture from ADR-003:

Advantages:

Consistent codebase and patterns
Full control over content transformation
Existing cache infrastructure
Single source of project configuration (projects.json)
Unified testing approach with Vitest/MSW

Considerations:

Additional custom code to maintain
Blog posts have different structure than ADRs

Option C: Create Unified Content Aggregation System

Refactor into a generalized scripts/fetch-remote-content.js that handles both ADRs and blog posts:

Advantages:

Single, well-tested content aggregation pipeline
Shared utilities (caching, link rewriting, front matter)
Easier to add future content types (e.g., docs, tutorials)
DRY principle

Considerations:

Larger refactoring effort
Risk of breaking existing ADR functionality

Decision

Option B: Extend the existing fetch-adrs.js pattern to create a parallel scripts/fetch-blog-posts.js script.

Rationale

Proven Pattern: ADR-003 established a reliable, well-tested approach with 165+ tests
Content Discovery: Blog posts need auto-discovery (like ADRs), not explicit document lists
Link Rewriting: Critical for cross-references and images, not supported by the plugin
Caching: SHA-based invalidation prevents redundant fetches
Front Matter: Full control over attribution, tags, and metadata injection
Independence: Keeps blog aggregation separate, reducing risk to ADR functionality
Plugin Overhead: Adding the plugin introduces a dependency for functionality we already have

Code Organization

To prevent logic drift between fetch-adrs.js and fetch-blog-posts.js, share core modules:

lib/
  githubFetch.js      # Fetch + retries + rate limit handling
  cacheManifest.js    # SHA-based caching utilities
  rewriteLinks.js     # Link rewriting for images/cross-refs
  frontMatter.js      # Front matter parsing and injection

Separate entrypoints allow independent iteration while shared core logic ensures consistency.

Future Consideration (ADR-005 Triggers)

Evaluate Option C (unified system) when:

A third content type is needed (e.g., docs, tutorials)
More than 50% of code is duplicated between scripts
Common bugs appear in both scripts simultaneously

Implementation

Blog Post Discovery

Directory Scanning (in order of precedence):

1. blog/ - Standard Docusaurus blog location
2. content/blog/ - Alternative content directory
3. posts/ - Common blog directory

First match wins; subsequent directories are not scanned if earlier ones exist.
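
The precedence rule above can be sketched as a small pure function. This is a minimal illustration, not the actual script: `pickBlogDirectory` and `BLOG_DIR_CANDIDATES` are hypothetical names, and the input is assumed to be a flat list of repo-relative file paths (for example, the `path` values from a recursive Git Trees API response).

```javascript
// Candidate directories in precedence order, as listed above.
const BLOG_DIR_CANDIDATES = ['blog/', 'content/blog/', 'posts/'];

/**
 * Return the first candidate directory that contains at least one file.
 * `treePaths` is assumed to be a flat array of repo-relative file paths.
 */
function pickBlogDirectory(treePaths, candidates = BLOG_DIR_CANDIDATES) {
  for (const dir of candidates) {
    if (treePaths.some((p) => p.startsWith(dir))) return dir;
  }
  return null; // no blog directory: the repo contributes no posts
}

const paths = ['README.md', 'posts/hello.md', 'content/blog/2024-01-01-intro.md'];
console.log(pickBlogDirectory(paths)); // content/blog/
```

Because the loop returns on the first hit, `posts/` is never scanned here even though it exists.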

Supported File Extensions:

.md (Markdown)
.mdx (MDX with React components)

File Patterns:

YYYY-MM-DD-*.md(x) - Date-prefixed posts
*/index.md(x) - Folder-based posts (date from front matter or Git)
*.md(x) - Non-dated posts (date from front matter or Git)

Depth: Scan nested directories recursively (e.g., blog/2024/january/post.md).

Draft Handling: Posts with draft: true in front matter are excluded from aggregation.
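
One possible way to express the file patterns and the draft rule is shown below. The helper names (`classifyPost`, `shouldInclude`) are hypothetical, and the path is assumed to be relative to the chosen blog directory; `includeDrafts` mirrors the flag in `blogConfig`.

```javascript
const DATED_FILE = /^\d{4}-\d{2}-\d{2}-.+\.mdx?$/;

/**
 * Classify a path relative to the blog directory.
 * Returns 'dated', 'folder', 'undated', or null for non-Markdown files.
 */
function classifyPost(relativePath) {
  const segments = relativePath.split('/');
  const file = segments[segments.length - 1];
  if (!/\.mdx?$/.test(file)) return null;
  if (DATED_FILE.test(file)) return 'dated';
  if (/^index\.mdx?$/.test(file) && segments.length > 1) return 'folder';
  return 'undated';
}

/** Drafts are skipped unless includeDrafts is set for the project. */
function shouldInclude(frontMatter, includeDrafts = false) {
  return includeDrafts || frontMatter.draft !== true;
}

console.log(classifyPost('2024/january/post.md')); // undated
console.log(classifyPost('my-post/index.mdx'));    // folder
```

Only the last path segment is matched against the patterns, so nested directories are handled without extra logic.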

Date Extraction Strategy

Date is critical for Docusaurus blog ordering. Extract in this order:

1. Front matter date field - Highest priority
2. Filename pattern - Extract from YYYY-MM-DD- prefix
3. Fetch timestamp - Use current date with warning logged

Future enhancement (not part of the current order): Git commit date via GitHub API, deferred due to API cost (1 call per dateless file).

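function extractDate(filename, frontMatter = {}) {
  // 1. Front matter date wins (a YAML parser may return it as a Date object).
  if (frontMatter.date) return frontMatter.date;
  // 2. Look for a YYYY-MM-DD prefix on the base name, so nested paths
  //    such as blog/2024/2024-01-01-post.md still match.
  const base = filename.split('/').pop();
  const match = base.match(/^(\d{4}-\d{2}-\d{2})/);
  if (match) return match[1];
  // 3. Last resort: today's date, with a warning so the source post can be fixed.
  console.warn(`Warning: No date found for ${filename}, using fetch date`);
  return new Date().toISOString().split('T')[0]; // YYYY-MM-DD
}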

Collision Handling

Slug Collisions: Two repos may have 2024-01-01-update.md.

Strategy:

Output path blog/projects/{repo-name}/ provides natural namespacing
If source post has explicit slug front matter, prefix with repo name: {repo-name}-{slug}
Log warning when collision is detected and resolved
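
A rough sketch of this strategy follows. `outputPath` and `resolveSlug` are hypothetical names, and tracking raw slugs in a `Map` is just one assumed way to detect a collision across repos.

```javascript
/** Namespaced output location for an aggregated post. */
function outputPath(repoName, filename) {
  return `blog/projects/${repoName}/${filename}`;
}

/**
 * Prefix an explicit slug with the repo name.
 * `seenSlugs` maps each raw slug to the first repo that used it.
 * Returns undefined when the post has no explicit slug, in which case
 * the namespaced output path alone keeps posts apart.
 */
function resolveSlug(repoName, frontMatter, seenSlugs) {
  if (!frontMatter.slug) return undefined;
  const raw = String(frontMatter.slug).replace(/^\/+/, '');
  const firstRepo = seenSlugs.get(raw);
  if (firstRepo && firstRepo !== repoName) {
    console.warn(
      `Warning: slug "${raw}" used by ${firstRepo} and ${repoName}; resolved by prefixing`
    );
  }
  if (!firstRepo) seenSlugs.set(raw, repoName);
  return `${repoName}-${raw}`;
}

const seen = new Map();
console.log(resolveSlug('llm-council', { slug: 'update' }, seen)); // llm-council-update
console.log(resolveSlug('arbiter-bot', { slug: '/update' }, seen)); // arbiter-bot-update (warning logged)
```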

Configuration Enhancement

Add optional blog configuration to project entries (consistent with adrConfig pattern):

{
  "repo": "amiable-dev/llm-council",
  "blogConfig": {
    "enabled": true,
    "directory": "blog/",
    "includeDrafts": false,
    "tagPrefix": "llm-council"
  }
}

Projects without blogConfig.enabled: true will not have their blog posts aggregated. The enabled flag follows the same pattern as adrConfig.enabled for consistency.
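
As an illustration of the opt-in rule, the filter could look like the following. It assumes projects.json parses to an array of project entries (the real shape may differ); `blogEnabledProjects` and the `example/disabled` repo are hypothetical.

```javascript
/** Keep only projects that explicitly opt in with blogConfig.enabled: true. */
function blogEnabledProjects(projects) {
  return projects.filter((project) => project.blogConfig?.enabled === true);
}

const projects = [
  { repo: 'amiable-dev/llm-council', blogConfig: { enabled: true, directory: 'blog/' } },
  { repo: 'amiable-dev/arbiter-bot' }, // no blogConfig: skipped
  { repo: 'example/disabled', blogConfig: { enabled: false } }, // hypothetical entry
];

console.log(blogEnabledProjects(projects).map((p) => p.repo));
// [ 'amiable-dev/llm-council' ]
```

The strict `=== true` comparison means a missing or truthy-but-not-boolean value does not enable aggregation.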

Output Structure

blog/
  projects/
    {repo-name}/
      YYYY-MM-DD-post-slug.md
      ...

Front Matter Schema

A standardized front matter contract enables consistent aggregation while allowing graceful degradation when fields are missing or malformed.

Docusaurus Standard Fields (Preserved)

These fields are defined by Docusaurus and passed through unchanged:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `title` | string | Markdown H1 or filename | Blog post heading |
| `title_meta` | string | `title` | SEO title for `<head>` metadata |
| `description` | string | First paragraph | Meta description for SEO |
| `slug` | string | File path | Custom URL path |
| `date` | datetime | Filename or fetch time | Publication date (YAML format) |
| `authors` | array | Project default | Author references or inline objects |
| `tags` | array | `[]` | Post categorization tags |
| `image` | string | none | Cover image for social cards |
| `draft` | boolean | `false` | Exclude from production |
| `unlisted` | boolean | `false` | Hide from listings, allow direct access |
| `hide_table_of_contents` | boolean | `false` | Suppress right-side TOC |
| `toc_min_heading_level` | number | `2` | Minimum heading level in TOC |
| `toc_max_heading_level` | number | `3` | Maximum heading level in TOC |
| `keywords` | array | `[]` | SEO keywords meta tag |
| `last_update` | object | none | Override last update metadata |
| `pagination_prev` | string | auto | Custom previous link (ignored in aggregated context) |
| `pagination_next` | string | auto | Custom next link (ignored in aggregated context) |

Content Markers: The `<!--truncate-->` marker is critical for blog excerpts. This marker MUST be preserved during aggregation to ensure homepage listings display excerpts rather than full posts.

Extended Fields for Aggregation

These custom fields enhance the aggregation experience:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `sidebar_label` | string | `title` | Shorter title for sidebar navigation |
| `sidebar_title` | string | `title` | Alias for `sidebar_label` (source convenience, takes precedence) |
| `project_name` | string | Inferred from repo | Override displayed project name |
| `category` | string | none | Content type: `tutorial`, `announcement`, `deep-dive`, `case-study`, `release` |
| `series` | object | none | Multi-part series metadata (see below) |
| `featured` | boolean | `false` | Highlight in aggregated listings |
| `reading_time_override` | number | Calculated | Manual reading time in minutes |

Series Metadata Schema

For multi-part content:

series:
  name: "Building Arbiter Bot"  # Series title
  part: 2                        # This post's position
  total: 5                       # Total parts (optional)
  slug: "arbiter-bot-series"     # URL-safe identifier (optional)

Note: Series are scoped per-repository. Cross-project series (posts spanning multiple repos) are not currently supported; each repo's series operates independently.

Aggregation Output Fields (Injected)

These fields are added by the aggregation script:

| Field | Type | Description |
| --- | --- | --- |
| `format` | string | Always `md` to disable MDX parsing |
| `source_repo` | string | Original `owner/repo` |
| `source_url` | string | GitHub blob URL to source file |
| `source_commit` | string | Commit SHA at fetch time |
| `fetched_at` | datetime | ISO timestamp of aggregation |
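
The injected fields could be assembled along these lines. This is a sketch under assumptions: `buildInjectedFields` is a hypothetical helper, and `ref` (the branch or commit used in the blob URL) is an assumed parameter that the ADR does not name.

```javascript
/**
 * Build the fields the aggregation script injects into front matter.
 * `ref` is the branch or commit used in the blob URL (assumed parameter).
 */
function buildInjectedFields({ repo, ref, path, commit, now = new Date() }) {
  // Encode each path segment so spaces and similar characters stay valid in the URL.
  const encodedPath = path.split('/').map(encodeURIComponent).join('/');
  return {
    format: 'md', // disables MDX parsing for aggregated posts
    source_repo: repo,
    source_url: `https://github.com/${repo}/blob/${ref}/${encodedPath}`,
    source_commit: commit,
    fetched_at: now.toISOString(),
  };
}

const fields = buildInjectedFields({
  repo: 'amiable-dev/arbiter-bot',
  ref: 'main',
  path: 'docs/blog/2026-01-21-implementing-the-actor-model.md',
  commit: 'abc123def456',
});
console.log(fields.source_url);
```

Passing `now` explicitly keeps the function deterministic in tests.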

Front Matter Transformation Rules

Sidebar Label Generation (precedence order):
```
sidebar_label = "{project_name}: {sidebar_title || sidebar_label || title}"
```
Where sidebar_title takes precedence over sidebar_label if both are present.

Tag Merging (with deduplication):
- Preserve all source tags
- Add repo name as tag (normalized to lowercase, hyphens)
- Add skill tags from projects.json
- Deduplicate after normalization (e.g., "TypeScript" and "typescript" become single "typescript")
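
The sidebar label and tag merging rules above might be implemented roughly as follows. All function names are hypothetical, and placing the repo tag first simply mirrors the aggregated output example in this ADR; the ordering is an assumption.

```javascript
/** Lowercase a tag and replace whitespace/underscores with hyphens. */
function normalizeTag(tag) {
  return String(tag).trim().toLowerCase().replace(/[\s_]+/g, '-');
}

/** Merge repo, source, and skill tags, deduplicating after normalization. */
function mergeTags(sourceTags = [], repoName, skillTags = []) {
  const merged = [repoName, ...sourceTags, ...skillTags]
    .map(normalizeTag)
    .filter(Boolean);
  return [...new Set(merged)]; // Set preserves first-seen order
}

/** sidebar_title wins over sidebar_label, which wins over title. */
function buildSidebarLabel(projectName, frontMatter) {
  const label = frontMatter.sidebar_title || frontMatter.sidebar_label || frontMatter.title;
  return `${projectName}: ${label}`;
}

console.log(mergeTags(['TypeScript', 'typescript'], 'llm-council'));
// [ 'llm-council', 'typescript' ]
console.log(buildSidebarLabel('Arbiter Bot', { title: 'Implementing the Actor Model', sidebar_title: 'Actor Model' }));
// Arbiter Bot: Actor Model
```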

Author Injection (when source has no authors):

authors:
  - name: "{project.title}"
    url: "https://github.com/{repo}"

Date Extraction Priority:
1. Front matter date field
2. Filename pattern YYYY-MM-DD-*
3. Current date (with warning logged)

Graceful Degradation

The aggregation script MUST succeed even with malformed or incomplete front matter:

| Scenario | Behavior |
| --- | --- |
| Missing `title` | Use filename (without extension/date prefix) |
| Missing `date` | Extract from filename, else use fetch date with warning |
| Invalid `date` format | Parse attempt with Date.parse(), fallback to fetch date |
| Missing `authors` | Inject project-based default author |
| Invalid `series` object | Ignore series metadata, log warning |
| Unknown fields | Pass through unchanged |
| YAML parse error | Skip file, log error, continue aggregation |
| Empty front matter | Use all defaults, proceed with aggregation |
| BOM (Byte Order Mark) | Strip BOM before parsing (`content.replace(/^\uFEFF/, '')`) |
| Binary/non-text `.md` file | Detect via content inspection, skip with warning |
| Circular series references | Skip series metadata, log warning |
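
A few of these fallbacks are simple enough to sketch as standalone helpers. The names are hypothetical, and the NUL-character check is only one assumed heuristic for "content inspection"; the ADR does not specify the method.

```javascript
/** Strip a leading byte order mark so the front matter delimiter is detected. */
function stripBom(content) {
  return content.replace(/^\uFEFF/, '');
}

/** Heuristic binary check: text files should not contain NUL characters. */
function looksBinary(content) {
  return content.includes('\u0000');
}

/** Title fallback: filename without extension and without a date prefix. */
function titleFromFilename(filename) {
  return filename.replace(/\.mdx?$/, '').replace(/^\d{4}-\d{2}-\d{2}-/, '');
}

/**
 * Normalize a date value to YYYY-MM-DD, or return `fallback` if unparseable.
 * Note: non-ISO strings are parsed in local time, so the day may shift
 * when converted to UTC.
 */
function normalizeDate(value, fallback) {
  const ms = value instanceof Date ? value.getTime() : Date.parse(String(value));
  if (Number.isNaN(ms)) {
    console.warn(`Warning: invalid date "${value}", using ${fallback}`);
    return fallback;
  }
  return new Date(ms).toISOString().slice(0, 10);
}

console.log(titleFromFilename('2026-01-21-implementing-the-actor-model.md'));
// implementing-the-actor-model
```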

Example: Source Post

---
title: "Implementing the Actor Model"
sidebar_title: "Actor Model"
description: "How we built concurrent execution with message passing"
date: 2026-01-21
authors: [antigravity]
tags: [rust, concurrency, architecture]
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---

Example: Aggregated Output

---
format: md
slug: /blog/arbiter-bot/2026-01-21-implementing-the-actor-model
source_repo: amiable-dev/arbiter-bot
source_url: https://github.com/amiable-dev/arbiter-bot/blob/main/docs/blog/2026-01-21-implementing-the-actor-model.md
source_commit: abc123def456
fetched_at: '2026-01-23T12:00:00Z'
sidebar_label: 'Arbiter Bot: Actor Model'
title: "Implementing the Actor Model"
description: "How we built concurrent execution with message passing"
date: '2026-01-21'
authors: [antigravity]
tags:
  - arbiter-bot
  - rust
  - concurrency
  - architecture
  - actor-model
  - saga-pattern
  - grpc
  - telegram
category: deep-dive
series:
  name: "Arbiter Bot Architecture"
  part: 3
  total: 5
featured: true
---

Project CLAUDE.md Guidance

Projects should include front matter guidance in their CLAUDE.md:

## Blog Post Front Matter

Blog posts support these front matter fields for aggregation to amiable.dev:

### Required
- `title`: Post title (displayed in listings and page header)
- `date`: Publication date (YYYY-MM-DD format)

### Recommended
- `description`: 1-2 sentence summary for SEO and previews
- `sidebar_title`: Shorter title for navigation (defaults to `title`)
- `authors`: Author key(s) from authors.yml or inline objects
- `tags`: Categorization tags (lowercase, hyphenated)

### Optional
- `category`: Content type (tutorial|announcement|deep-dive|case-study|release)
- `series`: For multi-part posts: `{name, part, total}`
- `featured`: Set `true` to highlight in aggregated listings
- `image`: Path to cover image for social sharing
- `draft`: Set `true` to exclude from production

### Example

```yaml
---
title: "Building the Authentication System"
sidebar_title: "Auth System"
description: "Implementing OAuth2 with PKCE for secure API access"
date: 2026-01-15
authors: [your-author-key]
tags: [security, oauth, api]
category: deep-dive
---

Opening paragraph with hook.

<!--truncate-->

Rest of the content...
```

### Content Guidelines

- Place `<!--truncate-->` marker after the first paragraph for homepage excerpts
- Use relative paths for images (`./images/diagram.png`) - they'll be converted to GitHub raw URLs
- Tags should be lowercase and hyphenated (e.g., `clean-architecture`, not `Clean Architecture`)
- Stick to standard Markdown - custom MDX components won't be available after aggregation
- Avoid `import` statements - they'll cause build failures in the aggregated context

### Authors

- Reference existing author keys from authors.yml: `authors: [antigravity]`
- Or use inline objects: `authors: [{name: "Guest Author", url: "https://..."}]`

### Common Pitfalls

- **YAML quoting**: Values with colons need quoting: `title: "Fix: Authentication Bug"`
- **Date format**: Use ISO format `YYYY-MM-DD`, not `January 1st, 2026`
- **Image paths**: Don't use absolute paths starting with `/` - use relative paths
- **MDX syntax**: Avoid JSX-like syntax (`<Component />`) as it won't render correctly

Tag Handling:

Preserve original tags from source post
Add project-prefixed tag: {repo-name}
Add skill tags from project configuration

Link Rewriting

Same patterns as ADR-003:

Relative images → absolute GitHub raw URLs (raw.githubusercontent.com)
Internal blog links within same directory → local paths (same-directory heuristic)
Other internal links → GitHub blob URLs
External links → preserved as-is
Suspicious protocols (javascript:, data:, vbscript:, file:) → stripped with warning

Internal Link Strategy: Use the same-directory heuristic for simplicity. Links to files in the same blog directory are assumed to be aggregated together and rewritten to local paths. Links to files outside the blog directory are converted to GitHub blob URLs. This avoids the complexity of two-pass aggregation while handling the common case correctly.
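
The rules and the same-directory heuristic could be combined in a single function like the sketch below. `rewriteLink` and its option names are hypothetical, `ref` is an assumed branch or commit, and the placeholder host `repo.invalid` exists only so the standard `URL` class can resolve relative paths. Same-directory Markdown links are returned unchanged on the assumption that both files land in the same output folder.

```javascript
const SUSPICIOUS_PROTOCOL = /^\s*(javascript|data|vbscript|file):/i;
const HAS_SCHEME = /^[a-z][a-z0-9+.-]*:/i;

/**
 * Rewrite one link target. `dir` is the source file's directory inside the
 * repo, with a trailing slash (e.g. 'blog/'). Returns null when the link
 * should be stripped.
 */
function rewriteLink(href, { repo, ref, dir, isImage = false }) {
  if (SUSPICIOUS_PROTOCOL.test(href)) {
    console.warn(`Warning: stripped suspicious link: ${href}`);
    return null;
  }
  // External URLs, protocol-relative URLs, and in-page anchors pass through.
  if (HAS_SCHEME.test(href) || href.startsWith('//') || href.startsWith('#')) {
    return href;
  }
  // Resolve against the source directory; URL clamps '..' at the repo root.
  const url = new URL(href, `https://repo.invalid/${dir}`);
  const repoPath = url.pathname.slice(1);
  if (isImage) {
    return `https://raw.githubusercontent.com/${repo}/${ref}/${repoPath}`;
  }
  // Same-directory heuristic: a sibling .md/.mdx file is assumed to be aggregated too.
  const rest = repoPath.startsWith(dir) ? repoPath.slice(dir.length) : null;
  if (rest !== null && !rest.includes('/') && /\.mdx?$/.test(rest)) {
    return href;
  }
  return `https://github.com/${repo}/blob/${ref}/${repoPath}${url.hash}`;
}

const ctx = { repo: 'amiable-dev/arbiter-bot', ref: 'main', dir: 'blog/' };
console.log(rewriteLink('./images/diagram.png', { ...ctx, isImage: true }));
// https://raw.githubusercontent.com/amiable-dev/arbiter-bot/main/blog/images/diagram.png
console.log(rewriteLink('../README.md', ctx));
// https://github.com/amiable-dev/arbiter-bot/blob/main/README.md
```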

Cache Strategy

Create parallel .cache/blog/manifest.json with script versioning:

{
  "schemaVersion": 1,
  "scriptVersion": "1.0.0",
  "repos": {
    "amiable-dev/llm-council": {
      "sha": "abc123",
      "lastFetch": "2026-01-03T12:00:00Z",
      "posts": ["2026-01-01-intro.md", "2026-01-15-update.md"]
    }
  }
}

Cache Invalidation:

SHA change in repo → re-fetch all posts
scriptVersion change → bust entire cache (handles logic changes)
Manual: --no-cache flag to force re-fetch

scriptVersion Bump Triggers (increment when any of these change):

Front matter transformation logic
Date extraction algorithm
Link rewriting rules
Tag merging/normalization logic
Output file path generation
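
The invalidation rules reduce to a short predicate. The sketch below assumes the manifest shape shown earlier in this section; `needsRefetch` and its option names are hypothetical.

```javascript
const SCRIPT_VERSION = '1.0.0';

/**
 * Decide whether a repo's posts must be fetched again.
 * `manifest` is the parsed .cache/blog/manifest.json, or null if absent.
 */
function needsRefetch(manifest, repo, currentSha, options = {}) {
  const { noCache = false, scriptVersion = SCRIPT_VERSION } = options;
  if (noCache) return true; // --no-cache flag
  if (!manifest || manifest.scriptVersion !== scriptVersion) return true; // logic changed
  const entry = manifest.repos?.[repo];
  return !entry || entry.sha !== currentSha; // new repo or new commit
}

const manifest = {
  schemaVersion: 1,
  scriptVersion: '1.0.0',
  repos: { 'amiable-dev/llm-council': { sha: 'abc123', posts: [] } },
};
console.log(needsRefetch(manifest, 'amiable-dev/llm-council', 'abc123')); // false
console.log(needsRefetch(manifest, 'amiable-dev/llm-council', 'def456')); // true
```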

Prebuild Integration

{
  "prebuild": "node scripts/fetch-adrs.js && node scripts/fetch-blog-posts.js && node scripts/fetch-projects.js"
}

Security Model

Repo Allowlist: Only aggregate from repos explicitly listed in projects.json with blogConfig.enabled: true.

Private Repo Support: For feature parity with ADR-003's ADR aggregation:

Detect private repos via GitHub API response (repoData.private)
Use GitHub Contents API as fallback when Git Trees API is unavailable
Requires GITHUB_TOKEN with appropriate repo access permissions

Content Validation:

Parse front matter with js-yaml's safe loader (safeLoad in js-yaml 3.x; in 4.x, load is safe by default and safeLoad was removed)
Strip suspicious link protocols (javascript:, data:, vbscript:, file:)
Validate paths to prevent directory traversal (../)
No server-side code execution from fetched content
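
For the directory traversal check, a conservative validator might look like this. `isSafeRelativePath` is a hypothetical name, and rejecting backslashes and drive letters is an extra assumption beyond the `../` case named above.

```javascript
/**
 * Accept only plain repo-relative paths: no absolute paths, no '..'
 * segments, no backslashes, no NUL characters.
 */
function isSafeRelativePath(p) {
  if (typeof p !== 'string' || p === '' || p.includes('\0')) return false;
  if (p.startsWith('/') || p.includes('\\') || /^[a-zA-Z]:/.test(p)) return false;
  return !p.split('/').some((segment) => segment === '..');
}

console.log(isSafeRelativePath('blog/2024/january/post.md')); // true
console.log(isSafeRelativePath('blog/../../etc/passwd'));     // false
```

Checking whole segments rather than the substring `..` keeps filenames such as `notes..md` valid.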

Rate Limiting:

Use GITHUB_TOKEN for authenticated requests (5000/hr vs 60/hr)
Use Git Trees API (single call per repo) for discovery
Raw content via raw.githubusercontent.com (not counted against the REST API rate limit)
Implement retry with exponential backoff
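
Retry with exponential backoff can be sketched as below. The defaults (3 retries, 500 ms base, 8 s cap) are illustrative assumptions, not values from ADR-003, and `withRetry` and `backoffDelayMs` are hypothetical names.

```javascript
/** Delay before the next attempt: base * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Call `fn` until it succeeds or `retries` extra attempts are exhausted.
 * `sleep` is injectable so tests can skip real waiting.
 */
async function withRetry(fn, { retries = 3, baseMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 500, 1000, 2000, 4000 ]
```

A caller would wrap each GitHub request, for example `withRetry(() => fetch(url))`, and treat a final rejection as a non-blocking warning for that repo.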

Performance Expectations

| Metric | Expected Value |
| --- | --- |
| Repos to scan | 5-10 (portfolio projects) |
| Posts per repo | 0-20 (most projects have few/no blog posts) |
| API calls per build | ~2-3 per repo (tree + content fetches) |
| Build time impact | Less than 30 seconds added (with caching) |
| Initial fetch (cold) | ~2-3 minutes for all repos |

Failure Handling: Non-blocking. Log warnings and continue build if individual repos fail. Never break the build due to GitHub API issues.

Consequences

Positive

Unified content experience across portfolio projects
Consistent patterns with ADR aggregation
Full control over content transformation
Efficient caching reduces API calls
No new external dependencies
Inline authors avoid global file modifications

Negative

Additional custom code to maintain
Blog posts from projects may have inconsistent formatting
Potential for content conflicts (duplicate slugs)
Temporary code duplication with fetch-adrs.js until potential ADR-005 unification
We own edge cases: MDX parsing, date normalization, collision handling

Neutral

docusaurus-plugin-remote-content remains available for future use cases
Separate scripts allow independent iteration on ADRs vs blog posts

Alternatives Considered

Use docusaurus-plugin-remote-content

Rejected because:

No auto-discovery of blog posts (requires explicit file lists)
No relative link rewriting (breaks images and cross-references)
Would require significant configuration per project
Less control over front matter transformation
Introduces external dependency for functionality we already have

Refactor Everything into Unified System

Deferred because:

Higher risk to stable ADR functionality
Premature optimization without knowing blog-specific requirements
Can be revisited in ADR-005 after blog aggregation is proven
Allows commonalities to emerge naturally before abstraction

Modify Global authors.yml

Rejected because:

Modifying source-controlled files during builds creates dirty git states
Causes merge conflicts and CI complications
Inline author injection in front matter is cleaner and self-contained

LLM Council Review

Initial Review (2026-01-03)

This ADR was reviewed by the LLM Council (Reasoning tier). Key feedback incorporated:

Author Strategy - Changed from dynamic authors.yml modification to inline front matter injection
Date Extraction - Added explicit fallback strategy with Git commit date
Discovery Rules - Added precedence, MDX support, draft handling, depth specification
Cache Versioning - Added scriptVersion to manifest to handle logic changes
Collision Handling - Added explicit slug collision resolution strategy
Shared Modules - Added recommendation to share core logic via lib/ modules
Security Model - Added explicit allowlist, validation, and rate limiting details
ADR-005 Triggers - Added specific conditions for evaluating unified system

Front Matter Schema Review (2026-01-23)

Reviewed by Claude Opus 4.5 (Architecture Specialist) at 92% confidence. Recommendation: Approve with Changes

Critical changes incorporated:

Added `<!--truncate-->` marker preservation documentation
Clarified internal blog link rewriting strategy (same-directory heuristic)
Removed redundant includeBlogPosts in favor of blogConfig.enabled for consistency

Important changes incorporated:

Added private repo support for feature parity with ADR-003
Added BOM stripping to graceful degradation scenarios
Added file: protocol to suspicious protocols list
Documented scriptVersion bump triggers
Expanded CLAUDE.md template with truncate marker, MDX guidance, and common pitfalls

Minor items tracked for follow-up:

Cross-project series support (documented as out of scope)
Filename collision detection within same project (covered by existing slug collision handling)
Git commit date fallback (documented as future enhancement due to API cost)
Manifest schema alignment between ADR-003 and ADR-004 (parallel formats acceptable)
