CI/CD for a Docs Site: ADR-004

· 4 min read
Amiable Dev
Project Contributors

How we built a deployment pipeline that stays fresh without manual intervention.

The Problem

We needed a CI/CD pipeline that could:

  1. Deploy on merge to main
  2. Aggregate docs from upstream repos daily
  3. Allow manual rebuilds with cache bypass
  4. Run security scanning without slowing deploys

Why GitHub Pages?

We considered three options:

| Platform | Cost | PR Previews | HTTPS | Vendor Count |
| --- | --- | --- | --- | --- |
| GitHub Pages | Free | No | Auto (*.github.io) | 1 |
| Netlify/Vercel | Free tier | Yes | Auto | 2 |
| Railway | ~$5/mo | Yes | Auto | 2 |

Cost wasn't the deciding factor: every option is free or close to it. What mattered:

  1. Vendor consolidation - secrets, permissions, and logs in one place
  2. No external OAuth - smaller attack surface
  3. Workflow simplicity - deploy-pages action just works

The trade-off: No PR preview deployments. We accepted this because our site is documentation—reviewing markdown diffs is sufficient. For a React app with visual changes, we'd choose differently.

Note: Custom domains need DNS configuration and propagation time. The *.github.io subdomain gets HTTPS immediately.

The Pipeline

Key insight: security.yml runs in parallel with deploy.yml. A linting failure doesn't block deployment—but it does show up as a failed check on the commit.

Three triggers, one pipeline:

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *' # Daily at 6 AM UTC
  workflow_dispatch:
    inputs:
      force_refresh:
        type: boolean
        default: false

Caching Strategy

Template aggregation fetches docs from GitHub repos. Without caching, every build would re-fetch everything.

Our approach:

  1. Cache key includes hashFiles('templates.yaml') - config changes invalidate
  2. Restore keys allow partial cache hits
  3. Manifest tracking in aggregation script compares commit SHAs

- name: Restore template cache
  if: ${{ github.event.inputs.force_refresh != 'true' }}
  uses: actions/cache@v5
  with:
    path: .cache/templates
    key: templates-${{ hashFiles('templates.yaml') }}-${{ github.run_id }}
    restore-keys: |
      templates-${{ hashFiles('templates.yaml') }}-
      templates-

The force refresh option skips the restore step above and deletes any local cache directory, so every template is fetched again:

- name: Clear cache (if force refresh)
  if: ${{ github.event.inputs.force_refresh == 'true' }}
  run: rm -rf .cache/templates

Security Scanning

Separate workflow, parallel execution:

# security.yml
jobs:
  gitleaks:
    # Secret scanning on every push

  dependency-review:
    # License and vulnerability check on PRs

  yaml-lint:
    # Configuration validation

This keeps security checks from blocking deploys while still catching issues.

The yamllint War Story

Our first security run failed spectacularly:

##[error]mkdocs.yml:88:5 [indentation] wrong indentation: expected 6 but found 4
##[error]templates.yaml:45:121 [line-length] line too long (156 > 120 characters)
##[warning].github/workflows/deploy.yml:3:1 [truthy] truthy value should be one of [false, true]

The investigation revealed three conflicts:

  1. on: is not a boolean - GitHub Actions uses on: as a keyword, but yamllint sees it as a truthy value
  2. MkDocs doesn't require --- - yamllint's document-start rule expects it
  3. Description fields are long - template descriptions exceed 120 characters

The fix: .yamllint.yml configuration that respects ecosystem conventions:

rules:
  # GitHub Actions uses `on:` as a keyword
  truthy:
    allowed-values: ['true', 'false', 'on']

  # MkDocs files don't need document start
  document-start: disable

  # Allow longer lines for descriptions
  line-length:
    max: 200

Lesson: Linting tools need per-ecosystem configuration. Default rules assume vanilla YAML.

Build Times

| Scenario | Time |
| --- | --- |
| Cold build (no cache) | ~45s |
| Warm build (cached) | ~20s |
| Force refresh | ~45s |

Most deploys hit the cache. Daily scheduled builds may be slower if upstream repos changed.

What We Learned

  1. Separate security from deploy - don't let linting failures block urgent content fixes
  2. Cache aggressively, invalidate precisely - manifest-based tracking beats time-based expiry
  3. Make force refresh easy - when caching goes wrong, you need an escape hatch

What's Next

  • ADR-005: DevSecOps implementation (the security.yml details)

DevSecOps for a Docs Site (ADR-005)

· 4 min read
Amiable Dev
Project Contributors

We added security scanning to a documentation site. Most DevSecOps guides assume you have application code. We don't.

The Problem

Documentation repositories have different security concerns than application code:

  • No server-side runtime - no SQL injection or RCE vectors (though DOM-based XSS remains possible)
  • No application secrets - but build-time secrets (GitHub tokens, API keys) can still leak
  • Community contributions - forks need to pass CI without repository secrets

Most DevSecOps tooling is overkill here. SAST (static code analysis) and DAST (runtime probing) assume you have application code. Container scanning assumes you have containers. We needed a minimal, fork-friendly approach.

The 3-Layer Pipeline

Layer 1 catches issues before they're committed. Layer 2 validates PRs from forks (no secrets required). Layer 3 runs post-merge for ongoing protection.

Fork-Friendly Design

This was the key constraint. GitHub intentionally isolates repository secrets from fork PRs to prevent malicious PRs from exfiltrating credentials.

The failure mode we avoided: If your security workflow requires SONAR_TOKEN or similar, every community contribution triggers a CI failure. Contributors wait for maintainers to manually approve, friction accumulates, contributions slow down.

Our security workflow uses only:

env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

GITHUB_TOKEN is automatically provided to every workflow run, including runs triggered by pull requests from forks (where it is limited to read-only permissions). No API keys, no OAuth tokens, no external services.

What this enables:

  • Contributors don't need to configure anything
  • All security checks pass on fork PRs
  • No "skip CI" friction for external contributions
  • Avoids the pull_request_target security footgun

The Gitleaks Gotcha

Our first implementation had a dangerous allowlist:

.gitleaks.toml (DANGEROUS)
# DON'T DO THIS - excludes all markdown from scanning
[allowlist]
paths = [
'''\.md$''',
]

This excludes all markdown files from secret scanning. For a documentation repository, that's most of the codebase.

Why this matters: Documentation often contains tutorial code blocks. Engineers copy-paste examples and accidentally include real API keys. Markdown files are where secrets leak in docs repos.

The fix: allowlist specific patterns, not entire file types:

.gitleaks.toml (SAFE)
# DO THIS - only ignore explicit example patterns
# Assumes gitleaks v8 config syntax: without [extend], a custom config
# replaces the default rules, so real secrets would go undetected.
[extend]
useDefault = true

[allowlist]
description = "Documented placeholder values"
regexes = [
'''sk-example-[a-zA-Z0-9]+''',
'''YOUR_API_KEY|your-api-key''',
]

Real secrets in markdown files will still be caught. Only explicit example patterns (sk-example-*, YOUR_API_KEY) are ignored.
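
To make the difference between a path allowlist and a pattern allowlist concrete, here is a minimal Python sketch of the idea. It is not how gitleaks works internally: `SECRET_PATTERN` is a hypothetical detector invented for illustration, and only the two allowlisted patterns come from the config above.

```python
import re

# Hypothetical detector for "sk-" style keys. Real gitleaks rules differ.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9-]{8,}")

# Explicit placeholder patterns that are safe to ignore (from the config above).
ALLOWLIST = [
    re.compile(r"sk-example-[a-zA-Z0-9]+"),
    re.compile(r"YOUR_API_KEY|your-api-key"),
]


def find_secrets(text: str) -> list[str]:
    """Return candidate secrets that are not covered by the allowlist."""
    findings = []
    for match in SECRET_PATTERN.finditer(text):
        candidate = match.group(0)
        if any(allowed.search(candidate) for allowed in ALLOWLIST):
            continue  # documented placeholder, not a leak
        findings.append(candidate)
    return findings


# A markdown snippet with one placeholder and one made-up "real" key.
doc = """
export API_KEY=sk-example-abc123
export API_KEY=sk-live-NOTREAL12345
"""
print(find_secrets(doc))
```

The placeholder is skipped while the second key is still reported, even though both live in the same markdown file. A path-based allowlist would have hidden both.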

Tools We Didn't Use

| Tool | Why Excluded |
| --- | --- |
| CodeQL | No codebase to analyze |
| Snyk | Dependabot sufficient at this scale |
| Trivy | No containers |
| SonarCloud | Overkill for docs |
| Semgrep | No application code |

The right amount of security tooling is the minimum that covers your actual risks.

War Story: The YAML 1.1 Truthy (aka "The Norway Problem")

Our security workflow failed immediately:

[truthy] truthy value should be one of [false, true]
3:1 error on:

GitHub Actions uses on: as a keyword. But YAML 1.1 treats on, off, yes, and no as booleans. This is sometimes called "The Norway Problem" because country code NO gets parsed as false.

Fix in .yamllint.yml:

.yamllint.yml
rules:
  truthy:
    allowed-values: ['true', 'false', 'on']
    check-keys: false
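
The boolean coercion behind this warning can be illustrated with a small Python sketch. It is a simplified model based on the boolean spellings in the YAML 1.1 type listing, not a real parser, and `resolve_plain_scalar` is a hypothetical helper. Some loaders implement only a subset of these spellings.

```python
import re

# Boolean spellings from the YAML 1.1 bool type definition.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUE_WORDS = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Resolve an unquoted scalar the way a strict YAML 1.1 loader treats booleans."""
    if YAML11_BOOL.match(text):
        return text.lower() in TRUE_WORDS
    return text  # anything else stays a string in this sketch


for scalar in ["on", "NO", "SE", "true"]:
    print(f"{scalar!r} -> {resolve_plain_scalar(scalar)!r}")
```

Under this model `on` becomes `True` and the country code `NO` becomes `False`, while `SE` stays a string. Quoting the value (`"NO"`) is the usual way to keep it a string.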

The Minimal Stack

Total configuration: 3 files, ~50 lines of YAML.

Full ADR

See ADR-005: DevSecOps Implementation for the complete Architecture Decision Record.

Build-Time Documentation Aggregation (ADR-006)

· 5 min read
Amiable Dev
Project Contributors

We built a system to fetch documentation from multiple GitHub repositories at build time. The trick: SHA-based caching that makes incremental builds near-instant.

The Problem

We have templates across multiple repositories:

  • litellm-langfuse-railway (starter + production configs)
  • llm-council (multi-model consensus system)

Each has its own documentation. Users shouldn't have to visit each repo separately to understand their options.

Goal: Unified documentation portal with content from all template repos.

Why Build-Time Aggregation?

We considered three approaches:

| Approach | Pros | Cons |
| --- | --- | --- |
| Manual copy | Simple | Stale immediately |
| Git submodules | Real-time | Complex, version conflicts |
| Build-time fetch | Fresh daily, cacheable | Requires API access |

Build-time aggregation wins: content is fresh (daily rebuilds), caching makes it fast, and errors don't break the site.

The Caching Strategy

The naive approach: fetch everything on every build. With 3 templates and multiple docs each, that's slow and hits API rate limits.

Our approach: SHA-based cache invalidation with a lightweight API check.

How It Works

The key insight: We don't download content to check if it changed. One lightweight API call (GET /repos/{owner}/{repo}/commits/HEAD) returns the current SHA. Compare against the manifest. Done.

async def get_commit_sha(self, owner: str, repo: str) -> str | None:
    """Get the SHA of the default branch HEAD (1 API call, no content)."""
    url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}/commits/HEAD"
    async with self._session.get(url) as resp:
        if resp.status == 200:
            data = await resp.json()
            return data["sha"]  # Just the SHA, not the content
    return None  # Any other status: no SHA available

Cache Granularity: Repo-Level

We cache at the repo level, not file level. One new commit invalidates all docs from that repo. This is simpler than tracking individual file changes, and repos don't change that often.

The manifest tracks:

{
  "litellm-langfuse-starter": {
    "commit_sha": "5a45454c15e0e5e17ff20a3f0d6df421c1f037db",
    "fetched_at": "2026-01-03T18:43:43Z",
    "files": ["overview.md", "setup.md"]
  }
}

Result: If the repo hasn't changed, skip the fetch entirely.

2026-01-03 18:44:00 [INFO]   Using cached content (SHA: 5a45454)
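
The manifest comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual aggregation script: `needs_fetch` and `record_fetch` are hypothetical names, and the manifest is assumed to be a plain dict shaped like the JSON above.

```python
def needs_fetch(manifest: dict, template_id: str, current_sha: str) -> bool:
    """True when the template is unknown or its repo has a new HEAD commit."""
    entry = manifest.get(template_id)
    return entry is None or entry.get("commit_sha") != current_sha


def record_fetch(manifest: dict, template_id: str, sha: str,
                 fetched_at: str, files: list[str]) -> None:
    """Store what was fetched so the next build can skip unchanged repos."""
    manifest[template_id] = {
        "commit_sha": sha,
        "fetched_at": fetched_at,
        "files": files,
    }


manifest = {}
sha = "5a45454c15e0e5e17ff20a3f0d6df421c1f037db"

print(needs_fetch(manifest, "litellm-langfuse-starter", sha))  # first build
record_fetch(manifest, "litellm-langfuse-starter", sha,
             "2026-01-03T18:43:43Z", ["overview.md", "setup.md"])
print(needs_fetch(manifest, "litellm-langfuse-starter", sha))  # unchanged repo
```

Because the check is a single string comparison per repo, a new commit invalidates every file from that repo, which matches the repo-level granularity described above.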

Content Transformation

Raw content from upstream repos has relative links that break when moved. The ContentTransformer class handles this:

def _rewrite_links(self, content: str) -> str:
    """Rewrite relative markdown links to GitHub blob URLs."""
    # [Setup Guide](setup.md)
    # → [Setup Guide](https://github.com/owner/repo/blob/sha/path/setup.md)

Image Rewriting

def _rewrite_images(self, content: str) -> str:
    """Rewrite relative image paths to raw.githubusercontent.com URLs."""
    # ![diagram](../assets/arch.png)
    # → ![diagram](https://raw.githubusercontent.com/owner/repo/sha/assets/arch.png)
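
The two stubs above only show the before and after. One way the rewriting could work is sketched below. This is an assumption-laden illustration rather than the real ContentTransformer: `rewrite_relative_targets` and its parameters are hypothetical, it handles only simple inline `[text](target)` syntax, and it leaves absolute URLs, anchors, and root-relative paths untouched.

```python
import posixpath
import re

# Matches [text](target) and ![alt](target); group 1 is "!" for images.
LINK_RE = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")
# Targets that are already absolute, in-page anchors, or root-relative.
SKIP_RE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|#|/)", re.IGNORECASE)


def rewrite_relative_targets(content: str, owner: str, repo: str,
                             sha: str, source_path: str) -> str:
    """Point relative links at blob URLs and relative images at raw URLs."""
    base_dir = posixpath.dirname(source_path)

    def replace(match: re.Match) -> str:
        bang, text, target = match.groups()
        if SKIP_RE.match(target):
            return match.group(0)
        path, sep, fragment = target.partition("#")
        # Resolve "../" segments relative to the file the content came from.
        resolved = posixpath.normpath(posixpath.join(base_dir, path))
        if bang:
            url = f"https://raw.githubusercontent.com/{owner}/{repo}/{sha}/{resolved}"
        else:
            url = f"https://github.com/{owner}/{repo}/blob/{sha}/{resolved}"
        return f"{bang}[{text}]({url}{sep}{fragment})"

    return LINK_RE.sub(replace, content)


sample = "See [Setup Guide](setup.md) and ![diagram](../assets/arch.png)."
print(rewrite_relative_targets(sample, "owner", "repo", "abc123", "docs/README.md"))
```

Pinning the URLs to the commit SHA rather than a branch name keeps links valid even if the upstream file later moves.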

Source Attribution

Every aggregated doc gets an info box:

!!! info "Source Repository"
    This documentation is from [amiable-dev/litellm-langfuse-railway](...).
    Last synced: 2026-01-03 | Commit: `5a45454`

Users always know where the content came from.
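
Generating that box is mostly string formatting. A minimal sketch, assuming a hypothetical `source_attribution` helper and a caller that supplies the repository URL and sync date:

```python
def source_attribution(repo_full_name: str, repo_url: str,
                       synced_on: str, commit_sha: str) -> str:
    """Build the MkDocs admonition placed on every aggregated page."""
    short_sha = commit_sha[:7]  # abbreviated SHA, as in the example above
    return (
        '!!! info "Source Repository"\n'
        f"    This documentation is from [{repo_full_name}]({repo_url}).\n"
        f"    Last synced: {synced_on} | Commit: `{short_sha}`\n"
    )


print(source_attribution(
    "amiable-dev/litellm-langfuse-railway",
    "https://github.com/amiable-dev/litellm-langfuse-railway",
    "2026-01-03",
    "5a45454c15e0e5e17ff20a3f0d6df421c1f037db",
))
```

The four-space indent on the body lines matters: MkDocs admonitions treat only indented lines as part of the box.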

Error Handling Philosophy

Never fail the build due to upstream issues. But be loud about failures.

Hard vs. Soft Errors

| Error Type | Behavior |
| --- | --- |
| Config errors (invalid YAML) | Fail fast |
| Network errors | Use cached content, log warning |
| Repo not found | Skip, log warning |
| File not found | Skip file, continue |
| Rate limit | Use cached content |

results = await asyncio.gather(
    *[aggregate_template(t, fetcher, cache, output_dir) for t in templates],
    return_exceptions=True,  # Collect errors, don't fail
)

for result in results:
    if isinstance(result, Exception):
        logger.error(f"Aggregation error: {result}")  # Be loud
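
For readers who want to run the pattern in isolation, here is a self-contained sketch. The `aggregate_template` coroutine below is a stand-in that fails for one hard-coded name, and `aggregate_all` is a hypothetical wrapper. Neither is the real implementation.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("aggregate")


async def aggregate_template(name: str) -> str:
    """Stand-in for the real aggregation coroutine (hypothetical)."""
    await asyncio.sleep(0)  # pretend to do network I/O
    if name == "broken-repo":
        raise ConnectionError(f"could not reach {name}")
    return name


async def aggregate_all(names: list[str]) -> tuple[list[str], list[Exception]]:
    """Run every template concurrently; one failure never cancels the rest."""
    results = await asyncio.gather(
        *(aggregate_template(n) for n in names),
        return_exceptions=True,  # exceptions come back as values
    )
    succeeded = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    for error in failed:
        logger.error("Aggregation error: %s", error)  # be loud
    return succeeded, failed


succeeded, failed = asyncio.run(aggregate_all(["llm-council", "broken-repo"]))
print(succeeded, len(failed))
```

With `return_exceptions=True`, the healthy template still completes and the failure is logged instead of raised, so the build can continue with cached content for the failing repo.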

Stale Content Risk

The danger: a repo fails to update for weeks, and users see stale docs thinking they're current.

Mitigation: The source attribution box includes sync date and commit SHA. Users can verify freshness:

!!! info "Source Repository"
    Last synced: 2026-01-03 | Commit: `5a45454`

If the sync date is old, something's wrong. CI logs show fetch failures for investigation.
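
The same freshness check a reader does by eye could also be automated against the manifest. The sketch below is purely illustrative: `is_stale` is a hypothetical helper, and the seven-day threshold is an assumption, not a value from the project.

```python
from datetime import datetime, timedelta, timezone


def is_stale(fetched_at: str, now: datetime,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """True when the manifest's fetched_at timestamp is older than max_age."""
    synced = datetime.strptime(fetched_at, "%Y-%m-%dT%H:%M:%SZ")
    synced = synced.replace(tzinfo=timezone.utc)  # manifest times are UTC ("Z")
    return now - synced > max_age


fetched_at = "2026-01-03T18:43:43Z"
print(is_stale(fetched_at, datetime(2026, 1, 4, tzinfo=timezone.utc)))
print(is_stale(fetched_at, datetime(2026, 1, 20, tzinfo=timezone.utc)))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check easy to test.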

GitHub API Considerations

Rate Limit Math

| Auth Method | Limit | Our Usage |
| --- | --- | --- |
| Unauthenticated | 60/hour | Not viable |
| GITHUB_TOKEN (in Actions) | 1,000/hour per repository | What we use |
| GitHub App | 5,000+/hour | Overkill for docs |

Our request pattern per build:

  • 3 repos × 1 SHA check = 3 API requests
  • Content fetched via raw.githubusercontent.com (does not count against the REST API rate limit)
  • Cached builds: 0 content fetches

Even with 50 repos, we'd use 50 requests per build. The 1,000/hour limit for GITHUB_TOKEN is plenty.
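
The arithmetic is simple enough to check in code. In this sketch the function names are hypothetical and `hourly_limit` is a parameter: substitute whichever quota applies to your auth method. The 1,000 used below is an assumed value for illustration.

```python
def api_requests_per_build(repo_count: int, sha_checks_per_repo: int = 1) -> int:
    """REST API calls per build: one SHA check per repo, content goes via raw URLs."""
    return repo_count * sha_checks_per_repo


def builds_per_hour_within(hourly_limit: int, repo_count: int) -> int:
    """How many full builds fit into one hour of API quota."""
    return hourly_limit // api_requests_per_build(repo_count)


print(api_requests_per_build(3))         # today's request count per build
print(api_requests_per_build(50))        # hypothetical growth to 50 repos
print(builds_per_hour_within(1000, 50))  # assumed quota of 1,000 requests/hour
```

Even at 50 repos the pipeline would need to rebuild many times an hour before the quota became a concern, and the schedule above builds once a day.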

Fetch Optimization

# SHA check: Uses API (rate limited, but just 1 request per repo)
url = f"{GITHUB_API_BASE}/repos/{owner}/{repo}/commits/HEAD"

# Content fetch: Uses raw.githubusercontent.com (not counted against the API rate limit)
url = f"{GITHUB_RAW_BASE}/{owner}/{repo}/{sha}/{path}"

This split is intentional: the API for metadata, raw URLs for content.

CI Integration

.github/workflows/deploy.yml
- name: Restore template cache
  uses: actions/cache@v5
  with:
    path: .cache/templates
    key: templates-${{ hashFiles('templates.yaml') }}-${{ github.run_id }}
    restore-keys: |
      templates-${{ hashFiles('templates.yaml') }}-
      templates-

- name: Aggregate template documentation
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: python scripts/aggregate_templates.py

The cache key strategy:

  1. Exact match: same config, same run (for example, a re-run of the same workflow run) → use cache
  2. Partial match: same config, different run → restore, then update
  3. No match: fresh fetch
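
The three outcomes above can be modelled with a short Python sketch. This is a simplified model of how key and restore-keys lookups behave, not the cache action's actual code: `resolve_cache` is a hypothetical function, branch scoping is ignored, and `saved` is assumed to be ordered oldest to newest.

```python
from typing import Optional


def resolve_cache(key: str, restore_keys: list[str],
                  saved: list[str]) -> tuple[str, Optional[str]]:
    """Return (outcome, matched_key) for a lookup against saved cache keys."""
    if key in saved:
        return "exact", key
    for prefix in restore_keys:  # restore keys are tried in order
        matches = [k for k in saved if k.startswith(prefix)]
        if matches:
            return "partial", matches[-1]  # newest entry with this prefix
    return "miss", None


saved = ["templates-old-90", "templates-abc-100", "templates-abc-101"]

# Same config hash ("abc"), new run id: falls back to the newest "abc" cache.
print(resolve_cache("templates-abc-102", ["templates-abc-", "templates-"], saved))

# Config changed ("xyz"): only the broad "templates-" prefix matches.
print(resolve_cache("templates-xyz-103", ["templates-xyz-", "templates-"], saved))

# Nothing saved yet: fresh fetch.
print(resolve_cache("templates-abc-1", ["templates-abc-", "templates-"], []))
```

Because the primary key ends in the run ID, a fresh run always takes the partial path, restores the latest cache, and then saves an updated one under its own key.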

The Tradeoff We Accepted

Delayed updates: Changes to upstream repos aren't instant. They appear on the next daily build (or manual dispatch).

For documentation, this is acceptable. If you need real-time sync, consider webhooks or git submodules—but accept the complexity.

Full ADR

See ADR-006: Cross-Project Documentation Aggregation for the complete decision record.

Building a GitHub Projects Showcase with TDD

· 5 min read
Chris
Amiable Dev

Building a portfolio page that showcases GitHub projects sounds straightforward until you consider the edge cases: What happens when the GitHub API is down? What if rate limits are exceeded? How do you test React components that depend on external data?

This post walks through how test-driven development (TDD) helped us build a robust /projects page with 64 automated tests, ensuring 100% deployment reliability and preventing CI failures when upstream APIs go down.