ADR-005: Algolia Search and Ask AI Optimization

Status

Accepted (Council Reviewed 2026-01-25)

Context

The amiable.dev site aggregates content from 14 projects across three content types (docs, blog, projects), served under five URL patterns:

| Content Type | URL Pattern | Source |
|---|---|---|
| Projects Page | /projects | projects.json + GitHub API |
| Root ADRs | /docs/adrs/ADR-NNN-* | docs/adrs/ |
| Project ADRs | /docs/adrs/projects/{repo-name}/ADR-NNN-* | Aggregated from project repos |
| Root Blog | /blog/{slug} | blog/ |
| Project Blog | /blog/projects/{repo-name}/{slug} | Aggregated from project repos |

With Algolia DocSearch and Ask AI enabled (v1.9.3), search results and AI answers currently mix content from all projects without distinction. Users searching for "authentication" might get results from LLM Council, Arbiter Bot, and SmartBadge ADRs without clear project context.

Problems

  1. Search result ambiguity: Results don't clearly indicate which project they belong to
  2. Ask AI context confusion: AI answers may synthesize information across unrelated projects
  3. No project-scoped search: Users can't limit search to a specific project
  4. Content type mixing: ADRs, blog posts, and project pages are treated equally

Current Project Structure

14 projects in projects.json:

  • arbiter-bot, llm-council, luminescent-cluster, habit-hub, stentorosaur
  • smart-badge, smart-badge-docs, ops, numerai-bot, amiable-templates
  • conductor, midimon, midimon-plugin-registry, amiable-docusaurus

Content counts: 111 ADRs + 42 blog posts aggregated from projects.

Decision

Implement a single-index-with-facets architecture, using DocSearch meta tags for project and content type attribution.

Architecture: Single Index + Facets

Use one Algolia index with faceted attributes:

  • project: Project slug (e.g., arbiter-bot, llm-council, main)
  • content_type: Content type (docs, blog, projects)

This approach is recommended by Algolia for multi-faceted sites and enables:

  • Cross-project search (default)
  • Project-scoped search via facetFilters
  • Content type filtering
  • Ask AI project boundaries
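As a sketch of how these facets are consumed at query time (the helper name buildFacetFilters is illustrative; only the project and content_type attribute names come from this ADR):

```typescript
// Illustrative helper: build Algolia facetFilters for an optional scope.
// Attribute names (project, content_type) follow this ADR's facet design.
type SearchScope = { project?: string; contentType?: 'docs' | 'blog' | 'projects' };

function buildFacetFilters(scope: SearchScope): string[][] {
  const filters: string[][] = [];
  if (scope.project) filters.push([`project:${scope.project}`]);
  if (scope.contentType) filters.push([`content_type:${scope.contentType}`]);
  return filters; // [] → cross-project search (the default)
}

// Sketch of use with the algoliasearch client:
//   index.search('authentication', {
//     facetFilters: buildFacetFilters({ project: 'llm-council' }),
//   });
```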

Implementation

Phase 1: Shared Configuration

Create a centralized project configuration to avoid drift:

src/lib/projects.ts:

// Derive project slugs from projects.json to prevent drift
import projects from '@site/projects.json';

// Extract repo name (e.g., 'amiable-dev/llm-council' → 'llm-council')
export const KNOWN_PROJECTS = new Set(
  projects.map((p: { repo: string }) => p.repo.split('/')[1])
);

export function isKnownProject(slug: string): boolean {
  return KNOWN_PROJECTS.has(slug);
}

export function inferProjectFromPermalink(permalink: string): string {
  // Match /docs/adrs/projects/{project-name}/... or /blog/projects/{project-name}/...
  const patterns = [
    /\/docs\/adrs\/projects\/([^/]+)/,
    /\/blog\/projects\/([^/]+)/,
  ];

  for (const pattern of patterns) {
    const match = permalink.match(pattern);
    if (match && KNOWN_PROJECTS.has(match[1])) {
      return match[1];
    }
  }
  return 'main'; // Site's own content
}

Phase 2: Inject DocSearch Meta Tags via Theme Wrappers

The DocSearch crawler automatically picks up docsearch:* meta tags and attaches their values as record attributes.

Swizzle DocItem for docs (use --wrap for upgrade safety):

npx docusaurus swizzle @docusaurus/theme-classic DocItem/Layout -- --wrap

src/theme/DocItem/Layout/index.tsx:

import React from 'react';
import Head from '@docusaurus/Head';
import Layout from '@theme-original/DocItem/Layout';
import type LayoutType from '@theme/DocItem/Layout';
import type { WrapperProps } from '@docusaurus/types';
import { useDoc } from '@docusaurus/plugin-content-docs/client';
import { inferProjectFromPermalink } from '@site/src/lib/projects';

type Props = WrapperProps<typeof LayoutType>;

export default function DocItemLayoutWrapper(props: Props): JSX.Element {
  const { frontMatter, metadata } = useDoc();

  // Priority: front matter > permalink inference
  // Use metadata.permalink (stable during SSG) instead of useLocation
  const project = (frontMatter as { project?: string }).project
    ?? inferProjectFromPermalink(metadata.permalink);

  return (
    <>
      <Head>
        <meta name="docsearch:project" content={project} />
        <meta name="docsearch:content_type" content="docs" />
      </Head>
      <Layout {...props} />
    </>
  );
}

Swizzle BlogPostItem for blog:

npx docusaurus swizzle @docusaurus/theme-classic BlogPostItem -- --wrap

src/theme/BlogPostItem/index.tsx:

import React from 'react';
import Head from '@docusaurus/Head';
import BlogPostItem from '@theme-original/BlogPostItem';
import type BlogPostItemType from '@theme/BlogPostItem';
import type { WrapperProps } from '@docusaurus/types';
import { useBlogPost } from '@docusaurus/plugin-content-blog/client';
import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';

type Props = WrapperProps<typeof BlogPostItemType>;

function inferProjectFromTags(tags: Array<{ label: string }>): string | undefined {
  for (const tag of tags) {
    if (isKnownProject(tag.label)) {
      return tag.label;
    }
  }
  return undefined;
}

export default function BlogPostItemWrapper(props: Props): JSX.Element {
  const { frontMatter, metadata } = useBlogPost();

  // Priority: front matter > tags > permalink inference
  // Use metadata.permalink (stable during SSG) instead of useLocation
  const project = (frontMatter as { project?: string }).project
    ?? inferProjectFromTags(metadata.tags || [])
    ?? inferProjectFromPermalink(metadata.permalink);

  return (
    <>
      <Head>
        <meta name="docsearch:project" content={project} />
        <meta name="docsearch:content_type" content="blog" />
      </Head>
      <BlogPostItem {...props} />
    </>
  );
}

Projects page (src/pages/projects.tsx):

Add meta tags directly (standalone pages are not covered by swizzled components):

import React from 'react';
import Layout from '@theme/Layout';
import Head from '@docusaurus/Head';

export default function ProjectsPage(): JSX.Element {
  return (
    <Layout title="Projects">
      <Head>
        <meta name="docsearch:project" content="main" />
        <meta name="docsearch:content_type" content="projects" />
      </Head>
      {/* ... rest of component */}
    </Layout>
  );
}

Phase 3: Algolia Crawler Configuration

Key insight: DocSearch automatically extracts meta tags following the docsearch:$NAME pattern. No explicit recordProps configuration is needed for project and content_type attributes.

From the DocSearch Required Configuration documentation:

"Our crawler automatically extracts information from our DocSearch specific meta tags"

Since we inject <meta name="docsearch:project" content="..."> and <meta name="docsearch:content_type" content="..."> in Phase 2, the crawler will automatically:

  1. Extract the content attribute value
  2. Add project and content_type attributes to all records from that page

Crawler configuration: No changes needed to recordProps. Keep your existing configuration as-is.

Exclusion patterns (add at root level of crawler config, not inside actions):

new Crawler({
  appId: "08U6IN17TV",
  // ... other root config

  // Exclude non-content pages from indexing
  exclusionPatterns: [
    "https://amiable.dev/status/**", // Dynamic system health data
    "https://amiable.dev/blog/tags/**", // Auto-generated tag index pages
    "https://amiable.dev/blog/archive/**", // Archive listing page
    "https://amiable.dev/docs/tags/**", // Docs tag pages
    "https://amiable.dev/search", // Search page itself
  ],

  actions: [
    // ... existing actions (no changes needed)
  ],
});

Note: exclusionPatterns goes at the root level of the crawler config, alongside startUrls, sitemaps, and the other top-level options, not inside the actions array. See the exclusionPatterns documentation.

Index settings (Algolia Dashboard → Index → Configuration):

{
  "attributesForFaceting": [
    "filterOnly(project)",
    "filterOnly(content_type)"
  ],
  "distinct": true,
  "attributeForDistinct": "url"
}

Note: Use filterOnly() to prevent facet values from affecting relevance ranking.

Phase 4: Markdown Index for Ask AI

Ask AI works best with a separate markdown-optimized index containing clean, structured text without HTML navigation clutter. See DocSearch Markdown Indexing Guide.

Add a second action to your crawler for the markdown index:

// NEW: Markdown index for Ask AI (add alongside existing DocSearch action)
{
indexName: "amiable-dev-index-markdown",
pathsToMatch: ["https://amiable.dev/**"],
recordExtractor: ({ $, url, helpers }) => {
// Extract clean text, excluding navigation
const text = helpers.markdown(
"main > *:not(nav):not(header):not(.breadcrumb)"
);
if (text === "") return [];

// Extract Docusaurus meta tags (including our custom project tag)
const language = $('meta[name="docsearch:language"]').attr("content") || "en";
const version = $('meta[name="docsearch:version"]').attr("content") || "current";
const docusaurus_tag = $('meta[name="docsearch:docusaurus_tag"]').attr("content") || "";
const project = $('meta[name="docsearch:project"]').attr("content") || "main";
const title = $("head > title").text();
const h1 = $("main h1").first().text();

return helpers.splitTextIntoRecords({
text,
baseRecord: {
url,
objectID: url,
title: title || h1,
heading: h1,
language,
version,
docusaurus_tag,
project,
},
maxRecordBytes: 100000, // Larger = fewer chunks, more context
orderingAttributeName: "part",
});
},
},

Markdown index settings (Algolia Dashboard → Index → Configuration):

{
  "attributesForFaceting": ["language", "version", "docusaurus_tag", "project"],
  "searchableAttributes": ["title", "heading", "unordered(text)"],
  "ignorePlurals": true,
  "typoTolerance": false,
  "advancedSyntax": false
}

Phase 5: Docusaurus Algolia Configuration

docusaurus.config.js:

themeConfig: {
  algolia: {
    appId: '08U6IN17TV',
    apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
    indexName: 'amiable-dev-index', // Main keyword search
    contextualSearch: true,
    searchPagePath: 'search',
    askAi: {
      indexName: 'amiable-dev-index-markdown', // Markdown index for AI
      apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
      appId: '08U6IN17TV',
      assistantId: 'aVJcPQwofdBY',
    },
  },
}

Phase 6: Front Matter Extensions

Update aggregation scripts to inject explicit project field:

scripts/fetch-adrs.js and scripts/fetch-blog-posts.js:

// Add during front matter transformation
frontMatter.project = repoName; // e.g., 'llm-council'

This provides:

  • Explicit project attribution in front matter
  • Override capability for cross-project content
  • Consistency with meta tag injection
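A minimal sketch of that transformation (the function name and regex handling are illustrative, not the actual fetch scripts; an existing project field is left untouched to preserve the override capability):

```typescript
// Illustrative sketch of the front matter transformation in the aggregation
// scripts; not the actual fetch-adrs.js / fetch-blog-posts.js code.
function injectProjectField(markdown: string, repoName: string): string {
  const fm = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) {
    // No front matter block: create one with just the project field
    return `---\nproject: ${repoName}\n---\n\n${markdown}`;
  }
  if (/^project:/m.test(fm[1])) return markdown; // explicit value wins
  return markdown.replace(/^---\n/, `---\nproject: ${repoName}\n`);
}
```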

Phase 7 (Future): Route-Aware Search Scoping

For enhanced UX, customize the DocSearch modal to add route-aware filtering:

// Pseudocode for dynamic scoping
const currentProject = inferProjectFromPermalink(location.pathname);

algoliaSearchOptions: {
  searchParameters: {
    facetFilters: currentProject !== 'main'
      ? [`project:${currentProject}`]
      : [],
  },
}

Default behavior by route:

  • On /docs/adrs/projects/llm-council/... → filter to project:llm-council
  • On /blog/projects/arbiter-bot/... → filter to project:arbiter-bot
  • On root pages (/, /projects, /blog) → no filter (global search)
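The route defaults above can be sketched as a pure helper (the name defaultFacetFilters is hypothetical; the real implementation would reuse inferProjectFromPermalink and KNOWN_PROJECTS from src/lib/projects.ts rather than the inline set used here for self-containment):

```typescript
// Hypothetical sketch of the route → default-filter mapping.
// A small inline project set keeps the example self-contained; the site
// would derive KNOWN_PROJECTS from projects.json instead.
const KNOWN = new Set(['llm-council', 'arbiter-bot']);

function defaultFacetFilters(pathname: string): string[] {
  const m = pathname.match(/^\/(?:docs\/adrs|blog)\/projects\/([^/]+)/);
  return m && KNOWN.has(m[1]) ? [`project:${m[1]}`] : []; // [] → global search
}
```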

Phase 8 (Future): Display Project Badge in Search Results

To show which project a search result belongs to, customize the DocSearch hit component using hitComponent:

Swizzle the SearchBar to supply a custom hitComponent (this is not exposed through themeConfig.algolia):

  npx docusaurus swizzle @docusaurus/theme-search-algolia SearchBar

Custom hit component example:

import React from 'react';

function CustomHit({ hit, children }) {
const project = hit.project || 'main';
const showBadge = project !== 'main';

return (
<a href={hit.url} className="DocSearch-Hit-Container">
{showBadge && (
<span className="project-badge" style={{
fontSize: '0.7rem',
padding: '2px 6px',
borderRadius: '4px',
backgroundColor: 'var(--ifm-color-primary-light)',
color: 'white',
marginRight: '8px',
}}>
{project}
</span>
)}
{children}
</a>
);
}

When to implement: consider this phase when:

  • Users report confusion about which project results come from
  • Search result disambiguation becomes a priority
  • The number of aggregated projects grows significantly

Recommendation: Start without custom hit rendering. The default DocSearch UI shows breadcrumbs (lvl0/lvl1) which already provide context. Monitor user feedback before adding complexity.

Validation Plan

Build-Time Verification

# 1. Build the site
npm run build

# 2. Verify meta tags in generated HTML
grep -r "docsearch:project" build/ | head -20

# 3. Check specific project ADR
cat build/docs/adrs/projects/llm-council/adr-001-council-summary/index.html | grep "docsearch:"

# 4. Check projects page
cat build/projects/index.html | grep "docsearch:"
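For CI, the same checks can be expressed as a small assertion helper (hypothetical; the regex check is a sketch, not a full HTML parser):

```typescript
// Hypothetical CI helper: check that a built HTML page carries a given
// docsearch:* meta tag. Regex-based sketch, not a full HTML parser.
function hasDocsearchMeta(html: string, name: string): boolean {
  return new RegExp(`<meta[^>]*name="docsearch:${name}"`).test(html);
}

// Sketch of use against the build output:
//   const html = fs.readFileSync('build/projects/index.html', 'utf8');
//   if (!hasDocsearchMeta(html, 'project')) throw new Error('missing meta tag');
```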

Post-Crawl Verification

  1. Algolia Dashboard → Index → Browse:

    • Verify records have project and content_type attributes
    • Check that child records (paragraphs/headings) inherit facets
  2. Test faceted queries:

    Query: "architecture"
    Filter: project:llm-council
    Expected: Only LLM Council results
  3. Test Ask AI boundaries:

    • Ask: "How does authentication work in LLM Council?"
    • Verify answer doesn't reference Arbiter Bot or other projects

Unit Tests

__tests__/lib/projects.test.ts:

import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';

describe('inferProjectFromPermalink', () => {
  it('extracts project from docs path', () => {
    expect(inferProjectFromPermalink('/docs/adrs/projects/llm-council/adr-001'))
      .toBe('llm-council');
  });

  it('extracts project from blog path', () => {
    expect(inferProjectFromPermalink('/blog/projects/arbiter-bot/market-discovery'))
      .toBe('arbiter-bot');
  });

  it('returns main for root docs', () => {
    expect(inferProjectFromPermalink('/docs/adrs/ADR-001-blog-homepage'))
      .toBe('main');
  });

  it('returns main for unknown project', () => {
    expect(inferProjectFromPermalink('/docs/adrs/projects/unknown-project/adr-001'))
      .toBe('main');
  });
});

Rollback Plan

If faceted search causes issues:

  1. Revert swizzled components:

    git revert <commit-hash>
    # Or simply delete src/theme/DocItem and src/theme/BlogPostItem
  2. Remove facetFilters from search config (if added)

  3. No crawler changes needed: Facets become unused attributes; existing search continues working

  4. Re-crawl not required: Index continues to function without facet filtering

Consequences

Positive

  • Clear project attribution: Every record tagged with source project
  • Scoped search: Users can filter to specific projects
  • AI context boundaries: Ask AI answers scoped to relevant project
  • Content type filtering: Separate docs, blog, and project page results
  • No index restructuring: Single index, facet-based filtering
  • Incremental adoption: Works with existing content
  • Upgrade safe: Using --wrap swizzle mode

Negative

  • Theme customization: Requires swizzling DocItem and BlogPostItem
  • Crawler dependency: Requires Algolia crawler to extract and propagate meta tags
  • Dynamic scoping: Full project-scoped search UX requires a custom DocSearch modal with route-aware filtering

Risks

| Risk | Severity | Mitigation |
|---|---|---|
| Crawler doesn't propagate meta tags to child records | High | Test with single page first; contact Algolia support |
| Standalone pages missing meta tags | High | Explicitly add <Head> to /projects and any other standalone pages |
| Project list drift | Medium | Derive from projects.json instead of hardcoding |
| SSR/hydration mismatch | Low | Use metadata.permalink instead of useLocation() |

Alternatives Considered

Multiple Indices (Option B)

Separate index per project:

  • amiable-dev-main
  • amiable-dev-llm-council
  • amiable-dev-arbiter-bot
  • etc.

Rejected because:

  • Complex crawler configuration (14+ separate crawls)
  • Harder to implement cross-project search (federated search)
  • Quota management across indices
  • Overkill for current content volume (153 aggregated pieces)

URL-Based Faceting Only

Infer project from URL patterns in crawler rules without meta tags.

Rejected because:

  • Fragile if URL structure changes
  • Meta tags are more explicit and self-documenting
  • No front matter override capability
  • Harder to debug (logic in crawler vs visible in HTML)

Tag-Only Approach

Rely entirely on existing tags field in front matter.

Rejected because:

  • Tags are mixed (project + topic)
  • No content type distinction
  • Less semantic than dedicated project attribute

Council Review

Reviewed: 2026-01-25 by LLM Council (high confidence tier)

Models: claude-opus-4.5, gpt-5.2, gemini-3-pro-preview, grok-4.1-fast

Verdict: Approved with modifications

Key recommendations incorporated:

  1. Centralize KNOWN_PROJECTS derived from projects.json
  2. Use metadata.permalink instead of useLocation() for SSR stability
  3. Add explicit crawler configuration section
  4. Add phased implementation plan
  5. Add validation and rollback procedures
  6. Handle standalone /projects page explicitly

References