ADR-005: Algolia Search and Ask AI Optimization

Status

Accepted (Council Reviewed 2026-01-25)

Context

The amiable.dev site aggregates content from 14 projects across three content types (docs, blog, projects), served under five URL patterns:

| Content Type | URL Pattern | Source |
|---|---|---|
| Projects Page | /projects | projects.json + GitHub API |
| Root ADRs | /docs/adrs/ADR-NNN-* | docs/adrs/ |
| Project ADRs | /docs/adrs/projects/{repo-name}/ADR-NNN-* | Aggregated from project repos |
| Root Blog | /blog/{slug} | blog/ |
| Project Blog | /blog/projects/{repo-name}/{slug} | Aggregated from project repos |

With Algolia DocSearch and Ask AI enabled (v1.9.3), search results and AI answers currently mix content from all projects without distinction. Users searching for "authentication" might get results from LLM Council, Arbiter Bot, and SmartBadge ADRs without clear project context.

Problems

  1. Search result ambiguity: Results don't clearly indicate which project they belong to
  2. Ask AI context confusion: AI answers may synthesize information across unrelated projects
  3. No project-scoped search: Users can't limit search to a specific project
  4. Content type mixing: ADRs, blog posts, and project pages are treated equally

Current Project Structure

14 projects in projects.json:

  • arbiter-bot, llm-council, luminescent-cluster, habit-hub, stentorosaur
  • smart-badge, smart-badge-docs, ops, numerai-bot, amiable-templates
  • conductor, midimon, midimon-plugin-registry, amiable-docusaurus

Content counts: 111 ADRs + 42 blog posts aggregated from projects.

Decision

Implement a single-index-with-facets architecture, using DocSearch meta tags for project and content type attribution.

Architecture: Single Index + Facets

Use one Algolia index with faceted attributes:

  • project: Project slug (e.g., arbiter-bot, llm-council, main)
  • content_type: Content type (docs, blog, projects)

This approach is recommended by Algolia for multi-faceted sites and enables:

  • Cross-project search (default)
  • Project-scoped search via facetFilters
  • Content type filtering
  • Ask AI project boundaries
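As a sketch of how these facets are consumed at query time (the helper name buildFacetFilters is illustrative; only the project and content_type attribute names come from this ADR):

```typescript
// Illustrative helper: build Algolia facetFilters for an optional scope.
// Attribute names (project, content_type) follow this ADR's facet design.
type SearchScope = { project?: string; contentType?: 'docs' | 'blog' | 'projects' };

function buildFacetFilters(scope: SearchScope): string[][] {
  const filters: string[][] = [];
  if (scope.project) filters.push([`project:${scope.project}`]);
  if (scope.contentType) filters.push([`content_type:${scope.contentType}`]);
  return filters; // [] → cross-project search (the default)
}

// Sketch of use with the algoliasearch client:
//   index.search('authentication', {
//     facetFilters: buildFacetFilters({ project: 'llm-council' }),
//   });
```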

Implementation

Phase 1: Shared Configuration

Create a centralized project configuration to avoid drift:

src/lib/projects.ts:

// Derive project slugs from projects.json to prevent drift
import projects from '@site/projects.json';

// Extract repo name (e.g., 'amiable-dev/llm-council' → 'llm-council')
export const KNOWN_PROJECTS = new Set(
  projects.map((p: { repo: string }) => p.repo.split('/')[1])
);

export function isKnownProject(slug: string): boolean {
  return KNOWN_PROJECTS.has(slug);
}

export function inferProjectFromPermalink(permalink: string): string {
  // Match /docs/adrs/projects/{project-name}/... or /blog/projects/{project-name}/...
  const patterns = [
    /\/docs\/adrs\/projects\/([^/]+)/,
    /\/blog\/projects\/([^/]+)/,
  ];

  for (const pattern of patterns) {
    const match = permalink.match(pattern);
    if (match && KNOWN_PROJECTS.has(match[1])) {
      return match[1];
    }
  }
  return 'main'; // Site's own content
}

Phase 2: Inject DocSearch Meta Tags via Theme Wrappers

The DocSearch crawler automatically picks up docsearch:* meta tags and attaches their values as record attributes.

Swizzle DocItem for docs (use --wrap for upgrade safety):

npx docusaurus swizzle @docusaurus/theme-classic DocItem/Layout -- --wrap

src/theme/DocItem/Layout/index.tsx:

import React from 'react';
import Head from '@docusaurus/Head';
import Layout from '@theme-original/DocItem/Layout';
import type LayoutType from '@theme/DocItem/Layout';
import type { WrapperProps } from '@docusaurus/types';
import { useDoc } from '@docusaurus/plugin-content-docs/client';
import { inferProjectFromPermalink } from '@site/src/lib/projects';

type Props = WrapperProps<typeof LayoutType>;

export default function DocItemLayoutWrapper(props: Props): JSX.Element {
  const { frontMatter, metadata } = useDoc();

  // Priority: front matter > permalink inference
  // Use metadata.permalink (stable during SSG) instead of useLocation
  const project = (frontMatter as { project?: string }).project
    ?? inferProjectFromPermalink(metadata.permalink);

  return (
    <>
      <Head>
        <meta name="docsearch:project" content={project} />
        <meta name="docsearch:content_type" content="docs" />
      </Head>
      <Layout {...props} />
    </>
  );
}

Swizzle BlogPostItem for blog:

npx docusaurus swizzle @docusaurus/theme-classic BlogPostItem -- --wrap

src/theme/BlogPostItem/index.tsx:

import React from 'react';
import Head from '@docusaurus/Head';
import BlogPostItem from '@theme-original/BlogPostItem';
import type BlogPostItemType from '@theme/BlogPostItem';
import type { WrapperProps } from '@docusaurus/types';
import { useBlogPost } from '@docusaurus/plugin-content-blog/client';
import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';

type Props = WrapperProps<typeof BlogPostItemType>;

function inferProjectFromTags(tags: Array<{ label: string }>): string | undefined {
  for (const tag of tags) {
    if (isKnownProject(tag.label)) {
      return tag.label;
    }
  }
  return undefined;
}

export default function BlogPostItemWrapper(props: Props): JSX.Element {
  const { frontMatter, metadata } = useBlogPost();

  // Priority: front matter > tags > permalink inference
  // Use metadata.permalink (stable during SSG) instead of useLocation
  const project = (frontMatter as { project?: string }).project
    ?? inferProjectFromTags(metadata.tags || [])
    ?? inferProjectFromPermalink(metadata.permalink);

  return (
    <>
      <Head>
        <meta name="docsearch:project" content={project} />
        <meta name="docsearch:content_type" content="blog" />
      </Head>
      <BlogPostItem {...props} />
    </>
  );
}

Projects page (src/pages/projects.tsx):

Add meta tags directly (standalone pages are not covered by swizzled components):

import React from 'react';
import Layout from '@theme/Layout';
import Head from '@docusaurus/Head';

export default function ProjectsPage(): JSX.Element {
  return (
    <Layout title="Projects">
      <Head>
        <meta name="docsearch:project" content="main" />
        <meta name="docsearch:content_type" content="projects" />
      </Head>
      {/* ... rest of component */}
    </Layout>
  );
}

Phase 3: Algolia Crawler Configuration

Key insight: DocSearch automatically extracts meta tags following the docsearch:$NAME pattern. No explicit recordProps configuration is needed for project and content_type attributes.

From the DocSearch Required Configuration documentation:

"Our crawler automatically extracts information from our DocSearch specific meta tags"

Since we inject <meta name="docsearch:project" content="..."> and <meta name="docsearch:content_type" content="..."> in Phase 2, the crawler will automatically:

  1. Extract the content attribute value
  2. Add project and content_type attributes to all records from that page

Crawler configuration: No changes needed to recordProps. Keep your existing configuration as-is.

Exclusion patterns (add at root level of crawler config, not inside actions):

new Crawler({
  appId: "08U6IN17TV",
  // ... other root config

  // Exclude non-content pages from indexing
  exclusionPatterns: [
    "https://amiable.dev/status/**", // Dynamic system health data
    "https://amiable.dev/blog/tags/**", // Auto-generated tag index pages
    "https://amiable.dev/blog/archive/**", // Archive listing page
    "https://amiable.dev/docs/tags/**", // Docs tag pages
    "https://amiable.dev/search", // Search page itself
  ],

  actions: [
    // ... existing actions (no changes needed)
  ],
});

Note: exclusionPatterns goes at the root level of the crawler config, alongside startUrls, sitemaps, and the other top-level options, not inside the actions array. See the exclusionPatterns documentation.

Index settings (Algolia Dashboard → Index → Configuration):

{
  "attributesForFaceting": [
    "filterOnly(project)",
    "filterOnly(content_type)"
  ],
  "distinct": true,
  "attributeForDistinct": "url"
}

Note: Use filterOnly() to prevent facet values from affecting relevance ranking.

Phase 4: Markdown Index for Ask AI

Ask AI works best with a separate markdown-optimized index containing clean, structured text without HTML navigation clutter. See DocSearch Markdown Indexing Guide.

Add a second action to your crawler for the markdown index:

// NEW: Markdown index for Ask AI (add alongside existing DocSearch action)
{
indexName: "amiable-dev-index-markdown",
pathsToMatch: ["https://amiable.dev/**"],
recordExtractor: ({ $, url, helpers }) => {
// Extract clean text, excluding navigation
const text = helpers.markdown(
"main > *:not(nav):not(header):not(.breadcrumb)"
);
if (text === "") return [];

// Extract Docusaurus meta tags (including our custom project tag)
const language = $('meta[name="docsearch:language"]').attr("content") || "en";
const version = $('meta[name="docsearch:version"]').attr("content") || "current";
const docusaurus_tag = $('meta[name="docsearch:docusaurus_tag"]').attr("content") || "";
const project = $('meta[name="docsearch:project"]').attr("content") || "main";
const title = $("head > title").text();
const h1 = $("main h1").first().text();

return helpers.splitTextIntoRecords({
text,
baseRecord: {
url,
objectID: url,
title: title || h1,
heading: h1,
language,
version,
docusaurus_tag,
project,
},
maxRecordBytes: 100000, // Larger = fewer chunks, more context
orderingAttributeName: "part",
});
},
},

Markdown index settings (Algolia Dashboard → Index → Configuration):

{
  "attributesForFaceting": ["language", "version", "docusaurus_tag", "project"],
  "searchableAttributes": ["title", "heading", "unordered(text)"],
  "ignorePlurals": true,
  "typoTolerance": false,
  "advancedSyntax": false
}

Phase 5: Docusaurus Algolia Configuration

docusaurus.config.js:

themeConfig: {
  algolia: {
    appId: '08U6IN17TV',
    apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
    indexName: 'amiable-dev-index', // Main keyword search
    contextualSearch: true,
    searchPagePath: 'search',
    askAi: {
      indexName: 'amiable-dev-index-markdown', // Markdown index for AI
      apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
      appId: '08U6IN17TV',
      assistantId: 'aVJcPQwofdBY',
    },
  },
}

Phase 6: Front Matter Extensions

Update aggregation scripts to inject explicit project field:

scripts/fetch-adrs.js and scripts/fetch-blog-posts.js:

// Add during front matter transformation
frontMatter.project = repoName; // e.g., 'llm-council'

This provides:

  • Explicit project attribution in front matter
  • Override capability for cross-project content
  • Consistency with meta tag injection
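A minimal sketch of that transformation (the function name and regex handling are illustrative, not the actual fetch scripts; an existing project field is left untouched to preserve the override capability):

```typescript
// Illustrative sketch of the front matter transformation in the aggregation
// scripts; not the actual fetch-adrs.js / fetch-blog-posts.js code.
function injectProjectField(markdown: string, repoName: string): string {
  const fm = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) {
    // No front matter block: create one with just the project field
    return `---\nproject: ${repoName}\n---\n\n${markdown}`;
  }
  if (/^project:/m.test(fm[1])) return markdown; // explicit value wins
  return markdown.replace(/^---\n/, `---\nproject: ${repoName}\n`);
}
```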

Phase 7 (Future): Route-Aware Search Scoping

For enhanced UX, customize the DocSearch modal to add route-aware filtering:

// Pseudocode for dynamic scoping
const currentProject = inferProjectFromPermalink(location.pathname);

algoliaSearchOptions: {
  searchParameters: {
    facetFilters: currentProject !== 'main'
      ? [`project:${currentProject}`]
      : [],
  },
}

Default behavior by route:

  • On /docs/adrs/projects/llm-council/... → filter to project:llm-council
  • On /blog/projects/arbiter-bot/... → filter to project:arbiter-bot
  • On root pages (/, /projects, /blog) → no filter (global search)
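The route defaults above can be sketched as a pure helper (the name defaultFacetFilters is hypothetical; the real implementation would reuse inferProjectFromPermalink and KNOWN_PROJECTS from src/lib/projects.ts rather than the inline set used here for self-containment):

```typescript
// Hypothetical sketch of the route → default-filter mapping.
// A small inline project set keeps the example self-contained; the site
// would derive KNOWN_PROJECTS from projects.json instead.
const KNOWN = new Set(['llm-council', 'arbiter-bot']);

function defaultFacetFilters(pathname: string): string[] {
  const m = pathname.match(/^\/(?:docs\/adrs|blog)\/projects\/([^/]+)/);
  return m && KNOWN.has(m[1]) ? [`project:${m[1]}`] : []; // [] → global search
}
```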

Phase 8 (Future): Display Project Badge in Search Results

To show which project a search result belongs to, customize the DocSearch hit component using hitComponent:

Swizzle the SearchBar to supply a custom hitComponent (this is not exposed through themeConfig.algolia):

  npx docusaurus swizzle @docusaurus/theme-search-algolia SearchBar

Custom hit component example:

import React from 'react';

function CustomHit({ hit, children }) {
const project = hit.project || 'main';
const showBadge = project !== 'main';

return (
<a href={hit.url} className="DocSearch-Hit-Container">
{showBadge && (
<span className="project-badge" style={{
fontSize: '0.7rem',
padding: '2px 6px',
borderRadius: '4px',
backgroundColor: 'var(--ifm-color-primary-light)',
color: 'white',
marginRight: '8px',
}}>
{project}
</span>
)}
{children}
</a>
);
}

When to implement: consider this phase when:

  • Users report confusion about which project results come from
  • Search result disambiguation becomes a priority
  • The number of aggregated projects grows significantly

Recommendation: Start without custom hit rendering. The default DocSearch UI shows breadcrumbs (lvl0/lvl1) which already provide context. Monitor user feedback before adding complexity.

Validation Plan

Build-Time Verification

# 1. Build the site
npm run build

# 2. Verify meta tags in generated HTML
grep -r "docsearch:project" build/ | head -20

# 3. Check specific project ADR
cat build/docs/adrs/projects/llm-council/adr-001-council-summary/index.html | grep "docsearch:"

# 4. Check projects page
cat build/projects/index.html | grep "docsearch:"
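For CI, the same checks can be expressed as a small assertion helper (hypothetical; the regex check is a sketch, not a full HTML parser):

```typescript
// Hypothetical CI helper: check that a built HTML page carries a given
// docsearch:* meta tag. Regex-based sketch, not a full HTML parser.
function hasDocsearchMeta(html: string, name: string): boolean {
  return new RegExp(`<meta[^>]*name="docsearch:${name}"`).test(html);
}

// Sketch of use against the build output:
//   const html = fs.readFileSync('build/projects/index.html', 'utf8');
//   if (!hasDocsearchMeta(html, 'project')) throw new Error('missing meta tag');
```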

Post-Crawl Verification

  1. Algolia Dashboard → Index → Browse:

    • Verify records have project and content_type attributes
    • Check that child records (paragraphs/headings) inherit facets
  2. Test faceted queries:

    Query: "architecture"
    Filter: project:llm-council
    Expected: Only LLM Council results
  3. Test Ask AI boundaries:

    • Ask: "How does authentication work in LLM Council?"
    • Verify answer doesn't reference Arbiter Bot or other projects

Unit Tests

__tests__/lib/projects.test.ts:

import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';

describe('inferProjectFromPermalink', () => {
  it('extracts project from docs path', () => {
    expect(inferProjectFromPermalink('/docs/adrs/projects/llm-council/adr-001'))
      .toBe('llm-council');
  });

  it('extracts project from blog path', () => {
    expect(inferProjectFromPermalink('/blog/projects/arbiter-bot/market-discovery'))
      .toBe('arbiter-bot');
  });

  it('returns main for root docs', () => {
    expect(inferProjectFromPermalink('/docs/adrs/ADR-001-blog-homepage'))
      .toBe('main');
  });

  it('returns main for unknown project', () => {
    expect(inferProjectFromPermalink('/docs/adrs/projects/unknown-project/adr-001'))
      .toBe('main');
  });
});

Rollback Plan

If faceted search causes issues:

  1. Revert swizzled components:

    git revert <commit-hash>
    # Or simply delete src/theme/DocItem and src/theme/BlogPostItem
  2. Remove facetFilters from search config (if added)

  3. No crawler changes needed: Facets become unused attributes; existing search continues working

  4. Re-crawl not required: Index continues to function without facet filtering

Consequences

Positive

  • Clear project attribution: Every record tagged with source project
  • Scoped search: Users can filter to specific projects
  • AI context boundaries: Ask AI answers scoped to relevant project
  • Content type filtering: Separate docs, blog, and project page results
  • No index restructuring: Single index, facet-based filtering
  • Incremental adoption: Works with existing content
  • Upgrade safe: Using --wrap swizzle mode

Negative

  • Theme customization: Requires swizzling DocItem and BlogPostItem
  • Crawler dependency: Requires Algolia crawler to extract and propagate meta tags
  • Dynamic scoping: Full project-scoped search UX requires a custom DocSearch modal with route-aware filtering

Risks

| Risk | Severity | Mitigation |
|---|---|---|
| Crawler doesn't propagate meta tags to child records | High | Test with single page first; contact Algolia support |
| Standalone pages missing meta tags | High | Explicitly add <Head> to /projects and any other standalone pages |
| Project list drift | Medium | Derive from projects.json instead of hardcoding |
| SSR/hydration mismatch | Low | Use metadata.permalink instead of useLocation() |

Alternatives Considered

Multiple Indices (Option B)

Separate index per project:

  • amiable-dev-main
  • amiable-dev-llm-council
  • amiable-dev-arbiter-bot
  • etc.

Rejected because:

  • Complex crawler configuration (14+ separate crawls)
  • Harder to implement cross-project search (federated search)
  • Quota management across indices
  • Overkill for current content volume (153 aggregated pieces)

URL-Based Faceting Only

Infer project from URL patterns in crawler rules without meta tags.

Rejected because:

  • Fragile if URL structure changes
  • Meta tags are more explicit and self-documenting
  • No front matter override capability
  • Harder to debug (logic in crawler vs visible in HTML)

Tag-Only Approach

Rely entirely on existing tags field in front matter.

Rejected because:

  • Tags are mixed (project + topic)
  • No content type distinction
  • Less semantic than dedicated project attribute

Council Review

Reviewed: 2026-01-25 by LLM Council (high confidence tier)

Models: claude-opus-4.5, gpt-5.2, gemini-3-pro-preview, grok-4.1-fast

Verdict: Approved with modifications

Key recommendations incorporated:

  1. Centralize KNOWN_PROJECTS derived from projects.json
  2. Use metadata.permalink instead of useLocation() for SSR stability
  3. Add explicit crawler configuration section
  4. Add phased implementation plan
  5. Add validation and rollback procedures
  6. Handle standalone /projects page explicitly

References