ADR-005: Algolia Search and Ask AI Optimization
Status
Accepted (Council Reviewed 2026-01-25)
Context
The amiable.dev site aggregates content from 14 different projects across three content types:
| Content Type | URL Pattern | Source |
|---|---|---|
| Projects Page | /projects | projects.json + GitHub API |
| Root ADRs | /docs/adrs/ADR-NNN-* | docs/adrs/ |
| Project ADRs | /docs/adrs/projects/{repo-name}/ADR-NNN-* | Aggregated from project repos |
| Root Blog | /blog/{slug} | blog/ |
| Project Blog | /blog/projects/{repo-name}/{slug} | Aggregated from project repos |
With Algolia DocSearch and Ask AI enabled (v1.9.3), search results and AI answers currently mix content from all projects without distinction. Users searching for "authentication" might get results from LLM Council, Arbiter Bot, and SmartBadge ADRs without clear project context.
Problems
- Search result ambiguity: Results don't clearly indicate which project they belong to
- Ask AI context confusion: AI answers may synthesize information across unrelated projects
- No project-scoped search: Users can't limit search to a specific project
- Content type mixing: ADRs, blog posts, and project pages are treated equally
Current Project Structure
14 projects in projects.json:
arbiter-bot, llm-council, luminescent-cluster, habit-hub, stentorosaur, smart-badge, smart-badge-docs, ops, numerai-bot, amiable-templates, conductor, midimon, midimon-plugin-registry, amiable-docusaurus
Content counts: 111 ADRs + 42 blog posts aggregated from projects.
Decision
Implement a single index with facets, using DocSearch meta tags to attribute each record to a project and a content type.
Architecture: Single Index + Facets
Use one Algolia index with faceted attributes:
- project: Project slug (e.g., arbiter-bot, llm-council, main)
- content_type: Content type (docs, blog, projects)
This approach is recommended by Algolia for multi-faceted sites and enables:
- Cross-project search (default)
- Project-scoped search via facetFilters
- Content type filtering
- Ask AI project boundaries
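As a rough illustration of how the two facets could combine at query time, the sketch below builds a facetFilters value from an optional project and content type. SearchScope and buildFacetFilters are hypothetical names, not part of DocSearch or Docusaurus; the only Algolia behavior relied on is that top-level entries of facetFilters are combined with AND.

```typescript
// Hypothetical helper for illustration; not part of DocSearch or Docusaurus.
type ContentType = "docs" | "blog" | "projects";

interface SearchScope {
  project?: string; // e.g. "llm-council"; omit for cross-project search
  contentType?: ContentType; // omit to search all content types
}

// Top-level entries in Algolia's facetFilters array are combined with AND.
function buildFacetFilters(scope: SearchScope): string[] {
  const filters: string[] = [];
  if (scope.project) filters.push(`project:${scope.project}`);
  if (scope.contentType) filters.push(`content_type:${scope.contentType}`);
  return filters;
}

// Scoped to one project's ADRs; an empty scope yields no filters (global search).
const scoped = buildFacetFilters({ project: "llm-council", contentType: "docs" });
const unscoped = buildFacetFilters({});
```

An empty array keeps the default cross-project behavior, so the same code path can serve both scoped and global search.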
Implementation
Phase 1: Shared Configuration
Create a centralized project configuration to avoid drift:
src/lib/projects.ts:
// Derive project slugs from projects.json to prevent drift
import projects from '@site/projects.json';
// Extract repo name (e.g., 'amiable-dev/llm-council' → 'llm-council')
export const KNOWN_PROJECTS = new Set(
projects.map((p: { repo: string }) => p.repo.split('/')[1])
);
export function isKnownProject(slug: string): boolean {
return KNOWN_PROJECTS.has(slug);
}
export function inferProjectFromPermalink(permalink: string): string {
// Match /docs/adrs/projects/{project-name}/... or /blog/projects/{project-name}/...
const patterns = [
/\/docs\/adrs\/projects\/([^/]+)/,
/\/blog\/projects\/([^/]+)/,
];
for (const pattern of patterns) {
const match = permalink.match(pattern);
if (match && KNOWN_PROJECTS.has(match[1])) {
return match[1];
}
}
return 'main'; // Site's own content
}
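One edge case worth noting: p.repo.split('/')[1] yields undefined for an entry that is not in owner/name form, which would place undefined in the set. A minimal defensive sketch follows, using an inline sample array in place of projects.json; the ProjectEntry type and toSlug helper are assumptions made for illustration.

```typescript
// Illustrative only: the sample entries below stand in for projects.json.
interface ProjectEntry {
  repo: string; // expected shape: "owner/name"
}

// Returns the repo name, or undefined when the entry is not "owner/name".
function toSlug(entry: ProjectEntry): string | undefined {
  const parts = entry.repo.split("/");
  return parts.length === 2 && parts[1] !== "" ? parts[1] : undefined;
}

const sampleProjects: ProjectEntry[] = [
  { repo: "amiable-dev/llm-council" },
  { repo: "malformed-entry" }, // no owner prefix, so it is skipped
];

// Drop entries that did not produce a slug before building the set.
const knownProjects = new Set(
  sampleProjects.map(toSlug).filter((s): s is string => s !== undefined)
);
```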
Phase 2: Inject DocSearch Meta Tags via Theme Wrappers
The DocSearch crawler automatically picks up docsearch:* meta tags and attaches them as record attributes.
Swizzle DocItem for docs (use --wrap for upgrade safety):
# The extra "--" separator is only needed with "npm run swizzle"; with npx, pass --wrap directly
npx docusaurus swizzle @docusaurus/theme-classic DocItem/Layout --wrap
src/theme/DocItem/Layout/index.tsx:
import React from 'react';
import Head from '@docusaurus/Head';
import Layout from '@theme-original/DocItem/Layout';
import type LayoutType from '@theme/DocItem/Layout';
import type { WrapperProps } from '@docusaurus/types';
import { useDoc } from '@docusaurus/plugin-content-docs/client';
import { inferProjectFromPermalink } from '@site/src/lib/projects';
type Props = WrapperProps<typeof LayoutType>;
export default function DocItemLayoutWrapper(props: Props): JSX.Element {
const { frontMatter, metadata } = useDoc();
// Priority: front matter > permalink inference
// Use metadata.permalink (stable during SSG) instead of useLocation
const project = (frontMatter as { project?: string }).project
?? inferProjectFromPermalink(metadata.permalink);
return (
<>
<Head>
<meta name="docsearch:project" content={project} />
<meta name="docsearch:content_type" content="docs" />
</Head>
<Layout {...props} />
</>
);
}
Swizzle BlogPostItem for blog:
# The extra "--" separator is only needed with "npm run swizzle"; with npx, pass --wrap directly
npx docusaurus swizzle @docusaurus/theme-classic BlogPostItem --wrap
src/theme/BlogPostItem/index.tsx:
import React from 'react';
import Head from '@docusaurus/Head';
import BlogPostItem from '@theme-original/BlogPostItem';
import type BlogPostItemType from '@theme/BlogPostItem';
import type { WrapperProps } from '@docusaurus/types';
import { useBlogPost } from '@docusaurus/plugin-content-blog/client';
import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';
type Props = WrapperProps<typeof BlogPostItemType>;
function inferProjectFromTags(tags: Array<{ label: string }>): string | undefined {
for (const tag of tags) {
if (isKnownProject(tag.label)) {
return tag.label;
}
}
return undefined;
}
export default function BlogPostItemWrapper(props: Props): JSX.Element {
const { frontMatter, metadata } = useBlogPost();
// Priority: front matter > tags > permalink inference
// Use metadata.permalink (stable during SSG) instead of useLocation
const project = (frontMatter as { project?: string }).project
?? inferProjectFromTags(metadata.tags || [])
?? inferProjectFromPermalink(metadata.permalink);
return (
<>
<Head>
<meta name="docsearch:project" content={project} />
<meta name="docsearch:content_type" content="blog" />
</Head>
<BlogPostItem {...props} />
</>
);
}
Projects page (src/pages/projects.tsx):
Add meta tags directly (standalone pages are not covered by swizzled components):
import React from 'react';
import Head from '@docusaurus/Head';
import Layout from '@theme/Layout';
export default function ProjectsPage(): JSX.Element {
return (
<Layout title="Projects">
<Head>
<meta name="docsearch:project" content="main" />
<meta name="docsearch:content_type" content="projects" />
</Head>
{/* ... rest of component */}
</Layout>
);
}
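The attribution priority used by the wrappers (front matter, then tags, then permalink) can also be expressed as a pure function, which is easier to unit test than a React component. This is a sketch under assumptions: resolveProject and its input shape are hypothetical, and the known-project set is passed in rather than imported from src/lib/projects.ts.

```typescript
// Hypothetical pure-function version of the wrapper logic, for illustration.
interface AttributionInput {
  frontMatterProject?: string;
  tags?: Array<{ label: string }>;
  permalink: string;
}

function resolveProject(input: AttributionInput, known: Set<string>): string {
  // 1. Explicit front matter wins.
  if (input.frontMatterProject !== undefined) return input.frontMatterProject;

  // 2. Otherwise, the first tag that names a known project.
  const tagged = (input.tags ?? []).find((t) => known.has(t.label));
  if (tagged) return tagged.label;

  // 3. Otherwise, the project segment of an aggregated permalink.
  const match = input.permalink.match(/\/(?:docs\/adrs|blog)\/projects\/([^/]+)/);
  if (match && known.has(match[1])) return match[1];

  // 4. Fall back to the site's own content.
  return "main";
}

const knownSet = new Set(["llm-council", "arbiter-bot", "ops"]);
const fromTag = resolveProject(
  { tags: [{ label: "release" }, { label: "llm-council" }], permalink: "/blog/some-post" },
  knownSet
);
```

Keeping the resolution logic separate from the component would let the wrappers for docs and blog share one tested code path.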
Phase 3: Algolia Crawler Configuration
Key insight: DocSearch automatically extracts meta tags following the docsearch:$NAME pattern. No explicit recordProps configuration is needed for project and content_type attributes.
From the DocSearch Required Configuration documentation:
"Our crawler automatically extracts information from our DocSearch specific meta tags"
Since we inject <meta name="docsearch:project" content="..."> and <meta name="docsearch:content_type" content="..."> in Phase 2, the crawler will automatically:
- Extract the content attribute value
- Add project and content_type attributes to all records from that page
Crawler configuration: No changes needed to recordProps. Keep your existing configuration as-is.
Exclusion patterns (add at root level of crawler config, not inside actions):
new Crawler({
appId: "08U6IN17TV",
// ... other root config
// Exclude non-content pages from indexing
exclusionPatterns: [
"https://amiable.dev/status/**", // Dynamic system health data
"https://amiable.dev/blog/tags/**", // Auto-generated tag index pages
"https://amiable.dev/blog/archive/**", // Archive listing page
"https://amiable.dev/docs/tags/**", // Docs tag pages
"https://amiable.dev/search", // Search page itself
],
actions: [
// ... existing actions (no changes needed)
],
});
Note: exclusionPatterns goes at the root level alongside startUrls, sitemaps, etc., not inside the actions array. See the exclusionPatterns documentation.
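Before saving the crawler config, it can help to check locally which URLs a pattern list would exclude. The sketch below uses a deliberately simplified glob matcher (** matches any characters, * matches any characters except a slash). It only approximates the crawler's matching rules, and globToRegExp and isExcluded are hypothetical helpers, so treat the results as a sanity check rather than a guarantee.

```typescript
// Simplified glob matching for local sanity checks only.
// Assumption: this approximates, but does not reproduce, the crawler's matcher.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters except "*", which is handled below.
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so the single-star pass skips "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

function isExcluded(url: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(url));
}

const exclusionPatterns = [
  "https://amiable.dev/status/**",
  "https://amiable.dev/blog/tags/**",
  "https://amiable.dev/blog/archive/**",
  "https://amiable.dev/docs/tags/**",
  "https://amiable.dev/search",
];

const tagPageExcluded = isExcluded("https://amiable.dev/blog/tags/algolia", exclusionPatterns);
const postExcluded = isExcluded("https://amiable.dev/blog/projects/llm-council/post", exclusionPatterns);
```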
Index settings (Algolia Dashboard → Index → Configuration):
{
"attributesForFaceting": [
"filterOnly(project)",
"filterOnly(content_type)"
],
"distinct": true,
"attributeForDistinct": "url"
}
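Note: Use filterOnly() when an attribute is needed only for filtering. It skips facet-count computation, which keeps the index smaller; it does not change relevance ranking. Drop the modifier if the UI later needs facet counts (for example, a per-project result count).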
Phase 4: Markdown Index for Ask AI
Ask AI works best with a separate markdown-optimized index containing clean, structured text without HTML navigation clutter. See DocSearch Markdown Indexing Guide.
Add a second action to your crawler for the markdown index:
// NEW: Markdown index for Ask AI (add alongside existing DocSearch action)
{
indexName: "amiable-dev-index-markdown",
pathsToMatch: ["https://amiable.dev/**"],
recordExtractor: ({ $, url, helpers }) => {
// Extract clean text, excluding navigation
const text = helpers.markdown(
"main > *:not(nav):not(header):not(.breadcrumb)"
);
if (text === "") return [];
// Extract Docusaurus meta tags (including our custom project tag)
const language = $('meta[name="docsearch:language"]').attr("content") || "en";
const version = $('meta[name="docsearch:version"]').attr("content") || "current";
const docusaurus_tag = $('meta[name="docsearch:docusaurus_tag"]').attr("content") || "";
const project = $('meta[name="docsearch:project"]').attr("content") || "main";
const title = $("head > title").text();
const h1 = $("main h1").first().text();
return helpers.splitTextIntoRecords({
text,
baseRecord: {
url,
objectID: url,
title: title || h1,
heading: h1,
language,
version,
docusaurus_tag,
project,
},
maxRecordBytes: 100000, // Larger = fewer chunks, more context
orderingAttributeName: "part",
});
},
},
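To make the effect of maxRecordBytes and orderingAttributeName more concrete, here is a conceptual sketch of byte-limited chunking in which every chunk carries the base record's attributes plus a part number. It is not the crawler's implementation of helpers.splitTextIntoRecords: splitIntoRecords is a hypothetical stand-in that splits on blank lines, and the per-chunk objectID format is an assumption.

```typescript
// Conceptual stand-in for byte-limited chunking; not Algolia's implementation.
interface BaseRecord {
  url: string;
  project: string;
}

type ChunkRecord = BaseRecord & { objectID: string; text: string; part: number };

function byteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

function splitIntoRecords(text: string, base: BaseRecord, maxTextBytes: number): ChunkRecord[] {
  const records: ChunkRecord[] = [];
  let current = "";

  // Emit the accumulated text as one record; every chunk inherits base attributes.
  const flush = (): void => {
    if (current === "") return;
    const part = records.length;
    records.push({ ...base, objectID: `${base.url}-${part}`, text: current, part });
    current = "";
  };

  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (byteLength(candidate) > maxTextBytes) {
      flush();
      current = paragraph; // an oversized single paragraph is kept whole in this sketch
    } else {
      current = candidate;
    }
  }
  flush();
  return records;
}

const chunks = splitIntoRecords(
  "aaaa\n\nbbbb\n\ncccc",
  { url: "https://amiable.dev/docs/example", project: "llm-council" },
  10
);
```

Because each chunk copies the base record, a project filter applies uniformly to every part of a page, which is what Ask AI scoping relies on.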
Markdown index settings (Algolia Dashboard → Index → Configuration):
{
"attributesForFaceting": ["language", "version", "docusaurus_tag", "project"],
"searchableAttributes": ["title", "heading", "unordered(text)"],
"ignorePlurals": true,
"typoTolerance": false,
"advancedSyntax": false
}
Phase 5: Docusaurus Algolia Configuration
docusaurus.config.js:
themeConfig: {
algolia: {
appId: '08U6IN17TV',
apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
indexName: 'amiable-dev-index', // Main keyword search
contextualSearch: true,
searchPagePath: 'search',
askAi: {
indexName: 'amiable-dev-index-markdown', // Markdown index for AI
apiKey: 'fc6f1c31ce08f989d8ea7df840eddb5e',
appId: '08U6IN17TV',
assistantId: 'aVJcPQwofdBY',
},
},
}
Phase 6: Front Matter Extensions
Update aggregation scripts to inject explicit project field:
scripts/fetch-adrs.js and scripts/fetch-blog-posts.js:
// Add during front matter transformation
frontMatter.project = repoName; // e.g., 'llm-council'
This provides:
- Explicit project attribution in front matter
- Override capability for cross-project content
- Consistency with meta tag injection
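Note that a plain assignment overwrites any project value already present in the upstream file. If the override for cross-project content is meant to come from the post's own front matter (an assumption; the aggregation scripts are not shown here), the injection would need to preserve an existing value. A minimal sketch with a hypothetical withProject helper:

```typescript
// Hypothetical helper; the real aggregation scripts may structure this differently.
type FrontMatter = Record<string, unknown>;

// Keeps an author-supplied project (override for cross-project content),
// otherwise falls back to the source repository name.
function withProject(frontMatter: FrontMatter, repoName: string): FrontMatter {
  const existing = frontMatter["project"];
  const project = typeof existing === "string" && existing !== "" ? existing : repoName;
  return { ...frontMatter, project };
}

const injected = withProject({ title: "Council Summary" }, "llm-council");
const overridden = withProject({ title: "Shared Post", project: "ops" }, "llm-council");
```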
Phase 7 (Future): Dynamic Project-Scoped Search
For enhanced UX, customize the DocSearch modal to add route-aware filtering:
// Pseudocode for dynamic scoping
const currentProject = inferProjectFromPermalink(location.pathname);
algoliaSearchOptions: {
searchParameters: {
facetFilters: currentProject !== 'main'
? [`project:${currentProject}`]
: [],
},
}
Default behavior by route:
- On /docs/adrs/projects/llm-council/... → filter to project:llm-council
- On /blog/projects/arbiter-bot/... → filter to project:arbiter-bot
- On root pages (/, /projects, /blog) → no filter (global search)
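The route-to-filter mapping above can be sketched as a small pure function. This is illustrative only: inferProject is a self-contained stand-in for inferProjectFromPermalink (the known-project set is passed in), and scopedSearchParameters is a hypothetical name. How the result is wired into the DocSearch modal is left open.

```typescript
// Stand-in for inferProjectFromPermalink, with the known set passed in
// so the snippet is self-contained.
function inferProject(pathname: string, known: Set<string>): string {
  const match = pathname.match(/\/(?:docs\/adrs|blog)\/projects\/([^/]+)/);
  return match && known.has(match[1]) ? match[1] : "main";
}

// Hypothetical helper: returns the facetFilters to apply for a given route.
function scopedSearchParameters(
  pathname: string,
  known: Set<string>
): { facetFilters: string[] } {
  const project = inferProject(pathname, known);
  return { facetFilters: project === "main" ? [] : [`project:${project}`] };
}

const projectsForScoping = new Set(["llm-council", "arbiter-bot"]);
const onProjectAdr = scopedSearchParameters(
  "/docs/adrs/projects/llm-council/adr-001",
  projectsForScoping
);
const onRootBlog = scopedSearchParameters("/blog", projectsForScoping);
```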
Phase 8 (Future): Display Project Badge in Search Results
To show which project a search result belongs to, customize the DocSearch hit component using hitComponent:
Swizzle the SearchBar or use Algolia theme config:
// In docusaurus.config.js or custom SearchBar component
algolia: {
// ... existing config
// Note: hitComponent customization requires swizzling SearchBar
}
Custom hit component example:
import React from 'react';
function CustomHit({ hit, children }) {
const project = hit.project || 'main';
const showBadge = project !== 'main';
return (
<a href={hit.url} className="DocSearch-Hit-Container">
{showBadge && (
<span className="project-badge" style={{
fontSize: '0.7rem',
padding: '2px 6px',
borderRadius: '4px',
backgroundColor: 'var(--ifm-color-primary-light)',
color: 'white',
marginRight: '8px',
}}>
{project}
</span>
)}
{children}
</a>
);
}
When to implement: Consider implementing when:
- Users report confusion about which project results come from
- Search result disambiguation becomes a priority
- The number of aggregated projects grows significantly
Recommendation: Start without custom hit rendering. The default DocSearch UI shows breadcrumbs (lvl0/lvl1) which already provide context. Monitor user feedback before adding complexity.
Validation Plan
Build-Time Verification
# 1. Build the site
npm run build
# 2. Verify meta tags in generated HTML
grep -r "docsearch:project" build/ | head -20
# 3. Check specific project ADR
cat build/docs/adrs/projects/llm-council/adr-001-council-summary/index.html | grep "docsearch:"
# 4. Check projects page
cat build/projects/index.html | grep "docsearch:"
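The grep checks could also be automated in a test or CI step. The sketch below extracts docsearch:* meta tags from an HTML string with a regular expression. It assumes double-quoted attributes with name appearing before content, as in the JSX above; the actual build output may differ (for example after minification), in which case a real HTML parser would be the safer choice. extractDocsearchMeta is a hypothetical helper.

```typescript
// Collects <meta name="docsearch:*" content="..."> pairs from an HTML string.
// Assumption: double-quoted attributes, with name before content.
function extractDocsearchMeta(html: string): Record<string, string> {
  const result: Record<string, string> = {};
  const pattern = /<meta\s+[^>]*name="docsearch:([^"]+)"[^>]*content="([^"]*)"[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    result[match[1]] = match[2];
  }
  return result;
}

// Sample markup standing in for a built page's <head>.
const sampleHtml =
  '<head><meta data-rh="true" name="docsearch:project" content="llm-council">' +
  '<meta data-rh="true" name="docsearch:content_type" content="docs"></head>';

const metaTags = extractDocsearchMeta(sampleHtml);
```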
Post-Crawl Verification
1. Algolia Dashboard → Index → Browse:
   - Verify records have project and content_type attributes
   - Check that child records (paragraphs/headings) inherit facets
2. Test faceted queries:
   Query: "architecture"
   Filter: project:llm-council
   Expected: Only LLM Council results
3. Test Ask AI boundaries:
   - Ask: "How does authentication work in LLM Council?"
   - Verify answer doesn't reference Arbiter Bot or other projects
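The faceted-query check lends itself to a small assertion over returned hits. The sketch below assumes hits have already been fetched (by whatever client is in use) and that each carries the project attribute. The Hit shape and findLeaks helper are hypothetical.

```typescript
// Hypothetical hit shape: only the fields needed for the check.
interface Hit {
  url: string;
  project?: string;
}

// Returns hits that do not belong to the expected project, including
// hits with no project attribute at all (a sign the facet was not propagated).
function findLeaks(hits: Hit[], expectedProject: string): Hit[] {
  return hits.filter((h) => h.project !== expectedProject);
}

const sampleHits: Hit[] = [
  { url: "/docs/adrs/projects/llm-council/adr-001", project: "llm-council" },
  { url: "/docs/adrs/projects/arbiter-bot/adr-002", project: "arbiter-bot" },
  { url: "/docs/adrs/projects/llm-council/adr-003" }, // missing facet
];

const leaks = findLeaks(sampleHits, "llm-council");
```

A non-empty result points either to a filter that was not applied or to records that never received the project attribute.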
Unit Tests
__tests__/lib/projects.test.ts:
import { inferProjectFromPermalink, isKnownProject } from '@site/src/lib/projects';
describe('inferProjectFromPermalink', () => {
it('extracts project from docs path', () => {
expect(inferProjectFromPermalink('/docs/adrs/projects/llm-council/adr-001'))
.toBe('llm-council');
});
it('extracts project from blog path', () => {
expect(inferProjectFromPermalink('/blog/projects/arbiter-bot/market-discovery'))
.toBe('arbiter-bot');
});
it('returns main for root docs', () => {
expect(inferProjectFromPermalink('/docs/adrs/ADR-001-blog-homepage'))
.toBe('main');
});
it('returns main for unknown project', () => {
expect(inferProjectFromPermalink('/docs/adrs/projects/unknown-project/adr-001'))
.toBe('main');
});
});
Rollback Plan
If faceted search causes issues:
1. Revert swizzled components:
   git revert <commit-hash>
   # Or simply delete src/theme/DocItem and src/theme/BlogPostItem
2. Remove facetFilters from search config (if added)
3. No crawler changes needed: Facets become unused attributes; existing search continues working
4. Re-crawl not required: Index continues to function without facet filtering
Consequences
Positive
- Clear project attribution: Every record tagged with source project
- Scoped search: Users can filter to specific projects
- AI context boundaries: Ask AI answers scoped to relevant project
- Content type filtering: Separate docs, blog, and project page results
- No index restructuring: Single index, facet-based filtering
- Incremental adoption: Works with existing content
- Upgrade safe: Using --wrap swizzle mode
Negative
- Theme customization: Requires swizzling DocItem and BlogPostItem
- Crawler dependency: Requires Algolia crawler to extract and propagate meta tags
- Dynamic scoping: Full project-scoped search UX requires a custom DocSearch modal (Phase 7)
Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Crawler doesn't propagate meta tags to child records | High | Test with single page first; contact Algolia support |
| Standalone pages missing meta tags | High | Explicitly add <Head> to /projects and any other standalone pages |
| Project list drift | Medium | Derive from projects.json instead of hardcoding |
| SSR/hydration mismatch | Low | Use metadata.permalink instead of useLocation() |
Alternatives Considered
Multiple Indices (Option B)
Separate index per project:
- amiable-dev-main
- amiable-dev-llm-council
- amiable-dev-arbiter-bot
- etc.
Rejected because:
- Complex crawler configuration (14+ separate crawls)
- Harder to implement cross-project search (federated search)
- Quota management across indices
- Overkill for current content volume (153 aggregated pieces)
URL-Based Faceting Only
Infer project from URL patterns in crawler rules without meta tags.
Rejected because:
- Fragile if URL structure changes
- Meta tags are more explicit and self-documenting
- No front matter override capability
- Harder to debug (logic in crawler vs visible in HTML)
Tag-Only Approach
Rely entirely on existing tags field in front matter.
Rejected because:
- Tags are mixed (project + topic)
- No content type distinction
- Less semantic than a dedicated project attribute
Council Review
Reviewed: 2026-01-25 by LLM Council (high confidence tier)
Models: claude-opus-4.5, gpt-5.2, gemini-3-pro-preview, grok-4.1-fast
Verdict: Approved with modifications
Key recommendations incorporated:
- Centralize KNOWN_PROJECTS derived from projects.json
- Use metadata.permalink instead of useLocation() for SSR stability
- Add explicit crawler configuration section
- Add phased implementation plan
- Add validation and rollback procedures
- Handle standalone /projects page explicitly
References
- Algolia DocSearch Facet Filters
- DocSearch Meta Tags
- Docusaurus Algolia Configuration
- Algolia Index Settings
- ADR-003: GitHub Projects Showcase (project structure)
- ADR-004: Remote Blog Post Aggregation (blog aggregation patterns)