ADR-018: Knowledge Layer & Retrieval Architecture
Status
Proposed (Draft)
Relationship to other ADRs:
- ADR-007 (LLM Integration Architecture): Defines the provider abstraction, MCP tools, Skills/SKILL.md, and the system prompt structure this ADR extends.
- ADR-015 (LLM Signal Awareness): Defines the T1/T2/T3 signal context injection that provides runtime awareness. This ADR addresses a complementary concern — domain knowledge rather than runtime state.
- ADR-013 (LLM Canvas): Artifact types like `markdown` may render knowledge-layer content (e.g., protocol guides, mapping tutorials).
Context
Problem Statement
Conductor supports multiple LLM providers (OpenAI, Anthropic, Google, OpenRouter, LiteLLM) with varying degrees of baseline knowledge. A user running Claude Sonnet, GPT-4, Gemini Pro, or a local model via LiteLLM will get meaningfully different results for the same mapping request — not because the models differ in reasoning ability, but because they differ in what they know.
The knowledge gaps fall into four categories, each with different characteristics:
Category 1 — Conductor Internals: No public LLM has been trained on Conductor's configuration schema, trigger vocabulary, action types, mode semantics, velocity curve formats, conditional logic syntax, or device binding structure. This knowledge is entirely proprietary. Every model is equally blind here. The SKILL.md system (ADR-007) partially addresses this by injecting domain expertise into the prompt, but Skills are prose-oriented ("how to think about velocity curves") rather than reference-oriented ("the exact TOML schema for a velocity curve mapping").
Category 2 — Protocol Specifications: MIDI 1.0 basics (Note On/Off, CC, Program Change) are well-represented in most models' training data. But coverage degrades rapidly for: MIDI 2.0 UMP (Universal MIDI Packet) format, MIDI-CI (Capability Inquiry), NRPN/RPN encoding, SysEx manufacturer IDs, OSC type tags and bundle semantics, ArtNet universe addressing and DMX channel layouts, HID usage tables (especially for non-standard controllers), and Mackie Control / HUI protocol specifics. A user asking "set up my controller's NRPN for filter cutoff" will get confident-but-wrong answers from most models.
Category 3 — Platform Automation: Conductor's action system can trigger OS-level automation — keystrokes, mouse events, application commands, shell execution. The LLM needs platform-specific knowledge to suggest correct actions: AppleScript/JXA on macOS, PowerShell/UI Automation on Windows, D-Bus/xdotool/xdg on Linux. Models have decent baseline coverage for common patterns but are unreliable for edge cases (e.g., scripting specific DAW automation, addressing particular application windows, accessibility API patterns).
Category 4 — Device Profiles: The long tail. Each MIDI controller, gamepad, or HID device has its own CC assignments, note layouts, button grids, encoder behaviour, LED/display feedback protocols, and SysEx commands. Conductor's device descriptor files capture detection signatures but not operational knowledge ("the Mikro MK3's pads send Note On ch.10 with velocity, the knobs send CC 70-77 ch.1"). This knowledge currently lives in the user's head or in manufacturer documentation the LLM hasn't seen.
Why Not Just Larger Context Windows?
As context windows grow (128K → 1M+), it's tempting to prepend large reference documents to the system prompt. This fails for three reasons:
- Token cost scales linearly with users. Every API call pays for every injected token. A 20K-token protocol reference multiplied by thousands of daily conversations is economically unsustainable — particularly when 90% of conversations don't need protocol details.
- Attention degrades with distance. LLMs retrieve information less reliably from the middle of long contexts ("lost in the middle" effect). A 50K-token system prompt means the model is less likely to use any individual fact correctly than if the same fact appeared in a 2K-token retrieval result placed near the user's query.
- Freshness. Protocol specs update (MIDI 2.0 has had three revisions since 2020). Device profiles are released continuously. Static system prompts fossilise knowledge at deploy time.
Why Not Just More Skills?
The SKILL.md system (ADR-007) is the right mechanism for expertise — teaching the LLM how to reason about Conductor's domain. But Skills are:
- Coarse-grained. A skill is loaded in its entirety when triggered. You can't load "just the NRPN section" of a MIDI skill.
- Author-curated. Someone must write and maintain each skill. This doesn't scale to 500 device profiles or the full MIDI 2.0 spec.
- Prompt-budget constrained. Each loaded skill consumes context. Loading 5 skills simultaneously competes with conversation history, T1/T2/T3 signal context, and tool schemas for the same context window.
Skills and retrieval are complementary: Skills provide the reasoning framework ("when a user asks about velocity sensitivity, think about curve shapes, dead zones, and the difference between note velocity and aftertouch pressure"), while retrieval provides the reference facts ("CC74 is the standard assignment for brightness/timbre on General MIDI").
Design Exploration
Retrieval-Augmented Generation (RAG) in this context means: given a user query and conversation state, select relevant knowledge chunks and inject them into the LLM context alongside the query. The retrieval is transparent to the LLM — it sees additional context, not a special tool.
The alternatives considered:
| Approach | Pros | Cons |
|---|---|---|
| Static system prompt | Simple, deterministic, no retrieval latency | Token-expensive, doesn't scale, stale |
| Skills only | Curated quality, integrated with ADR-007 | Coarse-grained, manual authoring, prompt budget |
| Tool-based retrieval (LLM calls a `search_docs` tool) | LLM controls when to search | Adds a tool-call round-trip, LLM may not know to search, inconsistent across providers |
| Automatic RAG (retrieval before LLM call) | Transparent, no extra round-trip, works with all providers | Requires embedding infrastructure, relevance judgement is imperfect |
| Hybrid: static + automatic RAG | Best of both — guaranteed baseline + dynamic enrichment | More complex, two systems to maintain |
Decision
D1: Three-Layer Knowledge Architecture
Knowledge is organized into three layers with different delivery mechanisms, reflecting the tradeoff between reliability and scalability:
| Layer | Name | Content | Delivery | Size | Update Cadence |
|---|---|---|---|---|---|
| L1 | Core Reference | Conductor schema, trigger/action vocabulary, config grammar, mode semantics, tool usage patterns | Static system prompt injection | ~2-4K tokens | Per release |
| L2 | Domain Index | Protocol specs (MIDI, OSC, ArtNet, HID), platform automation APIs, curated device profiles | Automatic RAG — retrieved per query | 10-50K tokens indexed, 500-1500 tokens retrieved per query | Periodic updates (quarterly or on spec revision) |
| L3 | Community & Live | User-contributed device profiles, community mappings, online documentation | Optional online retrieval (user opt-in) | Unbounded | Continuous |
Rationale: L1 is always present because the LLM cannot function without Conductor-specific vocabulary — this is the cost of enabling the tool. L2 is retrieved on-demand because protocol knowledge is large but only selectively relevant — a question about velocity curves doesn't need ArtNet knowledge. L3 is optional because it requires network access and introduces trust/freshness concerns that should be under user control.
D2: L1 — Core Reference Document
A structured reference document ships with Conductor and is injected into the system prompt on every LLM call. This replaces the current approach of relying solely on Skills and tool schemas to communicate Conductor's domain.
The reference document covers:
- Configuration Schema Reference — TOML structure for devices, modes, mappings, profiles, with annotated examples
- Trigger Type Catalogue — every trigger variant (NoteOn, CC, DoubleTap, Chord, LongPress, AftertouchZone, PitchBendZone, CompoundTrigger, Gamepad*) with parameters and valid ranges
- Action Type Catalogue — every action variant (Keystroke, MouseClick, SendMidi, OscSend, ShellCommand, LaunchApp, Delay, Sequence, Conditional) with parameters
- Velocity Curve Types — Fixed, PassThrough, Linear, Curve (Exponential, Logarithmic, S-Curve) with parameter ranges
- Mode Semantics — mode switching, fallback chains, activation conditions
- Common Patterns — 10-15 annotated mapping examples covering typical use cases
Format: Markdown, compiled into a single string at build time. Structured with headers so the content is scannable by both humans (for maintenance) and LLMs (for attention).
Token budget: Target 2,500 tokens. Hard cap at 4,000 tokens. This budget is firm — it shares the system prompt with T1 topology (~200 tokens), tool schemas (~2,000 tokens for 20 MCP tools), and any loaded Skills.
Injection point: Appended to the system prompt in `_buildMessages()` (`chat.js` frontend) or `build_system_prompt()` (Rust backend, depending on where system prompt assembly lives at implementation time). Placed after the base system prompt and before T1 topology, so the LLM sees: identity → domain reference → current topology → conversation.
Maintenance: The reference document lives in `docs/llm-reference.md` in the repository. A build-time script validates that it fits within the token budget (using the same character-based estimation as `context.rs`: `chars * 0.25 + overhead`). CI fails if the reference exceeds 4,000 estimated tokens.
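As a sketch, the CI budget check follows directly from the character-based heuristic above. The `OVERHEAD` constant and function names here are illustrative assumptions, not the actual `context.rs` API:

```rust
/// Rough token estimate using the character-based heuristic described above
/// (~4 characters per token). OVERHEAD is an assumed fixed cost for
/// illustration, not the value used in context.rs.
fn estimate_tokens(text: &str) -> usize {
    const OVERHEAD: usize = 16;
    (text.chars().count() as f64 * 0.25).ceil() as usize + OVERHEAD
}

/// CI-style validation: fail the build if the reference document exceeds
/// the 4,000-token hard cap.
fn validate_reference_budget(reference: &str) -> Result<usize, String> {
    const HARD_CAP: usize = 4_000;
    let estimate = estimate_tokens(reference);
    if estimate > HARD_CAP {
        Err(format!(
            "llm-reference.md estimated at {estimate} tokens (cap: {HARD_CAP})"
        ))
    } else {
        Ok(estimate)
    }
}
```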
System Prompt Assembly Order:
```text
┌─────────────────────────────────┐
│ Base Identity & Behaviour       │  ~200 tokens
├─────────────────────────────────┤
│ L1: Core Reference Document     │  ~2,500 tokens (cap: 4,000)
├─────────────────────────────────┤
│ Loaded Skills (0-3)             │  ~1,000-5,000 tokens
├─────────────────────────────────┤
│ T1: Structural Topology         │  ~150-250 tokens (ADR-015)
├─────────────────────────────────┤
│ MCP Tool Schemas                │  ~2,000 tokens
├─────────────────────────────────┤
│ T2: Signal Pulse (injected)     │  ~100-300 tokens (ADR-015)
├─────────────────────────────────┤
│ Artifact Context (if edits)     │  ~100-500 tokens (ADR-013)
├─────────────────────────────────┤
│ Retrieved L2 Chunks (if any)    │  ~500-1,500 tokens
└─────────────────────────────────┘
```

Total system budget: ~7,000-14,000 tokens (well within a 128K window)
D3: L2 — Local Retrieval Index
A vector similarity index ships with Conductor (or downloads on first launch) containing chunked, embedded domain knowledge. On each user message, the retrieval layer queries this index and injects relevant chunks into the LLM context.
D3.1: Index Contents
The L2 index covers six domains:
| Domain | Source Material | Chunk Count (est.) | Update Frequency |
|---|---|---|---|
| MIDI | MIDI 1.0 Detailed Spec, MIDI 2.0 UMP Spec, MIDI-CI, GM/GM2 controller assignments, SysEx manufacturer IDs | 200-300 chunks | On spec revision |
| OSC | OSC 1.0/1.1 spec, common address patterns, type tag reference | 40-60 chunks | Rare |
| ArtNet | ArtNet 4 spec, universe/subnet addressing, DMX channel layouts | 60-80 chunks | On spec revision |
| HID | USB HID usage tables (generic desktop, game controllers, keyboard), descriptor format | 80-120 chunks | On spec revision |
| Platform Automation | macOS (AppleScript/JXA patterns, Accessibility API), Windows (PowerShell, SendKeys, UI Automation), Linux (xdotool, D-Bus, xdg) | 100-150 chunks | Per OS release |
| Device Profiles | Curated profiles for 20-30 common controllers (Akai, Novation, Korg, Native Instruments, Arturia) | 60-90 chunks | As devices are profiled |
Total estimated chunks: 540-800, each 200-500 tokens. Total index size: ~2-5 MB on disk (embeddings + metadata).
D3.2: Chunking Strategy
Documents are chunked using a section-aware splitter that respects document structure:
- Split on section boundaries (headers, numbered clauses in specs)
- Target chunk size: 300 tokens (200-500 range)
- Each chunk retains: source document, section path (e.g., "MIDI 1.0 > Channel Voice Messages > Control Change"), domain tag, chunk ID
- Overlapping context: 50-token overlap between adjacent chunks to preserve cross-boundary information
Chunks are stored as:
```rust
struct KnowledgeChunk {
    id: String,
    domain: KnowledgeDomain,   // Midi, Osc, ArtNet, Hid, Platform, Device
    source: String,            // "midi-1.0-spec", "akai-mpk-mini-profile"
    section_path: Vec<String>, // ["Channel Voice Messages", "Control Change"]
    content: String,           // The actual text
    embedding: Vec<f32>,       // Dense vector (384-dim or 768-dim)
    metadata: ChunkMetadata,   // version, last_updated, relevance_tags
}

enum KnowledgeDomain {
    Midi,
    Osc,
    ArtNet,
    Hid,
    Platform(PlatformTarget),
    Device(String), // Device identifier
}

enum PlatformTarget {
    MacOS,
    Windows,
    Linux,
}
```
D3.3: Embedding Model
The embedding model runs locally to avoid API dependencies and preserve privacy. Requirements:
- Inference: must run on CPU in <100ms per query (single embedding)
- Dimensions: 384 recommended (balances quality vs. index size)
- Model: `all-MiniLM-L6-v2` (via ONNX Runtime) as the default — an 80 MB model, 384-dim, well-benchmarked on technical content
- Alternative: `bge-small-en-v1.5` (33 MB, 384-dim) for a smaller install footprint
The embedding model is bundled as an ONNX file in the Conductor distribution. The Rust backend loads it via ort (ONNX Runtime for Rust) at startup.
D3.4: Index Storage
The index is stored in SQLite as a single table; no vector similarity extension is required at this scale (search runs in application code, see below):
```sql
CREATE TABLE knowledge_chunks (
    id           TEXT PRIMARY KEY,
    domain       TEXT NOT NULL,
    source       TEXT NOT NULL,
    section_path TEXT NOT NULL,  -- JSON array
    content      TEXT NOT NULL,
    embedding    BLOB NOT NULL,  -- f32 array, little-endian
    metadata     TEXT NOT NULL,  -- JSON
    created_at   TEXT NOT NULL,
    updated_at   TEXT NOT NULL
);

CREATE INDEX idx_chunks_domain ON knowledge_chunks(domain);
CREATE INDEX idx_chunks_source ON knowledge_chunks(source);
```
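For illustration, decoding a stored embedding BLOB back into a vector follows directly from the little-endian `f32` layout noted in the schema (the function name is hypothetical):

```rust
/// Decode an embedding BLOB (packed little-endian f32s, matching the schema
/// above) back into a dense vector. Any trailing bytes that don't form a
/// complete f32 are ignored by chunks_exact.
fn decode_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```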
Vector similarity search uses brute-force cosine similarity over the BLOB embeddings. At 500-800 chunks, brute-force search completes in <5ms — no approximate nearest neighbor index is needed at this scale. If the index grows beyond 5,000 chunks (L3 community content), switch to an ANN structure (HNSW via sqlite-vss or a dedicated vector library).
Database location: `$CONDUCTOR_DATA_DIR/knowledge/knowledge.db`, alongside `knowledge.onnx` (the embedding model).
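A brute-force scan of this kind is only a few lines of Rust. The `Chunk` stand-in below is simplified from the `KnowledgeChunk` struct for illustration:

```rust
/// Minimal stand-in for KnowledgeChunk, for illustration only.
struct Chunk {
    id: String,
    embedding: Vec<f32>,
}

/// Cosine similarity between two dense vectors; zero-norm inputs score 0.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-K: score every chunk, sort descending, keep the best K.
/// At a few hundred chunks this is fast enough that no ANN index is needed.
fn top_k<'a>(query: &[f32], chunks: &'a [Chunk], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|c| (c.id.as_str(), cosine(query, &c.embedding)))
        .collect();
    // partial_cmp is safe: cosine returns finite values for finite inputs.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```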
D3.5: Retrieval Pipeline
On each user message, before the LLM API call:
```text
User Message
     │
     ▼
┌─────────────────────────┐
│ 1. Query Formation      │  Combine: user message + last assistant message
│                         │  (provides conversational context for better retrieval)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 2. Domain Filter        │  Heuristic pre-filter based on keywords:
│                         │    "CC", "note", "velocity"   → Midi
│                         │    "OSC", "address"           → Osc
│                         │    "DMX", "universe"          → ArtNet
│                         │    "gamepad", "HID"           → Hid
│                         │    "AppleScript", "keystroke" → Platform
│                         │    Device names               → Device
│                         │    No match → search all domains
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 3. Embed Query          │  ONNX model → 384-dim vector
│                         │  Latency: <50ms on CPU
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 4. Similarity Search    │  Cosine similarity against filtered chunks
│                         │  Return top-K (K=5)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 5. Relevance Threshold  │  Discard chunks with similarity < 0.35
│                         │  (prevents injection of irrelevant content)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 6. Budget Enforcement   │  Total retrieved tokens ≤ 1,500
│                         │  Trim lowest-ranked chunks if over budget
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 7. Context Injection    │  Format as "## Reference Context" section
│                         │  Inject after L1 reference, before conversation
└─────────────────────────┘
```
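The domain pre-filter (step 2) can be sketched as keyword matching; the keyword lists and the substring-matching approach below are illustrative, not the production heuristics:

```rust
/// Knowledge domains relevant to the pre-filter (Device omitted for brevity).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Domain { Midi, Osc, ArtNet, Hid, Platform }

/// Keyword-based domain pre-filter: return the domains whose keywords appear
/// in the query, or all domains when nothing matches.
fn domain_filter(query: &str) -> Vec<Domain> {
    let q = query.to_lowercase();
    let rules: &[(&[&str], Domain)] = &[
        (&["cc", "note", "velocity", "nrpn"], Domain::Midi),
        (&["osc", "address pattern"], Domain::Osc),
        (&["dmx", "universe", "artnet"], Domain::ArtNet),
        (&["gamepad", "hid"], Domain::Hid),
        (&["applescript", "keystroke", "powershell"], Domain::Platform),
    ];
    let mut hits: Vec<Domain> = rules
        .iter()
        .filter(|(kws, _)| kws.iter().any(|kw| q.contains(kw)))
        .map(|(_, d)| *d)
        .collect();
    if hits.is_empty() {
        // No keyword matched: fall back to searching all domains.
        hits = rules.iter().map(|(_, d)| *d).collect();
    }
    hits
}
```

A real implementation would match on word boundaries rather than substrings (to avoid, say, "accessibility" triggering the "cc" rule), but the shape is the same.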
Retrieval is transparent to the LLM. The retrieved chunks appear as part of the system context, not as a tool call result. This ensures consistent behaviour across all providers and avoids the problem of the LLM needing to decide when to search.
Injection format:
```markdown
## Reference Context

The following reference information may be relevant to the current question.

### MIDI 1.0 — Control Change Messages

Control Change messages (status byte 0xBn) carry a controller number (0-127)
and a value (0-127). Common assignments include:

- CC1: Modulation Wheel
- CC7: Channel Volume
- CC10: Pan
- CC64: Sustain Pedal (Damper)
- CC74: Brightness (Sound Controller 5)

[Source: MIDI 1.0 Detailed Specification, §4.2.1]

### Akai MPK Mini MK3 — Control Layout

Pads: 8 pads sending Note On/Off on ch.10 (default), velocity sensitive.
Knobs: 8 knobs sending CC 70-77 on ch.1 (default bank A).

[Source: akai-mpk-mini-mk3 device profile]
```
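Steps 5-6 of the pipeline (threshold, then budget) reduce to a short filter-and-trim pass. The `Scored` struct and per-chunk token counts below are assumed for illustration:

```rust
/// A retrieved chunk with its precomputed token count and similarity score.
struct Scored {
    tokens: usize,
    similarity: f32,
}

/// Steps 5-6: drop chunks below the relevance threshold, then keep
/// highest-ranked chunks until the token budget would be exceeded.
fn enforce_budget(mut chunks: Vec<Scored>, threshold: f32, budget: usize) -> Vec<Scored> {
    // Step 5: relevance threshold.
    chunks.retain(|c| c.similarity >= threshold);
    // Step 6: highest similarity first, then trim to budget.
    chunks.sort_by(|a, b| b.similarity.partial_cmp(&a.similarity).unwrap());
    let mut total = 0;
    chunks
        .into_iter()
        .take_while(|c| {
            total += c.tokens;
            total <= budget
        })
        .collect()
}
```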
D4: L3 — Optional Online Retrieval
L3 extends the knowledge layer with content that cannot ship locally: community-contributed device profiles, manufacturer documentation, and emerging protocol extensions. L3 is disabled by default and requires explicit user opt-in.
D4.1: Online Sources
| Source | Content | Trust Level |
|---|---|---|
| Conductor Hub (future) | Community device profiles, shared mapping templates | Curated — reviewed before indexing |
| Manufacturer docs | PDF manuals, MIDI implementation charts | High — authoritative but may require parsing |
| Community forums | MIDI mapping tips, DAW-specific automation | Low — useful but unverified |
D4.2: Retrieval Mechanism
When L3 is enabled, the retrieval pipeline adds a step after L2 local search:
- If L2 returns fewer than 2 chunks above threshold, query the Conductor Hub API
- Hub API accepts the same embedding vector and returns up to 3 chunks
- Hub chunks are marked with `source: "community"` and injected with a provenance note: "[Community-contributed — verify before use]"
- Hub results are cached locally for 24 hours to reduce API calls
D4.3: Privacy
- L3 sends the embedding vector of the query to the Hub, not the raw query text. Embedding vectors cannot be trivially reversed to recover the original text, though embedding inversion is an active research area, so the vector is still treated as potentially sensitive.
- The user can inspect what was sent in the audit log (ADR-007 Phase 4).
- L3 is off by default. Enabling it requires navigating to Settings → LLM → Knowledge Sources and toggling "Community Knowledge."
D5: Retrieval Integration Points
The retrieval layer integrates with the existing Rust backend at two points:
D5.1: System Prompt Assembly (Rust-side)
The build_system_prompt() function (or its equivalent) is extended to call the retrieval pipeline:
```rust
pub async fn build_system_prompt(
    &self,
    user_message: &str,
    conversation: &[ChatMessage],
    config: &AppConfig,
) -> String {
    let mut prompt = String::new();

    // Base identity
    prompt.push_str(&self.base_identity);

    // L1: Core Reference (always present)
    prompt.push_str(&self.core_reference);

    // Loaded Skills
    for skill in &self.active_skills {
        prompt.push_str(&skill.content);
    }

    // T1: Structural Topology (ADR-015)
    prompt.push_str(&self.build_topology_summary());

    // L2 + L3: Retrieved knowledge (query-dependent)
    let retrieved = self.knowledge_index
        .retrieve(user_message, conversation.last(), config.knowledge_settings())
        .await;
    if !retrieved.chunks.is_empty() {
        prompt.push_str("\n\n## Reference Context\n\n");
        prompt.push_str(&retrieved.format_for_injection());
    }

    prompt
}
```
D5.2: Tauri Command for Index Management
Expose index management via Tauri commands for the Settings UI:
```rust
#[tauri::command]
async fn knowledge_get_status(state: State<'_, KnowledgeIndex>) -> Result<KnowledgeStatus, String>;

#[tauri::command]
async fn knowledge_get_sources(state: State<'_, KnowledgeIndex>) -> Result<Vec<KnowledgeSource>, String>;

#[tauri::command]
async fn knowledge_toggle_source(
    state: State<'_, KnowledgeIndex>,
    source_id: String,
    enabled: bool,
) -> Result<(), String>;

#[tauri::command]
async fn knowledge_update_index(state: State<'_, KnowledgeIndex>) -> Result<UpdateResult, String>;

#[tauri::command]
async fn knowledge_add_device_profile(
    state: State<'_, KnowledgeIndex>,
    profile: DeviceProfile,
) -> Result<(), String>;
```
D6: Provider-Aware Retrieval Tuning
Different LLM providers have different baseline knowledge. The retrieval layer can optionally adjust its behaviour based on the configured provider:
| Provider | Baseline Protocol Knowledge | Retrieval Adjustment |
|---|---|---|
| Claude (Anthropic) | Strong on MIDI basics, moderate OSC, weak ArtNet/HID | Boost ArtNet/HID retrieval weight |
| GPT-4 (OpenAI) | Strong on MIDI basics, strong platform automation, weak protocol edge cases | Boost NRPN/SysEx/MIDI 2.0 retrieval |
| Gemini (Google) | Moderate across all domains | No adjustment (baseline) |
| Local models (LiteLLM) | Highly variable, often weak across all domains | Lower relevance threshold (0.25 vs 0.35), increase K to 7 |
This is implemented as a `ProviderProfile` that adjusts retrieval parameters:

```rust
struct ProviderProfile {
    relevance_threshold: f32,
    max_chunks: usize,
    max_tokens: usize,
    domain_boosts: HashMap<KnowledgeDomain, f32>, // Multiplier on similarity score
}
```
Provider profiles are best-effort heuristics, not hard rules. They default to the baseline profile and can be overridden in settings.
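Selecting a profile per provider can be sketched as below, using a simplified version of `ProviderProfile` (string-keyed boosts, illustrative provider names) and the parameter values from the table above:

```rust
use std::collections::HashMap;

/// Simplified provider profile for illustration; boosts are keyed by domain
/// name rather than the KnowledgeDomain enum.
struct ProviderProfile {
    relevance_threshold: f32,
    max_chunks: usize,
    max_tokens: usize,
    domain_boosts: HashMap<String, f32>, // similarity multipliers
}

/// Best-effort heuristic: local models get a lower threshold (0.25) and
/// more chunks (K=7); everything else uses the baseline profile.
fn profile_for(provider: &str) -> ProviderProfile {
    let baseline = ProviderProfile {
        relevance_threshold: 0.35,
        max_chunks: 5,
        max_tokens: 1_500,
        domain_boosts: HashMap::new(),
    };
    match provider {
        "litellm" => ProviderProfile {
            relevance_threshold: 0.25,
            max_chunks: 7,
            ..baseline
        },
        _ => baseline,
    }
}
```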
D7: Knowledge Source Management UI
A new section in App Settings → LLM: Knowledge Sources.
| Element | Description |
|---|---|
| Index Status | "540 chunks indexed across 6 domains" / "Index not built" |
| Source List | Toggle-able list: MIDI Spec ✓, OSC Spec ✓, ArtNet Spec ✓, HID Tables ✓, Platform (macOS) ✓, Device Profiles ✓ |
| Community Knowledge | Toggle (default off) with explanation: "When enabled, queries the Conductor Hub for community-contributed device profiles and mapping patterns. Sends anonymised search vectors only." |
| Update Index | Button to re-index from bundled sources. Shows last update date. |
| Add Device Profile | Import a device profile (JSON/TOML) to add to the local index |
| Index Size | "2.3 MB on disk" |
D8: Interaction with Skills System
L1/L2/L3 retrieval and Skills (ADR-007) are complementary and can coexist in the same prompt. Their interaction is:
- Skills fire first. Skill loading is determined by the user's intent (detected from the message or manually triggered). Skills provide reasoning frameworks and workflow guidance.
- Retrieval supplements. L2 retrieval runs on every message regardless of skill state. If a MIDI skill is loaded AND L2 retrieves MIDI protocol chunks, both appear in context — the skill provides "how to think" and the retrieval provides "specific facts."
- Budget arbitration. If total system prompt exceeds 50% of the context window, retrieval chunks are trimmed first (they're least curated), then older skills are unloaded. L1 and T1 are never trimmed.
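The arbitration order can be sketched as follows; the `PromptBudget` struct and token figures are assumptions for illustration:

```rust
/// Token counts for the trimmable parts of the system prompt.
/// L1 and T1 are folded into one untouchable figure.
struct PromptBudget {
    l1_and_t1: usize,
    skills: Vec<usize>,    // oldest skill first
    retrieved: Vec<usize>, // lowest-ranked chunk last
}

/// Budget arbitration per D8: if the prompt exceeds half the context window,
/// drop lowest-ranked retrieval chunks first, then unload the oldest skills.
/// L1 and T1 are never trimmed.
fn arbitrate(mut b: PromptBudget, context_window: usize) -> PromptBudget {
    let cap = context_window / 2;
    let total = |b: &PromptBudget| {
        b.l1_and_t1 + b.skills.iter().sum::<usize>() + b.retrieved.iter().sum::<usize>()
    };
    // 1. Trim lowest-ranked retrieval chunks first (least curated).
    while total(&b) > cap && !b.retrieved.is_empty() {
        b.retrieved.pop();
    }
    // 2. Then unload the oldest skills.
    while total(&b) > cap && !b.skills.is_empty() {
        b.skills.remove(0);
    }
    b
}
```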
D9: Index Build Pipeline
The L2 index is built by a CLI tool that runs during development and produces an artifact shipped with the application:
```shell
conductor-knowledge build [--sources <dir>] [--output <path>] [--model <onnx-path>]
```
The build pipeline:
- Parse sources: Read Markdown/text files from the sources directory, organized by domain (`midi/`, `osc/`, `artnet/`, `hid/`, `platform/`, `devices/`)
- Chunk: Section-aware splitting with a 300-token target and 50-token overlap
- Embed: Run each chunk through the ONNX embedding model
- Store: Write chunks + embeddings to SQLite database
- Validate: Report chunk count, total tokens, domain distribution, and estimated index size
The build output (`knowledge.db`) ships with the release artifacts rather than being committed to source control. The source documents are committed to `docs/knowledge-sources/` for version tracking.
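The section-aware split (step 2) can be sketched as a header-driven pass over the document; a real chunker would additionally enforce the 200-500 token range and the 50-token overlap:

```rust
/// Minimal section-aware splitter: break a Markdown document on headers,
/// keeping each header attached to the body that follows it. Illustrative
/// only — token-range enforcement and overlap are omitted.
fn split_sections(doc: &str) -> Vec<String> {
    let mut sections: Vec<String> = Vec::new();
    for line in doc.lines() {
        // A header (or the very first line) starts a new section.
        if line.starts_with('#') || sections.is_empty() {
            sections.push(String::new());
        }
        let current = sections.last_mut().unwrap();
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(line);
    }
    sections
}
```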
Consequences
Positive
- Model-agnostic quality floor. Every LLM provider gets the same L1 reference and L2 retrieval, establishing a minimum knowledge baseline regardless of the model's training data. Users switching between providers see consistent domain competence.
- Accurate protocol guidance. The LLM can answer questions about NRPN encoding, ArtNet universe addressing, or HID usage tables with authoritative, citation-backed information rather than plausible hallucinations.
- Scalable device support. New device profiles can be added to the index without modifying code, Skills, or system prompts. Community contribution (L3) further accelerates coverage.
- Privacy-preserving. L1 and L2 are entirely local — no data leaves the user's machine. L3 is opt-in and sends only embedding vectors.
- Token-efficient. L2 retrieval uses ~500-1,500 tokens per query only when relevant content exists, versus 20K+ tokens for static injection of all protocol knowledge.
Negative
- Distribution size increase. The ONNX embedding model (~33-80MB) and knowledge index (~2-5MB) increase the application bundle. Mitigated by downloading on first launch rather than bundling, if size is a concern.
- Retrieval can miss. Vector similarity search is not perfect — a query about "sustain pedal" might not retrieve the CC64 reference if the embedding spaces don't align well. Mitigated by the relevance threshold (prevents injecting irrelevant content) and by L1 covering the most common cases statically.
- Maintenance burden. Protocol specs and device profiles must be kept current. Mitigated by structuring the index build as an automated pipeline and sourcing from authoritative documents.
- Cold start latency. Loading the ONNX model at startup adds ~200-500ms. Mitigated by lazy-loading (defer until first chat message) or background initialization.
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Retrieved chunk contradicts L1 reference | Low | Medium | L1 is always injected first; LLMs prioritise earlier context. Add instruction: "If Reference Context conflicts with the Core Reference above, prefer Core Reference." |
| Embedding model produces poor results for domain-specific queries | Medium | Medium | Benchmark retrieval quality with a test suite of 50 query→expected-chunk pairs. Swap embedding model if recall drops below 80%. |
| Users expect L3 community content to be authoritative | Medium | Low | Clear provenance labelling ("[Community-contributed — verify before use]") and opt-in with explanation. |
| Index becomes stale | Low | Medium | CI/CD pipeline includes index build step. Version-stamp chunks with spec revision dates. |
Implementation Plan
Phase 1 — L1 Core Reference (3-4h)
1A: Author the Core Reference Document (2-3h)
- Create `docs/llm-reference.md` with the six sections defined in D2
- Populate the trigger catalogue from the existing `event_types.rs` and `midi_learn.rs`
- Add 10-15 annotated mapping examples
- Validate token count (target: 2,500, cap: 4,000)
1B: Inject into System Prompt (1h)
- Extend `build_system_prompt()` to read and include the reference document
- Add build-time token budget validation
- Verify L1 appears in correct position in prompt assembly order
Phase 2 — L2 Local Retrieval (10-14h)
2A: Knowledge Index Infrastructure (3-4h)
- Add the `ort` (ONNX Runtime) dependency to `Cargo.toml`
- Implement the `KnowledgeChunk` struct and SQLite schema
- Implement the embedding function (text → 384-dim vector via ONNX)
- Implement brute-force cosine similarity search
- Write unit tests for indexing and retrieval
2B: Index Build CLI (2-3h)
- Create the `conductor-knowledge` binary or subcommand
- Implement the section-aware Markdown chunker
- Implement the build pipeline (parse → chunk → embed → store)
- Validate output: chunk count, domain distribution, index size
2C: Source Content Preparation (3-4h)
- Prepare MIDI 1.0 reference chunks (CC table, message types, SysEx format)
- Prepare MIDI 2.0 UMP reference chunks
- Prepare OSC reference chunks
- Prepare initial device profiles (5-10 common controllers)
- Prepare platform automation reference (macOS/Windows/Linux patterns)
2D: Retrieval Pipeline Integration (2-3h)
- Implement the 7-step retrieval pipeline (D3.5)
- Integrate with `build_system_prompt()`
- Add domain filter heuristics
- Add relevance threshold and budget enforcement
- Add Tauri commands for index status
Phase 3 — Polish & Extend (4-6h)
3A: Provider-Aware Tuning (1-2h)
- Implement `ProviderProfile` with default profiles for Claude/GPT-4/Gemini/local
- Wire provider detection to retrieval parameter adjustment
3B: Knowledge Sources Settings UI (2-3h)
- Add Knowledge Sources section to App Settings
- Implement source toggle, index status, update button
- Add device profile import
3C: L3 Online Retrieval Stub (1h)
- Implement the L3 retrieval interface and opt-in toggle
- Stub the Hub API client (actual Hub is a future project)
- Add provenance labelling for community content
Dependency Graph
```text
Phase 1A (reference doc) ──→ Phase 1B (injection)
                                      ↓
Phase 2A (index infra) ──→ Phase 2B (build CLI) ──→ Phase 2C (content) ──→ Phase 2D (pipeline)
                                                                                  ↓
                                                                    Phase 3A (provider tuning)
                                                                    Phase 3B (settings UI)
                                                                    Phase 3C (L3 stub)
```
Phase 1 and Phase 2A can start in parallel.
Phase 3 sub-tasks are independent of each other.
Effort Estimate
| Phase | Hours | Dependency |
|---|---|---|
| Phase 1: L1 Core Reference | 3-4h | None (can start immediately) |
| Phase 2: L2 Local Retrieval | 10-14h | Phase 1B for injection point |
| Phase 3: Polish & Extend | 4-6h | Phase 2D complete |
| Total | 17-24h | — |
References
- ADR-007: LLM Integration Architecture — provider abstraction, MCP tools, Skills system, tool risk tiers
- ADR-015: LLM Signal Awareness — T1/T2/T3 signal context injection, topology summary, signal pulse
- ADR-013: LLM Canvas — artifact projection, `markdown` artifact type for rendering retrieved guides
- Context Optimizer: `conductor-gui/src-tauri/src/llm/context.rs` — token estimation, pruning strategies
- Chat Store: `conductor-gui/ui/src/lib/stores/chat.js` — message management, system context assembly
- MIDI 1.0 Detailed Specification: The MIDI Manufacturers Association
- MIDI 2.0 UMP Specification: The MIDI Association (AMEI/MMA)
- OSC 1.0 Specification: CNMAT, UC Berkeley
- ArtNet 4 Protocol Specification: Artistic Licence
- USB HID Usage Tables: USB Implementers Forum
- all-MiniLM-L6-v2: Sentence Transformers, Hugging Face (MIT License)
- ort (ONNX Runtime for Rust): https://github.com/pykeio/ort (MIT/Apache-2.0)