ADR-018: Knowledge Layer & Retrieval Architecture
Status
Proposed (Draft)
Relationship to other ADRs:
- ADR-007 (LLM Integration Architecture): Defines the provider abstraction, MCP tools, Skills/SKILL.md, and the system prompt structure this ADR extends.
- ADR-015 (LLM Signal Awareness): Defines the T1/T2/T3 signal context injection that provides runtime awareness. This ADR addresses a complementary concern — domain knowledge rather than runtime state.
- ADR-013 (LLM Canvas): Artifact types like `markdown` may render knowledge-layer content (e.g., protocol guides, mapping tutorials).
Context
Problem Statement
Conductor supports multiple LLM providers (OpenAI, Anthropic, Google, OpenRouter, LiteLLM) with varying degrees of baseline knowledge. A user running Claude Sonnet, GPT-4, Gemini Pro, or a local model via LiteLLM will get meaningfully different results for the same mapping request — not because the models differ in reasoning ability, but because they differ in what they know.
The knowledge gaps fall into four categories, each with different characteristics:
Category 1 — Conductor Internals: No public LLM has been trained on Conductor's configuration schema, trigger vocabulary, action types, mode semantics, velocity curve formats, conditional logic syntax, or device binding structure. This knowledge is entirely proprietary. Every model is equally blind here. The SKILL.md system (ADR-007) partially addresses this by injecting domain expertise into the prompt, but Skills are prose-oriented ("how to think about velocity curves") rather than reference-oriented ("the exact TOML schema for a velocity curve mapping").
Category 2 — Protocol Specifications: MIDI 1.0 basics (Note On/Off, CC, Program Change) are well-represented in most models' training data. But coverage degrades rapidly for: MIDI 2.0 UMP (Universal MIDI Packet) format, MIDI-CI (Capability Inquiry), NRPN/RPN encoding, SysEx manufacturer IDs, OSC type tags and bundle semantics, ArtNet universe addressing and DMX channel layouts, HID usage tables (especially for non-standard controllers), and Mackie Control / HUI protocol specifics. A user asking "set up my controller's NRPN for filter cutoff" will get confident-but-wrong answers from most models.
Category 3 — Platform Automation: Conductor's action system can trigger OS-level automation — keystrokes, mouse events, application commands, shell execution. The LLM needs platform-specific knowledge to suggest correct actions: AppleScript/JXA on macOS, PowerShell/UI Automation on Windows, D-Bus/xdotool/xdg on Linux. Models have decent baseline coverage for common patterns but are unreliable for edge cases (e.g., scripting specific DAW automation, addressing particular application windows, accessibility API patterns).
Category 4 — Device Profiles: The long tail. Each MIDI controller, gamepad, or HID device has its own CC assignments, note layouts, button grids, encoder behaviour, LED/display feedback protocols, and SysEx commands. Conductor's device descriptor files capture detection signatures but not operational knowledge ("the Mikro MK3's pads send Note On ch.10 with velocity, the knobs send CC 70-77 ch.1"). This knowledge currently lives in the user's head or in manufacturer documentation the LLM hasn't seen.
Why Not Just Larger Context Windows?
As context windows grow (128K → 1M+), it's tempting to prepend large reference documents to the system prompt. This fails for three reasons:
- Token cost scales linearly with users. Every API call pays for every injected token. A 20K-token protocol reference multiplied by thousands of daily conversations is economically unsustainable — particularly when 90% of conversations don't need protocol details.
- Attention degrades with distance. LLMs retrieve information less reliably from the middle of long contexts ("lost in the middle" effect). A 50K-token system prompt means the model is less likely to use any individual fact correctly than if the same fact appeared in a 2K-token retrieval result placed near the user's query.
- Freshness. Protocol specs update (MIDI 2.0 has had three revisions since 2020). Device profiles are released continuously. Static system prompts fossilise knowledge at deploy time.
Why Not Just More Skills?
The SKILL.md system (ADR-007) is the right mechanism for expertise — teaching the LLM how to reason about Conductor's domain. But Skills are:
- Coarse-grained. A skill is loaded in its entirety when triggered. You can't load "just the NRPN section" of a MIDI skill.
- Author-curated. Someone must write and maintain each skill. This doesn't scale to 500 device profiles or the full MIDI 2.0 spec.
- Prompt-budget constrained. Each loaded skill consumes context. Loading 5 skills simultaneously competes with conversation history, T1/T2/T3 signal context, and tool schemas for the same context window.
Skills and retrieval are complementary: Skills provide the reasoning framework ("when a user asks about velocity sensitivity, think about curve shapes, dead zones, and the difference between note velocity and aftertouch pressure"), while retrieval provides the reference facts ("CC74 is the standard assignment for brightness/timbre on General MIDI").
Design Exploration
Retrieval-Augmented Generation (RAG) in this context means: given a user query and conversation state, select relevant knowledge chunks and inject them into the LLM context alongside the query. The retrieval is transparent to the LLM — it sees additional context, not a special tool.
The alternatives considered:
| Approach | Pros | Cons |
|---|---|---|
| Static system prompt | Simple, deterministic, no retrieval latency | Token-expensive, doesn't scale, stale |
| Skills only | Curated quality, integrated with ADR-007 | Coarse-grained, manual authoring, prompt budget |
| Tool-based retrieval (LLM calls a `search_docs` tool) | LLM controls when to search | Adds a tool-call round-trip, LLM may not know to search, inconsistent across providers |
| Automatic RAG (retrieval before LLM call) | Transparent, no extra round-trip, works with all providers | Requires embedding infrastructure, relevance judgement is imperfect |
| Hybrid: static + automatic RAG | Best of both — guaranteed baseline + dynamic enrichment | More complex, two systems to maintain |
Decision
D1: Three-Layer Knowledge Architecture
Knowledge is organized into three layers with different delivery mechanisms, reflecting the tradeoff between reliability and scalability:
| Layer | Name | Content | Delivery | Size | Update Cadence |
|---|---|---|---|---|---|
| L1 | Core Reference | Conductor schema, trigger/action vocabulary, config grammar, mode semantics, tool usage patterns | Static system prompt injection | ~2-4K tokens | Per release |
| L2 | Domain Index | Protocol specs (MIDI, OSC, ArtNet, HID), platform automation APIs, curated device profiles | Automatic RAG — retrieved per query | 10-50K tokens indexed, 500-1500 tokens retrieved per query | Periodic updates (quarterly or on spec revision) |
| L3 | Community & Live | User-contributed device profiles, community mappings, online documentation | Optional online retrieval (user opt-in) | Unbounded | Continuous |
Rationale: L1 is always present because the LLM cannot function without Conductor-specific vocabulary — this is the cost of enabling the tool. L2 is retrieved on-demand because protocol knowledge is large but only selectively relevant — a question about velocity curves doesn't need ArtNet knowledge. L3 is optional because it requires network access and introduces trust/freshness concerns that should be under user control.
D2: L1 — Core Reference Document
A structured reference document ships with Conductor and is injected into the system prompt on every LLM call. This replaces the current approach of relying solely on Skills and tool schemas to communicate Conductor's domain.
The reference document covers:
- Configuration Schema Reference — TOML structure for devices, modes, mappings, profiles, with annotated examples
- Trigger Type Catalogue — every trigger variant (NoteOn, CC, DoubleTap, Chord, LongPress, AftertouchZone, PitchBendZone, CompoundTrigger, Gamepad*) with parameters and valid ranges
- Action Type Catalogue — every action variant (Keystroke, MouseClick, SendMidi, OscSend, ShellCommand, LaunchApp, Delay, Sequence, Conditional) with parameters
- Velocity Curve Types — Fixed, PassThrough, Linear, Curve (Exponential, Logarithmic, S-Curve) with parameter ranges
- Mode Semantics — mode switching, fallback chains, activation conditions
- Common Patterns — 10-15 annotated mapping examples covering typical use cases
Format: Markdown, compiled into a single string at build time. Structured with headers so the content is scannable by both humans (for maintenance) and LLMs (for attention).
Token budget: Target 2,500 tokens. Hard cap at 4,000 tokens. This budget is firm — it shares the system prompt with T1 topology (~200 tokens), tool schemas (~2,000 tokens for 20 MCP tools), and any loaded Skills.
Injection point: Appended to the system prompt in `_buildMessages()` (`chat.js` frontend) or `build_system_prompt()` (Rust backend, depending on where system prompt assembly lives at implementation time). Placed after the base system prompt and before T1 topology, so the LLM sees: identity → domain reference → current topology → conversation.
Maintenance: The reference document lives in `docs/llm-reference.md` in the repository. A build-time script validates that it fits within the token budget (using the same character-based estimation as `context.rs`: `chars * 0.25 + overhead`). CI fails if the reference exceeds 4,000 estimated tokens.
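As a sketch, the CI budget check follows directly from the character-based heuristic above. The `OVERHEAD` constant and function names here are illustrative assumptions, not the actual `context.rs` API:

```rust
/// Rough token estimate using the character-based heuristic described above
/// (~4 characters per token). OVERHEAD is an assumed fixed cost for
/// illustration, not the value used in context.rs.
fn estimate_tokens(text: &str) -> usize {
    const OVERHEAD: usize = 16;
    (text.chars().count() as f64 * 0.25).ceil() as usize + OVERHEAD
}

/// CI-style validation: fail the build if the reference document exceeds
/// the 4,000-token hard cap.
fn validate_reference_budget(reference: &str) -> Result<usize, String> {
    const HARD_CAP: usize = 4_000;
    let estimate = estimate_tokens(reference);
    if estimate > HARD_CAP {
        Err(format!(
            "llm-reference.md estimated at {estimate} tokens (cap: {HARD_CAP})"
        ))
    } else {
        Ok(estimate)
    }
}
```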
System Prompt Assembly Order:
```text
┌─────────────────────────────────┐
│ Base Identity & Behaviour       │  ~200 tokens
├─────────────────────────────────┤
│ L1: Core Reference Document     │  ~2,500 tokens (cap: 4,000)
├─────────────────────────────────┤
│ Loaded Skills (0-3)             │  ~1,000-5,000 tokens
├─────────────────────────────────┤
│ T1: Structural Topology         │  ~150-250 tokens (ADR-015)
├─────────────────────────────────┤
│ MCP Tool Schemas                │  ~2,000 tokens
├─────────────────────────────────┤
│ T2: Signal Pulse (injected)     │  ~100-300 tokens (ADR-015)
├─────────────────────────────────┤
│ Artifact Context (if edits)     │  ~100-500 tokens (ADR-013)
├─────────────────────────────────┤
│ Retrieved L2 Chunks (if any)    │  ~500-1,500 tokens
└─────────────────────────────────┘
```

Total system budget: ~7,000-14,000 tokens (well within a 128K window)
D3: L2 — Local Retrieval Index
A vector similarity index ships with Conductor (or downloads on first launch) containing chunked, embedded domain knowledge. On each user message, the retrieval layer queries this index and injects relevant chunks into the LLM context.
D3.1: Index Contents
The L2 index covers six domains:
| Domain | Source Material | Chunk Count (est.) | Update Frequency |
|---|---|---|---|
| MIDI | MIDI 1.0 Detailed Spec, MIDI 2.0 UMP Spec, MIDI-CI, GM/GM2 controller assignments, SysEx manufacturer IDs | 200-300 chunks | On spec revision |
| OSC | OSC 1.0/1.1 spec, common address patterns, type tag reference | 40-60 chunks | Rare |
| ArtNet | ArtNet 4 spec, universe/subnet addressing, DMX channel layouts | 60-80 chunks | On spec revision |
| HID | USB HID usage tables (generic desktop, game controllers, keyboard), descriptor format | 80-120 chunks | On spec revision |
| Platform Automation | macOS (AppleScript/JXA patterns, Accessibility API), Windows (PowerShell, SendKeys, UI Automation), Linux (xdotool, D-Bus, xdg) | 100-150 chunks | Per OS release |
| Device Profiles | Curated profiles for 20-30 common controllers (Akai, Novation, Korg, Native Instruments, Arturia) | 60-90 chunks | As devices are profiled |
Total estimated chunks: 540-800, each 200-500 tokens. Total index size: ~2-5 MB on disk (embeddings + metadata).
D3.2: Chunking Strategy
Documents are chunked using a section-aware splitter that respects document structure:
- Split on section boundaries (headers, numbered clauses in specs)
- Target chunk size: 300 tokens (200-500 range)
- Each chunk retains: source document, section path (e.g., "MIDI 1.0 > Channel Voice Messages > Control Change"), domain tag, chunk ID
- Overlapping context: 50-token overlap between adjacent chunks to preserve cross-boundary information
Chunks are stored as:
```rust
struct KnowledgeChunk {
    id: String,
    domain: KnowledgeDomain,   // Midi, Osc, ArtNet, Hid, Platform, Device
    source: String,            // "midi-1.0-spec", "akai-mpk-mini-profile"
    section_path: Vec<String>, // ["Channel Voice Messages", "Control Change"]
    content: String,           // The actual text
    embedding: Vec<f32>,       // Dense vector (384-dim or 768-dim)
    metadata: ChunkMetadata,   // version, last_updated, relevance_tags
}

enum KnowledgeDomain {
    Midi,
    Osc,
    ArtNet,
    Hid,
    Platform(PlatformTarget),
    Device(String), // Device identifier
}

enum PlatformTarget {
    MacOS,
    Windows,
    Linux,
}
```
D3.3: Embedding Model
The embedding model runs locally to avoid API dependencies and preserve privacy. Requirements:
- Inference: must run on CPU in <100ms per query (single embedding)
- Dimensions: 384 recommended (balances quality vs. index size)
- Model: `all-MiniLM-L6-v2` (via ONNX Runtime) as the default — an 80 MB model, 384-dim, well-benchmarked on technical content
- Alternative: `bge-small-en-v1.5` (33 MB, 384-dim) for a smaller install footprint
The embedding model is bundled as an ONNX file in the Conductor distribution. The Rust backend loads it via ort (ONNX Runtime for Rust) at startup.
D3.4: Index Storage
The index is stored in SQLite as a single table; no vector similarity extension is required at this scale (search runs in application code, see below):
```sql
CREATE TABLE knowledge_chunks (
    id           TEXT PRIMARY KEY,
    domain       TEXT NOT NULL,
    source       TEXT NOT NULL,
    section_path TEXT NOT NULL,  -- JSON array
    content      TEXT NOT NULL,
    embedding    BLOB NOT NULL,  -- f32 array, little-endian
    metadata     TEXT NOT NULL,  -- JSON
    created_at   TEXT NOT NULL,
    updated_at   TEXT NOT NULL
);

CREATE INDEX idx_chunks_domain ON knowledge_chunks(domain);
CREATE INDEX idx_chunks_source ON knowledge_chunks(source);
```
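For illustration, decoding a stored embedding BLOB back into a vector follows directly from the little-endian `f32` layout noted in the schema (the function name is hypothetical):

```rust
/// Decode an embedding BLOB (packed little-endian f32s, matching the schema
/// above) back into a dense vector. Any trailing bytes that don't form a
/// complete f32 are ignored by chunks_exact.
fn decode_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```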
Vector similarity search uses brute-force cosine similarity over the BLOB embeddings. At 500-800 chunks, brute-force search completes in <5ms — no approximate nearest neighbor index is needed at this scale. If the index grows beyond 5,000 chunks (L3 community content), switch to an ANN structure (HNSW via sqlite-vss or a dedicated vector library).
Database location: `$CONDUCTOR_DATA_DIR/knowledge/knowledge.db`, alongside `knowledge.onnx` (the embedding model).
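A brute-force scan of this kind is only a few lines of Rust. The `Chunk` stand-in below is simplified from the `KnowledgeChunk` struct for illustration:

```rust
/// Minimal stand-in for KnowledgeChunk, for illustration only.
struct Chunk {
    id: String,
    embedding: Vec<f32>,
}

/// Cosine similarity between two dense vectors; zero-norm inputs score 0.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-K: score every chunk, sort descending, keep the best K.
/// At a few hundred chunks this is fast enough that no ANN index is needed.
fn top_k<'a>(query: &[f32], chunks: &'a [Chunk], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|c| (c.id.as_str(), cosine(query, &c.embedding)))
        .collect();
    // partial_cmp is safe: cosine returns finite values for finite inputs.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```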
D3.5: Retrieval Pipeline
On each user message, before the LLM API call:
```text
User Message
     │
     ▼
┌─────────────────────────┐
│ 1. Query Formation      │  Combine: user message + last assistant message
│                         │  (provides conversational context for better retrieval)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 2. Domain Filter        │  Heuristic pre-filter based on keywords:
│                         │    "CC", "note", "velocity"   → Midi
│                         │    "OSC", "address"           → Osc
│                         │    "DMX", "universe"          → ArtNet
│                         │    "gamepad", "HID"           → Hid
│                         │    "AppleScript", "keystroke" → Platform
│                         │    Device names               → Device
│                         │    No match → search all domains
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 3. Embed Query          │  ONNX model → 384-dim vector
│                         │  Latency: <50ms on CPU
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 4. Similarity Search    │  Cosine similarity against filtered chunks
│                         │  Return top-K (K=5)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 5. Relevance Threshold  │  Discard chunks with similarity < 0.35
│                         │  (prevents injection of irrelevant content)
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 6. Budget Enforcement   │  Total retrieved tokens ≤ 1,500
│                         │  Trim lowest-ranked chunks if over budget
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 7. Context Injection    │  Format as "## Reference Context" section
│                         │  Inject after L1 reference, before conversation
└─────────────────────────┘
```
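The domain pre-filter (step 2) can be sketched as keyword matching; the keyword lists and the substring-matching approach below are illustrative, not the production heuristics:

```rust
/// Knowledge domains relevant to the pre-filter (Device omitted for brevity).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Domain { Midi, Osc, ArtNet, Hid, Platform }

/// Keyword-based domain pre-filter: return the domains whose keywords appear
/// in the query, or all domains when nothing matches.
fn domain_filter(query: &str) -> Vec<Domain> {
    let q = query.to_lowercase();
    let rules: &[(&[&str], Domain)] = &[
        (&["cc", "note", "velocity", "nrpn"], Domain::Midi),
        (&["osc", "address pattern"], Domain::Osc),
        (&["dmx", "universe", "artnet"], Domain::ArtNet),
        (&["gamepad", "hid"], Domain::Hid),
        (&["applescript", "keystroke", "powershell"], Domain::Platform),
    ];
    let mut hits: Vec<Domain> = rules
        .iter()
        .filter(|(kws, _)| kws.iter().any(|kw| q.contains(kw)))
        .map(|(_, d)| *d)
        .collect();
    if hits.is_empty() {
        // No keyword matched: fall back to searching all domains.
        hits = rules.iter().map(|(_, d)| *d).collect();
    }
    hits
}
```

A real implementation would match on word boundaries rather than substrings (to avoid, say, "accessibility" triggering the "cc" rule), but the shape is the same.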
Retrieval is transparent to the LLM. The retrieved chunks appear as part of the system context, not as a tool call result. This ensures consistent behaviour across all providers and avoids the problem of the LLM needing to decide when to search.
Injection format:
```markdown
## Reference Context

The following reference information may be relevant to the current question.

### MIDI 1.0 — Control Change Messages

Control Change messages (status byte 0xBn) carry a controller number (0-127)
and a value (0-127). Common assignments include:

- CC1: Modulation Wheel
- CC7: Channel Volume
- CC10: Pan
- CC64: Sustain Pedal (Damper)
- CC74: Brightness (Sound Controller 5)

[Source: MIDI 1.0 Detailed Specification, §4.2.1]

### Akai MPK Mini MK3 — Control Layout

Pads: 8 pads sending Note On/Off on ch.10 (default), velocity sensitive.
Knobs: 8 knobs sending CC 70-77 on ch.1 (default bank A).

[Source: akai-mpk-mini-mk3 device profile]
```
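Steps 5-6 of the pipeline (threshold, then budget) reduce to a short filter-and-trim pass. The `Scored` struct and per-chunk token counts below are assumed for illustration:

```rust
/// A retrieved chunk with its precomputed token count and similarity score.
struct Scored {
    tokens: usize,
    similarity: f32,
}

/// Steps 5-6: drop chunks below the relevance threshold, then keep
/// highest-ranked chunks until the token budget would be exceeded.
fn enforce_budget(mut chunks: Vec<Scored>, threshold: f32, budget: usize) -> Vec<Scored> {
    // Step 5: relevance threshold.
    chunks.retain(|c| c.similarity >= threshold);
    // Step 6: highest similarity first, then trim to budget.
    chunks.sort_by(|a, b| b.similarity.partial_cmp(&a.similarity).unwrap());
    let mut total = 0;
    chunks
        .into_iter()
        .take_while(|c| {
            total += c.tokens;
            total <= budget
        })
        .collect()
}
```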
D4: L3 — Optional Online Retrieval
L3 extends the knowledge layer with content that cannot ship locally: community-contributed device profiles, manufacturer documentation, and emerging protocol extensions. L3 is disabled by default and requires explicit user opt-in.
D4.1: Online Sources
| Source | Content | Trust Level |
|---|---|---|
| Conductor Hub (future) | Community device profiles, shared mapping templates | Curated — reviewed before indexing |
| Manufacturer docs | PDF manuals, MIDI implementation charts | High — authoritative but may require parsing |
| Community forums | MIDI mapping tips, DAW-specific automation | Low — useful but unverified |
D4.2: Retrieval Mechanism
When L3 is enabled, the retrieval pipeline adds a step after L2 local search:
- If L2 returns fewer than 2 chunks above threshold, query the Conductor Hub API
- Hub API accepts the same embedding vector and returns up to 3 chunks
- Hub chunks are marked with `source: "community"` and injected with a provenance note: "[Community-contributed — verify before use]"
- Hub results are cached locally for 24 hours to reduce API calls
D4.3: Privacy
- L3 sends the embedding vector of the query to the Hub, not the raw query text. Embedding vectors cannot be trivially reversed to recover the original text, though embedding inversion is an active research area, so the vector is still treated as potentially sensitive.
- The user can inspect what was sent in the audit log (ADR-007 Phase 4).
- L3 is off by default. Enabling it requires navigating to Settings → LLM → Knowledge Sources and toggling "Community Knowledge."
D5: Retrieval Integration Points
The retrieval layer integrates with the existing Rust backend at two points:
D5.1: System Prompt Assembly (Rust-side)
The build_system_prompt() function (or its equivalent) is extended to call the retrieval pipeline:
```rust
pub async fn build_system_prompt(
    &self,
    user_message: &str,
    conversation: &[ChatMessage],
    config: &AppConfig,
) -> String {
    let mut prompt = String::new();

    // Base identity
    prompt.push_str(&self.base_identity);

    // L1: Core Reference (always present)
    prompt.push_str(&self.core_reference);

    // Loaded Skills
    for skill in &self.active_skills {
        prompt.push_str(&skill.content);
    }

    // T1: Structural Topology (ADR-015)
    prompt.push_str(&self.build_topology_summary());

    // L2 + L3: Retrieved knowledge (query-dependent)
    let retrieved = self.knowledge_index
        .retrieve(user_message, conversation.last(), config.knowledge_settings())
        .await;
    if !retrieved.chunks.is_empty() {
        prompt.push_str("\n\n## Reference Context\n\n");
        prompt.push_str(&retrieved.format_for_injection());
    }

    prompt
}
```
D5.2: Tauri Command for Index Management
Expose index management via Tauri commands for the Settings UI:
```rust
#[tauri::command]
async fn knowledge_get_status(state: State<'_, KnowledgeIndex>) -> Result<KnowledgeStatus, String>;

#[tauri::command]
async fn knowledge_get_sources(state: State<'_, KnowledgeIndex>) -> Result<Vec<KnowledgeSource>, String>;

#[tauri::command]
async fn knowledge_toggle_source(
    state: State<'_, KnowledgeIndex>,
    source_id: String,
    enabled: bool,
) -> Result<(), String>;

#[tauri::command]
async fn knowledge_update_index(state: State<'_, KnowledgeIndex>) -> Result<UpdateResult, String>;

#[tauri::command]
async fn knowledge_add_device_profile(
    state: State<'_, KnowledgeIndex>,
    profile: DeviceProfile,
) -> Result<(), String>;
```
D6: Provider-Aware Retrieval Tuning
Different LLM providers have different baseline knowledge. The retrieval layer can optionally adjust its behaviour based on the configured provider:
| Provider | Baseline Protocol Knowledge | Retrieval Adjustment |
|---|---|---|
| Claude (Anthropic) | Strong on MIDI basics, moderate OSC, weak ArtNet/HID | Boost ArtNet/HID retrieval weight |
| GPT-4 (OpenAI) | Strong on MIDI basics, strong platform automation, weak protocol edge cases | Boost NRPN/SysEx/MIDI 2.0 retrieval |
| Gemini (Google) | Moderate across all domains | No adjustment (baseline) |
| Local models (LiteLLM) | Highly variable, often weak across all domains | Lower relevance threshold (0.25 vs 0.35), increase K to 7 |
This is implemented as a `ProviderProfile` that adjusts retrieval parameters:

```rust
struct ProviderProfile {
    relevance_threshold: f32,
    max_chunks: usize,
    max_tokens: usize,
    domain_boosts: HashMap<KnowledgeDomain, f32>, // Multiplier on similarity score
}
```
Provider profiles are best-effort heuristics, not hard rules. They default to the baseline profile and can be overridden in settings.
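Selecting a profile per provider can be sketched as below, using a simplified version of `ProviderProfile` (string-keyed boosts, illustrative provider names) and the parameter values from the table above:

```rust
use std::collections::HashMap;

/// Simplified provider profile for illustration; boosts are keyed by domain
/// name rather than the KnowledgeDomain enum.
struct ProviderProfile {
    relevance_threshold: f32,
    max_chunks: usize,
    max_tokens: usize,
    domain_boosts: HashMap<String, f32>, // similarity multipliers
}

/// Best-effort heuristic: local models get a lower threshold (0.25) and
/// more chunks (K=7); everything else uses the baseline profile.
fn profile_for(provider: &str) -> ProviderProfile {
    let baseline = ProviderProfile {
        relevance_threshold: 0.35,
        max_chunks: 5,
        max_tokens: 1_500,
        domain_boosts: HashMap::new(),
    };
    match provider {
        "litellm" => ProviderProfile {
            relevance_threshold: 0.25,
            max_chunks: 7,
            ..baseline
        },
        _ => baseline,
    }
}
```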
D7: Knowledge Source Management UI
A new section in App Settings → LLM: Knowledge Sources.
| Element | Description |
|---|---|
| Index Status | "540 chunks indexed across 6 domains" / "Index not built" |
| Source List | Toggle-able list: MIDI Spec ✓, OSC Spec ✓, ArtNet Spec ✓, HID Tables ✓, Platform (macOS) ✓, Device Profiles ✓ |
| Community Knowledge | Toggle (default off) with explanation: "When enabled, queries the Conductor Hub for community-contributed device profiles and mapping patterns. Sends anonymised search vectors only." |
| Update Index | Button to re-index from bundled sources. Shows last update date. |
| Add Device Profile | Import a device profile (JSON/TOML) to add to the local index |
| Index Size | "2.3 MB on disk" |
D8: Interaction with Skills System
L1/L2/L3 retrieval and Skills (ADR-007) are complementary and can coexist in the same prompt. Their interaction is:
- Skills fire first. Skill loading is determined by the user's intent (detected from the message or manually triggered). Skills provide reasoning frameworks and workflow guidance.
- Retrieval supplements. L2 retrieval runs on every message regardless of skill state. If a MIDI skill is loaded AND L2 retrieves MIDI protocol chunks, both appear in context — the skill provides "how to think" and the retrieval provides "specific facts."
- Budget arbitration. If total system prompt exceeds 50% of the context window, retrieval chunks are trimmed first (they're least curated), then older skills are unloaded. L1 and T1 are never trimmed.
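The arbitration order can be sketched as follows; the `PromptBudget` struct and token figures are assumptions for illustration:

```rust
/// Token counts for the trimmable parts of the system prompt.
/// L1 and T1 are folded into one untouchable figure.
struct PromptBudget {
    l1_and_t1: usize,
    skills: Vec<usize>,    // oldest skill first
    retrieved: Vec<usize>, // lowest-ranked chunk last
}

/// Budget arbitration per D8: if the prompt exceeds half the context window,
/// drop lowest-ranked retrieval chunks first, then unload the oldest skills.
/// L1 and T1 are never trimmed.
fn arbitrate(mut b: PromptBudget, context_window: usize) -> PromptBudget {
    let cap = context_window / 2;
    let total = |b: &PromptBudget| {
        b.l1_and_t1 + b.skills.iter().sum::<usize>() + b.retrieved.iter().sum::<usize>()
    };
    // 1. Trim lowest-ranked retrieval chunks first (least curated).
    while total(&b) > cap && !b.retrieved.is_empty() {
        b.retrieved.pop();
    }
    // 2. Then unload the oldest skills.
    while total(&b) > cap && !b.skills.is_empty() {
        b.skills.remove(0);
    }
    b
}
```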
D9: Index Build Pipeline
The L2 index is built by a CLI tool that runs during development and produces an artifact shipped with the application:
```shell
conductor-knowledge build [--sources <dir>] [--output <path>] [--model <onnx-path>]
```
The build pipeline:
- Parse sources: Read Markdown/text files from the sources directory, organized by domain (`midi/`, `osc/`, `artnet/`, `hid/`, `platform/`, `devices/`)
- Chunk: Section-aware splitting with a 300-token target and 50-token overlap
- Embed: Run each chunk through the ONNX embedding model
- Store: Write chunks + embeddings to SQLite database
- Validate: Report chunk count, total tokens, domain distribution, and estimated index size
The build output (`knowledge.db`) ships with the release artifacts rather than being committed to source control. The source documents are committed to `docs/knowledge-sources/` for version tracking.
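The section-aware split (step 2) can be sketched as a header-driven pass over the document; a real chunker would additionally enforce the 200-500 token range and the 50-token overlap:

```rust
/// Minimal section-aware splitter: break a Markdown document on headers,
/// keeping each header attached to the body that follows it. Illustrative
/// only — token-range enforcement and overlap are omitted.
fn split_sections(doc: &str) -> Vec<String> {
    let mut sections: Vec<String> = Vec::new();
    for line in doc.lines() {
        // A header (or the very first line) starts a new section.
        if line.starts_with('#') || sections.is_empty() {
            sections.push(String::new());
        }
        let current = sections.last_mut().unwrap();
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(line);
    }
    sections
}
```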
Consequences
Positive
- Model-agnostic quality floor. Every LLM provider gets the same L1 reference and L2 retrieval, establishing a minimum knowledge baseline regardless of the model's training data. Users switching between providers see consistent domain competence.
- Accurate protocol guidance. The LLM can answer questions about NRPN encoding, ArtNet universe addressing, or HID usage tables with authoritative, citation-backed information rather than plausible hallucinations.
- Scalable device support. New device profiles can be added to the index without modifying code, Skills, or system prompts. Community contribution (L3) further accelerates coverage.
- Privacy-preserving. L1 and L2 are entirely local — no data leaves the user's machine. L3 is opt-in and sends only embedding vectors.
- Token-efficient. L2 retrieval uses ~500-1,500 tokens per query only when relevant content exists, versus 20K+ tokens for static injection of all protocol knowledge.
Negative
- Distribution size increase. The ONNX embedding model (~33-80MB) and knowledge index (~2-5MB) increase the application bundle. Mitigated by downloading on first launch rather than bundling, if size is a concern.
- Retrieval can miss. Vector similarity search is not perfect — a query about "sustain pedal" might not retrieve the CC64 reference if the embedding spaces don't align well. Mitigated by the relevance threshold (prevents injecting irrelevant content) and by L1 covering the most common cases statically.
- Maintenance burden. Protocol specs and device profiles must be kept current. Mitigated by structuring the index build as an automated pipeline and sourcing from authoritative documents.
- Cold start latency. Loading the ONNX model at startup adds ~200-500ms. Mitigated by lazy-loading (defer until first chat message) or background initialization.
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Retrieved chunk contradicts L1 reference | Low | Medium | L1 is always injected first; LLMs prioritise earlier context. Add instruction: "If Reference Context conflicts with the Core Reference above, prefer Core Reference." |
| Embedding model produces poor results for domain-specific queries | Medium | Medium | Benchmark retrieval quality with a test suite of 50 query→expected-chunk pairs. Swap embedding model if recall drops below 80%. |
| Users expect L3 community content to be authoritative | Medium | Low | Clear provenance labelling ("[Community-contributed — verify before use]") and opt-in with explanation. |
| Index becomes stale | Low | Medium | CI/CD pipeline includes index build step. Version-stamp chunks with spec revision dates. |
Implementation Plan
Phase 1 — L1 Core Reference (3-4h)
1A: Author the Core Reference Document (2-3h)
- Create `docs/llm-reference.md` with the six sections defined in D2
- Populate the trigger catalogue from the existing `event_types.rs` and `midi_learn.rs`
- Add 10-15 annotated mapping examples
- Validate token count (target: 2,500, cap: 4,000)
1B: Inject into System Prompt (1h)
- Extend `build_system_prompt()` to read and include the reference document
- Add build-time token budget validation
- Verify L1 appears in correct position in prompt assembly order
Phase 2 — L2 Local Retrieval (10-14h)
2A: Knowledge Index Infrastructure (3-4h)
- Add the `ort` (ONNX Runtime) dependency to `Cargo.toml`
- Implement the `KnowledgeChunk` struct and SQLite schema
- Implement the embedding function (text → 384-dim vector via ONNX)
- Implement brute-force cosine similarity search
- Write unit tests for indexing and retrieval
2B: Index Build CLI (2-3h)
- Create the `conductor-knowledge` binary or subcommand
- Implement the section-aware Markdown chunker
- Implement the build pipeline (parse → chunk → embed → store)
- Validate output: chunk count, domain distribution, index size
2C: Source Content Preparation (3-4h)
- Prepare MIDI 1.0 reference chunks (CC table, message types, SysEx format)
- Prepare MIDI 2.0 UMP reference chunks
- Prepare OSC reference chunks
- Prepare initial device profiles (5-10 common controllers)
- Prepare platform automation reference (macOS/Windows/Linux patterns)
2D: Retrieval Pipeline Integration (2-3h)
- Implement the 7-step retrieval pipeline (D3.5)
- Integrate with `build_system_prompt()`
- Add domain filter heuristics
- Add relevance threshold and budget enforcement
- Add Tauri commands for index status
Phase 3 — Polish & Extend (4-6h)
3A: Provider-Aware Tuning (1-2h)
- Implement `ProviderProfile` with default profiles for Claude/GPT-4/Gemini/local
- Wire provider detection to retrieval parameter adjustment
3B: Knowledge Sources Settings UI (2-3h)
- Add Knowledge Sources section to App Settings
- Implement source toggle, index status, update button
- Add device profile import
3C: L3 Online Retrieval Stub (1h)
- Implement the L3 retrieval interface and opt-in toggle
- Stub the Hub API client (actual Hub is a future project)
- Add provenance labelling for community content
Dependency Graph
```text
Phase 1A (reference doc) ──→ Phase 1B (injection)
                                      ↓
Phase 2A (index infra) ──→ Phase 2B (build CLI) ──→ Phase 2C (content) ──→ Phase 2D (pipeline)
                                                                                  ↓
                                                                    Phase 3A (provider tuning)
                                                                    Phase 3B (settings UI)
                                                                    Phase 3C (L3 stub)
```
Phase 1 and Phase 2A can start in parallel.
Phase 3 sub-tasks are independent of each other.
Effort Estimate
| Phase | Hours | Dependency |
|---|---|---|
| Phase 1: L1 Core Reference | 3-4h | None (can start immediately) |
| Phase 2: L2 Local Retrieval | 10-14h | Phase 1B for injection point |
| Phase 3: Polish & Extend | 4-6h | Phase 2D complete |
| Total | 17-24h | — |
References
- ADR-007: LLM Integration Architecture — provider abstraction, MCP tools, Skills system, tool risk tiers
- ADR-015: LLM Signal Awareness — T1/T2/T3 signal context injection, topology summary, signal pulse
- ADR-013: LLM Canvas — artifact projection, `markdown` artifact type for rendering retrieved guides
- Context Optimizer: `conductor-gui/src-tauri/src/llm/context.rs` — token estimation, pruning strategies
- Chat Store: `conductor-gui/ui/src/lib/stores/chat.js` — message management, system context assembly
- MIDI 1.0 Detailed Specification: The MIDI Manufacturers Association
- MIDI 2.0 UMP Specification: The MIDI Association (AMEI/MMA)
- OSC 1.0 Specification: CNMAT, UC Berkeley
- ArtNet 4 Protocol Specification: Artistic Licence
- USB HID Usage Tables: USB Implementers Forum
- all-MiniLM-L6-v2: Sentence Transformers, Hugging Face (MIT License)
- ort (ONNX Runtime for Rust): https://github.com/pykeio/ort (MIT/Apache-2.0)