ADR-023: Knowledge Layer Deployment Boundary
Status
Draft (Revised after LLM Council review — GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro)
Relationship to other ADRs:
- ADR-018 (Knowledge Layer & Retrieval Architecture): Defines the three-layer knowledge system (L1/L2/L3), embedding model, retrieval pipeline, and provider-aware tuning. This ADR constrains where that infrastructure runs.
- ADR-007 (LLM Integration Architecture): Defines the provider abstraction, MCP tools, and chat UI — all currently in conductor-gui. This ADR preserves that boundary.
- ADR-022 (Device Discovery, Bindings & Multi-Protocol Routing): Phase 6 depends on L2 retrieval for device profile suggestions. The deployment boundary determines which process serves those suggestions.
Scope: This ADR covers the initial desktop deployment of the knowledge layer. Mobile (Tauri iOS/Android), web, and server-side deployments are explicitly out of scope but the architecture preserves escape hatches for each (see D3, Future Considerations).
Implementation note: The knowledge code was initially placed in conductor-daemon/src/daemon/knowledge/ during ADR-018 Phase 2 implementation (#648–#651). This ADR establishes the correct boundary and mandates migration to a dedicated conductor-knowledge crate.
Context
Problem Statement
ADR-018 specifies a local embedding model (all-MiniLM-L6-v2 via ONNX Runtime, 80MB) loaded via the ort Rust crate, a SQLite vector index (2-5MB), and a retrieval pipeline that runs before every LLM API call. The ADR describes the knowledge architecture in detail but is silent on a critical deployment question: which binary hosts this infrastructure?
The workspace has four crates:
| Crate | Purpose | Binary Size (release) | Resident Memory | Dependency Profile |
|---|---|---|---|---|
| conductor-core | Pure engine library | N/A (library) | N/A | Zero I/O deps, no runtime |
| conductor-daemon | Event pipeline, MCP server, IPC | ~5MB | 10-15MB | midir, gilrs, tokio, rusqlite, enigo |
| conductor-gui | Tauri visual interface + LLM chat | ~20MB (Tauri baseline) | 80-120MB | tauri, reqwest, rusqlite, webview |
| conductor (root) | Compatibility re-export | N/A | N/A | Just re-exports core |
The ort crate (ONNX Runtime bindings) pulls in a ~50MB native C++ shared library (libonnxruntime). Combined with the model weights (80MB for MiniLM, 33MB for bge-small), this adds ~130MB disk and ~150-200MB resident memory (model weights + ONNX Runtime session buffers + thread pool) to whichever binary hosts it.
Why This Matters
1. The daemon is a headless open-source tool. Users run conductor-daemon without the GUI for headless setups (Raspberry Pi, studio rack servers, CI/CD-driven config), via SSH, or as a system service. Today it's a 5MB binary with 10-15MB resident memory. Adding 130MB of ONNX infrastructure to the daemon means every user pays for LLM knowledge features whether they use them or not. This violates the project's lightweight-daemon design principle.
2. The knowledge layer is an LLM feature. L2/L3 retrieval only runs when an LLM provider is configured and the user is in a chat session. The daemon's MCP server exposes tools that the LLM calls, but the MCP server itself doesn't need embeddings — it serves structured data (device status, port lists, config diffs). The retrieval pipeline sits between the user's message and the LLM API call, not between the MCP tool and the daemon.
3. Cross-compilation. ort bundles platform-specific C++ binaries. The daemon currently cross-compiles cleanly for x86_64 and aarch64 (macOS universal, Linux ARM). Adding ort complicates this: ONNX Runtime pre-built binaries are available for major platforms but not all targets, and building from source requires a C++ toolchain with CMake. The GUI already accepts a heavier build chain (Tauri requires Node.js + system webview SDK), so the marginal cost is lower there.
4. Target user context. Conductor's primary users are musicians and audio engineers running DAWs (Ableton, Logic, Bitwig). DAW workstations are typically memory-constrained — the DAW + plugins consume 4-12GB on an 8-16GB machine. The knowledge layer's ~200MB RSS footprint is meaningful in this context and should be isolated from the always-running daemon.
Current Architecture (LLM Chat Data Flow)
User types in Chat UI (conductor-gui)
│
▼
┌──────────────────────────────┐
│ GUI: Chat Provider │
│ - Formats system prompt │
│ - Injects L1 core ref │
│ - Injects T1/T2/T3 signals │
│ - Calls LLM API (SSE) │
│ - Processes tool calls │
└──────────┬───────────────────┘
│ (tool calls)
▼
┌──────────────────────────────┐
│ GUI: ToolExecutor │
│ - Dispatches to daemon IPC │
│ - Or executes locally │
└──────────┬───────────────────┘
│ (IPC: JSON over Unix socket)
▼
┌──────────────────────────────┐
│ Daemon: MCP Server │
│ - Executes ReadOnly tools │
│ - Queues ConfigChange plans │
│ - Returns structured JSON │
└──────────────────────────────┘
Note: the LLM API call happens in the GUI process. The daemon never talks to LLM providers. L1 injection (system prompt) already happens in the GUI. The question is where L2/L3 retrieval goes.
Decision
D1: Knowledge Retrieval Runs in the Application Layer, Not the Daemon
For desktop builds, retrieval is hosted in the application layer, initially in-process in conductor-gui, behind a stable KnowledgeService trait boundary. conductor-daemon remains knowledge-unaware.
Rationale:
- The retrieval pipeline fires before the LLM API call. The GUI already owns this call path (system prompt assembly → L1 injection → API request → SSE streaming → tool dispatch). L2 retrieval inserts at one point in this existing pipeline.
- The daemon stays at 5MB and cross-compiles without C++ toolchain requirements.
- Users who run daemon-only get zero knowledge layer overhead.
- The KnowledgeService trait boundary (D3) preserves the option to move retrieval to a sidecar process or remote service without changing consumers.
User types in Chat UI (conductor-gui)
│
▼
┌──────────────────────────────────────────┐
│ GUI: Chat Provider │
│ - Formats system prompt │
│ - Injects L1 core ref │
│ - Injects T1/T2/T3 signals │
│ ┌────────────────────────────────────┐ │
│ │ KnowledgeService::retrieve_chunks()│ │ ← trait call
│ │ (initially: InProcessKnowledge) │ │
│ │ - Query formation │ │
│ │ - Domain filter │ │
│ │ - ONNX embed (ort) │ │
│ │ - Cosine similarity │ │
│ │ - Inject into context │ │
│ └────────────────────────────────────┘ │
│ - Calls LLM API (SSE) │
│ - Processes tool calls │
└──────────┬───────────────────────────────┘
│ (tool calls via IPC)
▼
┌──────────────────────────────────────────┐
│ Daemon: MCP Server │
│ (unchanged — no ort, no knowledge DB) │
└──────────────────────────────────────────┘
D2: Knowledge Infrastructure Behind a Cargo Feature Flag
Two feature flags control the knowledge layer at different levels:
- conductor-gui's knowledge feature (future): gates the optional conductor-knowledge crate dependency. When disabled, the GUI builds without any knowledge code.
- conductor-knowledge's onnx feature: gates the ort dependency for real ONNX embeddings. When disabled (current default), stub embeddings are used.
Current and planned Cargo configuration:
# conductor-knowledge/Cargo.toml
[dependencies]
serde = { version = "1", features = ["derive"] }
# ort = { version = "2.0.0-rc.12", optional = true } # Added when ONNX is wired
[features]
default = []
onnx = [] # Placeholder — change to onnx = ["dep:ort"] when ort is added
# conductor-gui/src-tauri/Cargo.toml (FUTURE — not yet added)
[dependencies]
conductor-knowledge = { path = "../../conductor-knowledge", optional = true }
[features]
default = ["custom-protocol", "plugin-registry", "knowledge"]
knowledge = ["dep:conductor-knowledge"]
Rationale:
- Developers building the GUI without LLM features can disable knowledge and avoid the ONNX build dependency.
- CI can build with --no-default-features for faster test cycles where knowledge isn't under test.
- The knowledge feature is in the default feature set because the GUI is the LLM-enabled product.
- The GUI frontend must detect the feature at runtime via a Tauri command (is_knowledge_available()) to conditionally show/hide knowledge UI elements.
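A minimal sketch of that runtime probe, assuming the command simply reflects the compile-time feature (the #[tauri::command] attribute is omitted so the sketch stands alone without the Tauri crate):

```rust
// Sketch of the runtime capability probe named in this ADR. In the
// real app this function would carry #[tauri::command] and be invoked
// from the frontend to decide whether to render knowledge UI.
pub fn is_knowledge_available() -> bool {
    // True only when the crate was compiled with the `knowledge` feature.
    cfg!(feature = "knowledge")
}
```

Because the check is cfg!-based, the answer is fixed at compile time; a richer implementation could additionally report whether the model file is present and loaded.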
D3: Dedicated conductor-knowledge Workspace Crate
The knowledge layer lives in its own workspace crate, not inside conductor-gui or conductor-daemon:
conductor-knowledge/ ← New workspace crate
├── Cargo.toml
└── src/
├── lib.rs # KnowledgeService trait + InProcessKnowledge impl
├── index.rs # In-memory vector index, cosine similarity, stub_embed()
├── retrieval.rs # Pipeline: query → filter → embed → search → budget
├── chunker.rs # Section-aware document chunker for index builds
├── provider_tuning.rs # Per-provider retrieval parameter adjustment
├── device_profiles.rs # Built-in L2 device profiles
└── community.rs # L3 community profile stub
# Future additions (not yet implemented):
# src/embedder.rs — ONNX model loading (when ort is wired)
# src/bin/conductor_knowledge.rs — CLI: build-index, validate, inspect
The KnowledgeService trait:
/// Trait boundary that enables swapping in-process, sidecar, or remote implementations.
pub trait KnowledgeService: Send + Sync {
/// Retrieve relevant knowledge chunks for a user query.
fn retrieve_chunks(
&self,
query: &str,
domain: Option<&str>,
max_tokens: usize,
) -> RetrievedContext;
/// Check if the knowledge service is available and healthy.
fn is_available(&self) -> bool;
}
/// In-process implementation using in-memory KnowledgeIndex.
/// Default for desktop. Uses stub embeddings until ONNX is wired.
pub struct InProcessKnowledge { index: KnowledgeIndex }
/// Future: sidecar implementation using IPC to a separate process.
// pub struct SidecarKnowledge { /* ... */ }
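The retrieval core behind InProcessKnowledge can be sketched as a brute-force cosine search with a token budget. All names below (stub_embed, Chunk, search) are illustrative; the real crate replaces the stub embedding with ONNX MiniLM behind the onnx feature:

```rust
// Illustrative chunk record: text, its embedding, and a token estimate.
struct Chunk {
    text: String,
    embedding: Vec<f32>,
    tokens: usize,
}

// Deterministic toy embedding: byte-bucket counts, L2-normalized.
// Stands in for the ONNX model so the sketch runs without `ort`.
fn stub_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0f32; 16];
    for b in text.bytes() {
        v[(b % 16) as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-6);
    v.into_iter().map(|x| x / norm).collect()
}

// Vectors are pre-normalized, so the dot product equals cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// query → embed → score every chunk → sort by similarity → fill the
// token budget greedily (mirrors retrieve_chunks's max_tokens).
fn search(chunks: &[Chunk], query: &str, max_tokens: usize) -> Vec<String> {
    let q = stub_embed(query);
    let mut scored: Vec<(f32, &Chunk)> = chunks
        .iter()
        .map(|c| (cosine(&q, &c.embedding), c))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    let mut budget = max_tokens;
    let mut out = Vec::new();
    for (_, c) in scored {
        if c.tokens > budget {
            break;
        }
        budget -= c.tokens;
        out.push(c.text.clone());
    }
    out
}
```

Brute-force scoring is adequate here: per ADR-018 the index is 2-5MB, small enough that a linear scan per query is cheap.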
Rationale:
- The CLI tool (conductor-knowledge build) requires ort + rusqlite but has no dependency on Tauri, webview, or any GUI infrastructure. Placing it in conductor-gui would force the full Tauri build toolchain for a headless index builder.
- The KnowledgeService trait decouples consumers (GUI) from the hosting model. The GUI calls retrieve_chunks() without knowing whether it runs in-process, in a sidecar, or via a remote API. This is the escape hatch for mobile, web, and server deployments.
- Both the GUI (runtime retrieval) and the CLI (index building) depend on the same library code. A shared crate eliminates duplication.
Integration point in conductor-gui:
// In conductor-gui/src-tauri/src/chat/system_prompt.rs (future integration)
pub fn assemble_system_prompt(
knowledge: Option<&dyn KnowledgeService>,
config: &AppConfig,
signals: &SignalContext,
user_message: &str,
) -> String {
let mut prompt = String::new();
// L1: Static core reference (existing)
prompt.push_str(&load_l1_reference());
// L2: Retrieved knowledge chunks
if let Some(svc) = knowledge {
if svc.is_available() {
let ctx = svc.retrieve_chunks(user_message, None, 1500);
let formatted = conductor_knowledge::format_context(&ctx);
if !formatted.is_empty() {
prompt.push_str("\n\n");
prompt.push_str(&formatted);
}
}
}
// T1/T2/T3: Runtime signals (existing)
prompt.push_str(&format_signal_context(signals));
prompt
}
D4: Daemon MCP Tools Remain Knowledge-Unaware
The daemon's MCP tools return structured data only. They do not inject knowledge context, suggest matchers based on device profiles, or perform any retrieval. The LLM receives the tool result as raw JSON and relies on its system prompt context (L1 + L2 injected by the GUI) to interpret and present the result intelligently.
Rationale: The daemon is a data service. The GUI is the intelligence layer. The LLM bridges them.
Exception: conductor_suggest_binding (ADR-022 Phase 5). The daemon tool returns the raw fingerprint classification ("CC-only traffic, channel 1, CC range 64-67, likely foot controller"). The LLM interprets this using the L2 device-profile context the GUI already injected into the system prompt; no separate GUI-side enrichment step runs, and the tool stays knowledge-unaware.
Sequence diagram — ADR-022 Phase 6 device profile suggestion:
User: "Set up my new controller"
│
▼
┌─────────────────────────────────────────────┐
│ GUI: assemble_system_prompt() │
│ 1. Inject L1 core reference │
│ 2. KnowledgeService::retrieve("controller")│
│ → returns device profile chunks from L2 │
│ 3. Inject T1/T2/T3 signals │
│ 4. Call LLM API with enriched context │
└──────────────┬──────────────────────────────┘
│ LLM calls conductor_suggest_binding
▼
┌─────────────────────────────────────────────┐
│ Daemon: conductor_suggest_binding │
│ Returns: { category: "foot_controller", │
│ cc_range: [64,67], confidence: 0.7 } │
└──────────────┬──────────────────────────────┘
│ raw fingerprint returned
▼
┌─────────────────────────────────────────────┐
│ GUI: Tool result received │
│ LLM already has L2 device profiles in │
│ system prompt context — it can match │
│ "foot_controller" to known devices. │
│ No GUI-side enrichment step needed. │
└─────────────────────────────────────────────┘
D5: L1 Stays in the Daemon (Unchanged)
L1 (Core Reference) is a static Markdown file (docs/llm-reference.md) shipped with the distribution. The GUI fetches L1 from the filesystem at startup and caches it for system prompt injection. The daemon also reads it for MCP tool descriptions and help text. L1 has no ONNX dependency — it's a text file. No change from ADR-018.
D6: L3 Online Retrieval Is GUI-Only
L3 (community knowledge, optional online retrieval) runs exclusively in the GUI process. It requires network access (already available via reqwest), user opt-in (settings UI is in the GUI), and privacy controls. The daemon has no involvement in L3. See Security Considerations for L3 privacy requirements.
D7: Model File Distribution
The ONNX model file (all-MiniLM-L6-v2.onnx, ~80MB) is downloaded on first launch with a Tauri progress UI:
- First launch: GUI detects missing model, shows download progress dialog with cancel button. Model fetched from a project-hosted URL over HTTPS. SHA-256 checksum verified before use (see Security Considerations).
- GUI installer (optional bundling): Distributors may bundle the model in the app resources directory for offline-first installs. This increases installer size from ~20MB to ~100MB.
- Cargo build from source: Model is not downloaded during build. Fetched on first GUI launch. A CONDUCTOR_ONNX_MODEL_PATH env var allows overriding with a local file.
- Daemon-only install: No model needed. The daemon doesn't depend on ort and doesn't load the model.
- Fallback: If the model cannot be downloaded (firewall, offline, timeout), the GUI continues with L1 only. A status indicator shows "Knowledge: L1 only (model not available)".
The pre-built knowledge index (knowledge.db) is bundled with the GUI installer. Users can rebuild it via conductor-knowledge build if they add custom source documents.
Model size note: all-MiniLM-L6-v2 quantized to INT8 is ~23MB. gte-small is 33MB. If download size is a concern, a smaller or quantized model can be substituted — the KnowledgeService trait is model-agnostic.
D8: Model Loading Strategy
The ONNX model is loaded lazily on first knowledge-enabled request, not eagerly at GUI startup:
- GUI starts without loading the model. Startup time is unaffected.
- On the first chat message (if knowledge is enabled), the model loads asynchronously on a dedicated thread pool, separate from the Tauri async runtime. Typical load time: 1-3 seconds.
- During loading, retrieval returns empty results (L1-only mode). A brief "Loading knowledge model..." status is shown.
- Once loaded, the model session is cached for the lifetime of the GUI process.
- If loading fails (corrupt model, OOM, unsupported platform), the GUI logs the error and continues with L1-only permanently for that session.
Threading: ONNX inference runs on a dedicated ort thread pool (SessionBuilder::with_inter_threads(2)), isolated from the Tauri async runtime and webview renderer. This prevents inference from blocking UI interactions.
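The lazy-load strategy can be sketched with std primitives; EmbedSession and the load body are stand-ins for the real ort session construction:

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

// Stand-in for an ONNX session; the real type would wrap an `ort`
// Session built with its own inference thread pool.
pub struct EmbedSession;

static SESSION: OnceLock<EmbedSession> = OnceLock::new();
static LOAD_STARTED: Mutex<bool> = Mutex::new(false);

// Returns the cached session if the model is loaded. Otherwise kicks
// off the load exactly once on a background thread and returns None,
// so the caller proceeds L1-only for this request (per D8).
pub fn session() -> Option<&'static EmbedSession> {
    if let Some(s) = SESSION.get() {
        return Some(s);
    }
    let mut started = LOAD_STARTED.lock().unwrap();
    if !*started {
        *started = true;
        thread::spawn(|| {
            // Real impl: read the model file, verify its checksum, build
            // the session (typically 1-3 seconds). On failure, SESSION
            // simply stays unset and the GUI remains in L1-only mode.
            let _ = SESSION.set(EmbedSession);
        });
    }
    None
}
```

OnceLock guarantees the session is published at most once; the separate LOAD_STARTED flag ensures only one background load is ever spawned, even if several chat requests race in before loading completes.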
D9: Index Lifecycle Management
- Index schema version: Stored in the SQLite database metadata. The conductor-knowledge crate will define a SCHEMA_VERSION constant to track and enforce schema compatibility.
- Compatibility check: On startup, if the index schema version doesn't match the library version, the GUI shows "Knowledge index outdated — rebuild required" and falls back to L1-only until rebuilt.
- Embedding model identity: The index stores the model name and a hash of the model file used to generate embeddings. If the model changes, the index is incompatible (embeddings from different models are not comparable).
- Concurrent access: The index uses SQLite WAL mode. The CLI (conductor-knowledge build) acquires a write lock during index builds. The GUI holds a read connection. If a rebuild is in progress while the GUI is running, the GUI continues reading the old index until the rebuild completes.
- Rebuild triggers: Manual only (conductor-knowledge build). No automatic file watching or background rebuilding. Future: the GUI settings panel could offer a "Rebuild Index" button.
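The compatibility gate can be sketched as a pure function over the stored metadata. Field names are illustrative; SCHEMA_VERSION is the constant this ADR mandates:

```rust
// Schema version compiled into the conductor-knowledge library.
const SCHEMA_VERSION: u32 = 1;

// Illustrative shape of the metadata row stored in the SQLite index.
pub struct IndexMeta {
    pub schema_version: u32,
    pub model_name: String,
    pub model_sha256: String,
}

#[derive(Debug, PartialEq)]
pub enum IndexStatus {
    Ready,
    RebuildRequired(&'static str),
}

// Startup check: any mismatch drops the GUI to L1-only until the user
// rebuilds the index with the CLI.
pub fn check_index(meta: &IndexMeta, model_name: &str, model_sha256: &str) -> IndexStatus {
    if meta.schema_version != SCHEMA_VERSION {
        IndexStatus::RebuildRequired("schema version mismatch")
    } else if meta.model_name != model_name || meta.model_sha256 != model_sha256 {
        // Embeddings from different models are not comparable.
        IndexStatus::RebuildRequired("embedding model changed")
    } else {
        IndexStatus::Ready
    }
}
```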
Specification: Migration from Current State
The knowledge code was migrated from conductor-daemon/src/daemon/knowledge/ in PR #768.
Completed (ADR-023 initial implementation):
1. Create conductor-knowledge/ workspace crate — Done
2. Move all 6 modules from daemon — Done (with provider_profile.rs → provider_tuning.rs rename)
3. Add KnowledgeService trait and InProcessKnowledge implementation — Done
4. Remove all knowledge imports from conductor-daemon — Done
5. Update workspace Cargo.toml — Done
Remaining (future PRs):
6. Add conductor-knowledge as optional dependency of conductor-gui (behind knowledge feature)
7. Wire KnowledgeService::retrieve_chunks() into GUI's system prompt assembly
8. Create CLI binary at conductor-knowledge/src/bin/conductor_knowledge.rs
Dependency Changes
# conductor-knowledge/Cargo.toml (NEW — matches actual crate)
[package]
name = "conductor-knowledge"
version.workspace = true
edition.workspace = true
[dependencies]
serde = { version = "1", features = ["derive"] }
# ort = { version = "2.0.0-rc.12", optional = true } # Added when ONNX is wired
[features]
default = []
onnx = [] # Placeholder — change to onnx = ["dep:ort"] when ort is added
# conductor-gui/src-tauri/Cargo.toml (MODIFIED)
[dependencies]
conductor-knowledge = { path = "../../conductor-knowledge", optional = true }
[features]
default = ["custom-protocol", "plugin-registry", "knowledge"]
knowledge = ["dep:conductor-knowledge"]
# conductor-daemon/Cargo.toml (NO CHANGES)
# No ort dependency. No knowledge module. No conductor-knowledge dependency.
Specification: Impact on ADR-018
ADR-018 is amended as follows. These changes are additive — no existing decisions are reversed.
Amendment to D3.3 (Embedding Model)
Before: "The Rust backend loads it via ort at startup."
After: "The conductor-knowledge crate loads the model lazily on first retrieval request via ort, gated behind the onnx Cargo feature. The conductor-daemon process has no ort dependency and does not load the model."
Amendment to D3.4 (Index Storage)
Before: "Database location: $CONDUCTOR_DATA_DIR/knowledge/knowledge.db alongside knowledge.onnx"
After: Unchanged path. "The index is read by conductor-gui (via conductor-knowledge library) at runtime and by the conductor-knowledge CLI during index builds. The daemon does not access this file. The index uses SQLite WAL mode for concurrent read/write safety."
Amendment to D3.5 (Retrieval Pipeline)
Before: Pipeline described without specifying which process hosts it.
After: "The retrieval pipeline runs in the conductor-gui process via the conductor-knowledge library crate, integrated into the system prompt assembly path (see ADR-023 D3). It fires before each LLM API call. The daemon is not involved in retrieval."
Amendment to Phase 2A (GitHub #648)
Before: "Knowledge index infrastructure — ONNX embedding, SQLite storage, cosine search"
After: Infrastructure is implemented in the conductor-knowledge workspace crate, not in conductor-daemon. The ort dependency lives in conductor-knowledge's Cargo.toml. No changes to conductor-daemon's Cargo.toml.
Amendment to Phase 2B (GitHub #649)
Before: "Index build CLI — section-aware Markdown chunker and build pipeline"
After: The CLI binary (conductor-knowledge) lives in the conductor-knowledge crate as a primary binary, not in conductor-daemon or conductor-gui.
Specification: Impact on ADR-022
ADR-022 Phase 6 (LLM Integration) is unaffected in scope but clarified in deployment:
- Phase 6A (binding-topology Canvas artifact): Canvas rendering is already GUI-side. No change.
- Phase 6C (Device profile retrieval via L2): The retrieval happens in the GUI process via conductor-knowledge. The daemon's conductor_suggest_binding tool returns raw fingerprint data; the LLM uses L2 context already in the system prompt to interpret it (see D4 sequence diagram).
- Phase 6D (Community profile sharing stub): L3 is GUI-only per D6.
Specification: Binary Size & Memory Impact
Disk Size
| Binary | Before ADR-023 | After ADR-023 | Delta |
|---|---|---|---|
| conductor-daemon | ~5MB | ~5MB | 0 |
| conductor-gui (with knowledge) | ~20MB | ~70MB (+ort) | +50MB |
| conductor-gui (no knowledge) | ~20MB | ~20MB | 0 |
| conductor-knowledge CLI | N/A | ~55MB | New binary |
| Model file (downloaded) | 0 | ~80MB | +80MB (data dir, downloaded on first launch) |
| Knowledge index | 0 | ~3MB | +3MB (data dir) |
Total install size increase for GUI users: ~130MB (model downloaded on first launch). Total install size increase for daemon-only users: 0.
Resident Memory (RSS) — Peak Estimates
| Component | Estimate | Notes |
|---|---|---|
| ONNX Runtime session | ~100MB | Model weights + inference buffers |
| ONNX thread pool (2 threads) | ~20MB | Dedicated inference threads |
| SQLite knowledge index | ~5-10MB | Page cache, WAL |
| Total knowledge overhead | ~130-150MB | On top of existing GUI baseline |
| GUI without knowledge | 80-120MB | Tauri + webview + chat history |
| GUI with knowledge (peak) | 210-270MB | During active inference |
| GUI with knowledge (idle) | ~150-180MB | Model loaded, no active inference |
Context: DAW workstations typically run 4-12GB of audio plugins. The knowledge layer's ~150MB idle footprint is significant but manageable on 8GB+ machines. On 4GB machines or memory-constrained environments, disable the knowledge feature.
Failure Modes & Degradation
The knowledge layer must never prevent the GUI from functioning. All failure modes degrade gracefully to L1-only operation.
| Failure | Behavior | User-Visible |
|---|---|---|
| Model file missing | Skip L2 retrieval, L1-only | Status: "Knowledge: L1 only (model not available)" |
| Model download fails (network) | Retry on next launch, L1-only for this session | Download dialog shows error with retry option |
| Model file corrupt (checksum mismatch) | Refuse to load, delete corrupt file, L1-only | Status: "Knowledge: model verification failed" |
| ONNX Runtime init fails (unsupported platform) | Log error, L1-only permanently | Status: "Knowledge: not available on this platform" |
| Index missing or incompatible schema | Skip L2, L1-only | Status: "Knowledge: index rebuild required" |
| Index corrupt (SQLite error) | Delete index, L1-only until rebuilt | Log warning |
| Retrieval timeout (>500ms) | Cancel retrieval, proceed with L1-only for this request | No visible indicator (transparent fallback) |
| OOM during inference | Catch panic, unload model, L1-only for rest of session | Log error |
| ONNX Runtime segfault | Process crash (GUI restarts via Tauri) | Crash report; on next launch, disable knowledge auto-load, offer "try again" |
Crash isolation note: The ort C++ runtime can segfault on malformed models or extreme memory pressure. Running it in-process means a crash takes down the GUI. This is acceptable for the initial desktop deployment because: (a) the model is integrity-verified before loading, (b) crashes are rare with verified models, (c) Tauri supports process restart. If crash frequency exceeds acceptable levels, escalate to sidecar (see Sidecar Escalation Criteria).
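The 500ms retrieval-timeout row above can be sketched with a worker thread and a hard deadline; retrieve here is a hypothetical stand-in for the real pipeline:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the real retrieval pipeline.
fn retrieve(query: &str) -> Vec<String> {
    vec![format!("chunk for: {query}")]
}

// Runs retrieval on a worker thread under a deadline. On timeout the
// request proceeds L1-only (empty context, transparent fallback); the
// worker's late result is silently dropped along with the channel.
pub fn retrieve_with_budget(query: String, budget: Duration) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(retrieve(&query));
    });
    rx.recv_timeout(budget).unwrap_or_default()
}
```

Note the worker thread is not cancelled on timeout — it finishes and its send fails harmlessly. True cancellation of an in-flight ONNX inference would need cooperation from the inference API.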
Security Considerations
Model Supply Chain (D7)
The ONNX model is an executable artifact — a tampered model could produce adversarial embeddings that bias retrieval results, constituting an indirect prompt injection vector.
Mitigations:
- Integrity verification: A SHA-256 checksum of the official model file is embedded in the conductor-knowledge binary at compile time. The model is verified before every load. Checksum mismatch → refuse to load.
- Download security: Model downloaded over HTTPS from a project-controlled URL. Certificate pinning is not required (HTTPS provides sufficient integrity for this threat model), but the URL is not user-configurable via UI — only via the CONDUCTOR_ONNX_MODEL_PATH env var for advanced users.
- CONDUCTOR_ONNX_MODEL_PATH risk: This env var allows loading an arbitrary model file, bypassing the embedded checksum. This is intentional (for development and custom models) but means an attacker who can set environment variables can swap the model. This is equivalent to the attacker already having code execution, so it does not expand the threat surface.
Knowledge Index at Rest (D9)
The SQLite knowledge index contains embeddings of device profiles, workflow patterns, and potentially user-specific configuration context. This constitutes a fingerprint of the user's studio setup.
Mitigations:
- The index is stored in the user's data directory ($CONDUCTOR_DATA_DIR/knowledge/), protected by OS file permissions.
- The index contains embeddings of reference documentation, not user conversations or personal data. User-specific data (chat history, API keys) is in separate databases.
- Encryption at rest is not required for the initial deployment (the index content is derived from shipped documentation). Revisit if user-generated content (custom profiles, learned patterns) is added to the index.
- The index is excluded from any future cloud sync or backup integration by default.
L3 Online Retrieval Privacy (D6)
When L3 online retrieval is enabled (opt-in only), queries are sent to an external service.
Requirements:
- Data sanitization: Before any L3 network request, the query is stripped of: hardware serial numbers, internal IP addresses, file paths, environment variables, and device aliases. Only the semantic query text is transmitted.
- Trust tiers: L1 = authoritative/trusted (shipped with binary). L2 = trusted (locally indexed from shipped content). L3 = external/untrusted (community-contributed, cached locally). L3 results are injected into the system prompt with a [Community — Unverified] prefix so the LLM can weight them appropriately.
- No embedding vectors transmitted: L3 is strictly text-query-based — the hub is responsible for computing its own embeddings from the query text. Client-side APIs (e.g., the community module) must not accept or transmit client-computed embedding vectors for L3 lookup. The user's local embeddings never leave the machine. Note: the current community.rs stub accepts an embedding parameter for API symmetry; this will be changed to a text query when L3 is implemented.
- User consent: L3 is disabled by default. Enabling it requires explicit opt-in via the Knowledge Sources settings panel, with a clear description of what data leaves the machine.
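A deliberately naive sketch of the sanitization requirement above, assuming simple token-level filtering; the real rules (serial numbers, device aliases) would be stricter and data-driven:

```rust
// Drops query tokens that look like file paths, env var references, or
// IPv4 addresses before the query leaves the machine for L3 lookup.
// Heuristics are illustrative, not the production rule set.
pub fn sanitize_l3_query(query: &str) -> String {
    query
        .split_whitespace()
        .filter(|t| {
            let looks_like_path = t.contains('/') || t.contains('\\');
            let looks_like_env = t.starts_with('$');
            let looks_like_ip = t.matches('.').count() >= 3
                && t.chars().all(|c| c.is_ascii_digit() || c == '.');
            !(looks_like_path || looks_like_env || looks_like_ip)
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```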
Daemon Tool Validation (D4)
The daemon's MCP tool validation (from ADR-007) must not trust GUI-provided context. A compromised knowledge index could theoretically bias the LLM into generating malicious tool calls (e.g., a poisoned L2 profile suggesting a dangerous shell command). The daemon's existing risk-tier validation (ConfigChange requires Plan/Apply, HardwareIO requires confirmation) is the defense. No additional measures are needed because the daemon validates tool parameters, not the reasoning that produced them.
Alternatives Considered
A1: ONNX in the Daemon
The ort dependency is added to conductor-daemon. The retrieval pipeline runs daemon-side.
Rejected because:
- +130MB to the daemon binary affects all users
- +150-200MB RSS on a process that should be 10-15MB
- Daemon loses clean cross-compilation story
- Daemon-only users (headless, no LLM) pay for LLM infrastructure
- Violates the "daemon is a lightweight engine" design principle
A2: Separate conductor-knowledge Sidecar Process
A standalone binary that runs alongside the daemon and GUI. The GUI sends retrieval requests to it via IPC (Unix socket or stdin/stdout, managed by Tauri's sidecar system).
Deferred (not rejected). Revisit when:
- ONNX crashes take down the GUI more than once per 1,000 sessions (crash isolation needed)
- GUI RSS exceeds 300MB with knowledge loaded (memory isolation needed)
- Tauri mobile target is pursued (can't bundle C++ ONNX on iOS/Android easily)
- Third-party clients need knowledge without the GUI (independent lifecycle needed)
- Index rebuild needs to run in the background while GUI is closed
The KnowledgeService trait (D3) makes this migration mechanical: implement SidecarKnowledge that wraps IPC calls to the sidecar process, swap the implementation at startup. No consumer code changes.
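Assuming the sidecar client implements the same trait, the swap reduces to a startup-time factory. SidecarKnowledge and make_knowledge are illustrative names; the trait is repeated in minimal form so the sketch stands alone:

```rust
// Minimal mirror of the trait so the sketch is self-contained.
pub trait KnowledgeService: Send + Sync {
    fn is_available(&self) -> bool;
}

pub struct InProcessKnowledge;
impl KnowledgeService for InProcessKnowledge {
    fn is_available(&self) -> bool {
        true
    }
}

// Hypothetical sidecar client; the real one would wrap IPC calls to
// the separate knowledge process.
pub struct SidecarKnowledge;
impl KnowledgeService for SidecarKnowledge {
    fn is_available(&self) -> bool {
        false // until the sidecar handshake completes
    }
}

// Consumers hold Box<dyn KnowledgeService>, so changing the hosting
// model is a one-line change here, not in any call site.
pub fn make_knowledge(use_sidecar: bool) -> Box<dyn KnowledgeService> {
    if use_sidecar {
        Box::new(SidecarKnowledge)
    } else {
        Box::new(InProcessKnowledge)
    }
}
```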
A3: API-Based Embedding (No Local ONNX)
Use an external embedding API (OpenAI text-embedding-3-small, etc.) instead of local ONNX inference.
Rejected because:
- Requires network for every LLM message (breaks offline operation)
- Ties the knowledge layer to a specific provider
- Privacy: query text sent to embedding API provider
- ADR-018 explicitly chose local embedding for privacy and cost reasons
A4: Feature-Gated ONNX in the Daemon (Optional)
Add ort to conductor-daemon behind an opt-in feature flag.
Rejected because:
- Feature flags in the daemon create a matrix of supported configurations
- The retrieval pipeline needs to integrate with system prompt assembly, which is GUI-side
- If the daemon has the knowledge module, MCP tools would be expected to use it, creating implicit coupling
A5: WebAssembly-Based Inference in the Webview
Run onnxruntime-web or a Rust ML framework (candle) compiled to WASM inside the Tauri webview. This eliminates the native C++ dependency entirely.
Deferred (not rejected) because:
- WASM SIMD support is inconsistent across webview engines
- Inference is 3-10x slower than native ONNX Runtime
- The ort Rust crate does not support WASM targets today
- SQLite index access from WASM requires additional bridging (sql.js or IPC to Rust backend)
- Revisit if: native ONNX distribution becomes untenable for cross-platform builds, or if a web-only deployment target is pursued
A6: Lexical Fallback (BM25/FTS5, No Embeddings)
Use SQLite FTS5 full-text search instead of semantic vector search. Eliminates the ONNX dependency entirely at the cost of retrieval quality.
Not adopted as primary, but available as fallback:
- BM25 retrieval is valuable as a degraded-mode fallback when the ONNX model is unavailable (see Failure Modes)
- The conductor-knowledge crate should implement both FtsRetrieval (BM25) and SemanticRetrieval (ONNX) behind the same KnowledgeService trait
- On platforms where ONNX is unavailable (e.g., future WASM target), FTS5 provides baseline retrieval
Consequences
Positive
- Daemon stays lightweight. 5MB binary, 10-15MB resident, clean cross-compilation. Open-source users get the full event pipeline without LLM overhead.
- Single retrieval integration point. The GUI already assembles the system prompt. L2 retrieval inserts at one point in that pipeline. No new IPC protocols or cross-process coordination.
- Build simplicity. Only the conductor-knowledge crate needs the ort dependency. The daemon builds with pure Rust dependencies. The GUI optionally pulls in conductor-knowledge.
- Clean crate boundary. The KnowledgeService trait enables swapping in-process, sidecar, WASM, or remote implementations without changing consumers.
Negative
- MCP tools can't use L2 knowledge directly. A tool like conductor_suggest_binding can't enrich its response with device profile data from L2. The LLM must use L2 context already present in the system prompt to interpret tool results. This is architecturally clean but means the LLM does the work of connecting tool output to knowledge context, adding cognitive load to the model.
- Daemon-only users get no knowledge features. A headless setup running conductorctl commands via SSH has no L2/L3. L1 (static reference) is still available. If a future web UI, CLI chat, or VS Code extension connects directly to the daemon, it would need its own knowledge integration — the intelligence is not in the platform, it's in the client.
- GUI memory pressure. The knowledge layer adds ~130-150MB RSS to the GUI process. On memory-constrained DAW workstations (8GB with plugins loaded), this is significant. Mitigated by lazy loading (D8) and the option to disable the knowledge feature.
- Model distribution complexity. First-launch download requires async progress UI, retry logic, checksum verification, and error handling. This is a one-time UX cost but adds engineering effort to the release pipeline.
- Update coupling. If the embedding model changes or the index schema evolves, users must update the GUI and potentially redownload the model. The daemon and GUI can no longer be updated fully independently when knowledge features are involved.
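The lazy-loading mitigation (D8) mentioned above can be sketched with `std::sync::OnceLock`. The `EmbeddingModel` type and the background-thread trigger are illustrative assumptions; the real crate would deserialize the ONNX model via `ort` on a dedicated thread pool.

```rust
use std::sync::OnceLock;
use std::thread;

/// Placeholder for the ~80MB embedding model; the real type would wrap an ort session.
struct EmbeddingModel {
    name: &'static str,
}

static MODEL: OnceLock<EmbeddingModel> = OnceLock::new();

/// Load the model at most once. Callers that arrive before loading completes
/// can fall back to FTS5 retrieval instead of blocking the UI.
fn model() -> &'static EmbeddingModel {
    MODEL.get_or_init(|| {
        // In the real crate: read the model from disk and build the ONNX session.
        EmbeddingModel { name: "all-MiniLM-L6-v2" }
    })
}

fn main() {
    // Kick off loading in the background at the first chat interaction,
    // not at app launch, so the GUI stays at baseline RSS until then.
    let handle = thread::spawn(|| model().name);
    println!("loaded: {}", handle.join().unwrap());
}
```

`OnceLock` guarantees the ~130-150MB cost is paid once and only on demand; a user who never opens the chat panel never pays it.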
Neutral
- L1 is unaffected. The static core reference is a text file read by both daemon and GUI.
- The CLI tool in the `conductor-knowledge` crate is architecturally clean. It shares library code with the GUI without requiring the Tauri build toolchain.
Future Considerations
Mobile (Tauri iOS/Android)
The `ort` crate's support for mobile targets (via Core ML on iOS, NNAPI on Android) is experimental. If Conductor targets mobile:
- Option A: Disable the `knowledge` feature on mobile builds. Use L1-only or the FTS5 fallback (A6).
- Option B: Implement a `MobileKnowledge` variant of `KnowledgeService` using platform-native ML frameworks (Core ML, NNAPI) via the `candle` crate or direct FFI.
- Option C: Use the API-based embedding option (A3) on mobile, where network is typically available.
Web Deployment
If a web-based UI replaces or supplements Tauri:
- The `KnowledgeService` trait allows a `RemoteKnowledge` implementation that calls a backend knowledge API.
- The sidecar (A2) becomes a standalone knowledge server.
Multi-Model Future
If larger models are needed (e.g., for reranking), the `KnowledgeService` trait boundary isolates consumers from model changes. The sidecar escalation criteria (A2) should be re-evaluated if model RSS exceeds 300MB.
Implementation
This ADR requires migrating existing code and adding new infrastructure:
- Create the `conductor-knowledge` workspace crate (D3)
- Migrate 6 modules from `conductor-daemon/src/daemon/knowledge/` (see Migration section)
- Implement the `KnowledgeService` trait + `InProcessKnowledge` (D3)
- Implement the `is_knowledge_available()` Tauri command for frontend feature detection (D2)
- Wire retrieval into `system_prompt.rs` via `KnowledgeService` (D1)
- Implement model download + checksum verification (D7)
- Implement lazy model loading on a dedicated thread pool (D8)
- Implement FTS5 fallback retrieval (A6 fallback path)
- Remove all knowledge code from `conductor-daemon`
- Update CI to build the `conductor-knowledge` crate and test both `--features onnx` and `--no-default-features`
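The feature-detection item (D2) can be sketched with `cfg` gating, assuming a Cargo feature named `knowledge` as described in this ADR. In the real GUI this would be exposed as a Tauri command; the bare function here is a simplified stand-in, and the body would additionally check that the model has been downloaded and loaded.

```rust
// With the `knowledge` feature enabled, the GUI links conductor-knowledge
// and reports availability; with it disabled, the symbol still exists so
// the frontend can query it and hide knowledge UI.

#[cfg(feature = "knowledge")]
fn is_knowledge_available() -> bool {
    // Real implementation: also verify model download + successful load.
    true
}

#[cfg(not(feature = "knowledge"))]
fn is_knowledge_available() -> bool {
    false
}

fn main() {
    println!("knowledge available: {}", is_knowledge_available());
}
```

Compiling both cfg branches in CI (`--features onnx` and `--no-default-features`) is what catches drift between the gated and ungated code paths.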
Review Checklist
- Does the deployment boundary hold if the GUI is rewritten (e.g., web-based instead of Tauri)?
  - Yes: the `KnowledgeService` trait allows `RemoteKnowledge` for web backends.
- Does this prevent future daemon-side intelligence (e.g., proactive suggestions without GUI)?
  - Partially: daemon-only mode with LLM support would need the sidecar (A2) or a daemon-embedded `KnowledgeService`.
- Is the `knowledge` feature flag tested in CI?
  - Must be: CI should build `conductor-knowledge` with both `--features onnx` and `--no-default-features`, and `conductor-gui` with both `--features knowledge` and `--no-default-features`.
- Is crash isolation sufficient for desktop?
  - Acceptable for initial deployment with integrity-verified models. Monitor crash frequency; escalate to the sidecar if >1 crash per 1,000 sessions.
- Is the memory footprint acceptable for target users?
  - Acceptable with lazy loading on 8GB+ machines. Document the `--no-default-features` escape hatch for constrained environments.
- Are security considerations addressed?
  - Model integrity: SHA-256 checksum. Index privacy: OS file permissions, no user data in the initial index. L3: data sanitization, trust tiers, opt-in only.