Market Discovery Phase 1: Foundation Types and Storage
This post covers Phase 1 of ADR-017 (Automated Market Discovery and Matching) - establishing the data types and persistence layer for the discovery system.
The Problem
Manual market mapping is error-prone and doesn't scale. Polymarket and Kalshi list hundreds of markets, so a system for finding equivalent pairs needs:
- Persistent storage - Track discovered markets across restarts
- Status tracking - Pending → Approved/Rejected workflow
- Audit trail - Record all approval decisions for compliance
- Safety gates - Prevent automated trading without human review
Design Decisions
CandidateStatus State Machine
The core safety mechanism is a one-way state machine:
Pending ──┬──► Approved
          │
          └──► Rejected
Once a candidate is approved or rejected, the status is immutable. This prevents accidental re-processing or status manipulation:
impl CandidateStatus {
    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        match (self, new_status) {
            (CandidateStatus::Pending, CandidateStatus::Approved) => true,
            (CandidateStatus::Pending, CandidateStatus::Rejected) => true,
            // Once approved or rejected, status is final
            (CandidateStatus::Approved, _) => false,
            (CandidateStatus::Rejected, _) => false,
            _ => false,
        }
    }
}
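To show how a caller might rely on this check, here is a minimal sketch of a guarded transition. The `InvalidTransition` type and the `transition` helper are hypothetical names introduced for illustration and are not taken from the Phase 1 code; the enum is redefined only so the snippet compiles on its own.

```rust
/// Review status of a candidate match, redefined here so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    /// Only Pending may move, and only to one of the two terminal states.
    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        matches!(
            (*self, new_status),
            (CandidateStatus::Pending, CandidateStatus::Approved)
                | (CandidateStatus::Pending, CandidateStatus::Rejected)
        )
    }
}

/// Hypothetical error describing a rejected transition.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: CandidateStatus,
    pub to: CandidateStatus,
}

/// Hypothetical helper: returns the new status, or an error if the
/// state machine forbids the move. Nothing is mutated on failure.
pub fn transition(
    current: CandidateStatus,
    next: CandidateStatus,
) -> Result<CandidateStatus, InvalidTransition> {
    if current.can_transition_to(next) {
        Ok(next)
    } else {
        Err(InvalidTransition { from: current, to: next })
    }
}
```

Returning a `Result` instead of a bare `bool` makes it harder for a caller to ignore a refused transition, which fits the safety-gate intent described above.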
Semantic Warnings
Markets that appear similar may have different settlement criteria. The CandidateMatch struct includes a semantic_warnings field that Phase 2's matcher will populate:
pub struct CandidateMatch {
    pub semantic_warnings: Vec<String>, // e.g., "Settlement timing differs"
    // ...
}
Approval will require explicit acknowledgment of these warnings (FR-MD-003).
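One way the acknowledgment requirement could be enforced is sketched below. This is an assumption about how FR-MD-003 might look in code, not the actual implementation: `check_approval`, `ApprovalError`, and the `acknowledged` flag are hypothetical, and `CandidateMatch` is reduced to the single field that matters here.

```rust
/// Trimmed-down stand-in for CandidateMatch; the real struct has more fields.
pub struct CandidateMatch {
    pub semantic_warnings: Vec<String>,
}

/// Hypothetical error for an approval attempt that skipped the warnings.
#[derive(Debug, PartialEq, Eq)]
pub enum ApprovalError {
    /// Carries the number of warnings the reviewer has not acknowledged.
    WarningsNotAcknowledged(usize),
}

/// Hypothetical gate: approval passes only if there are no warnings,
/// or the reviewer explicitly acknowledged them.
pub fn check_approval(
    candidate: &CandidateMatch,
    acknowledged: bool,
) -> Result<(), ApprovalError> {
    let count = candidate.semantic_warnings.len();
    if count > 0 && !acknowledged {
        return Err(ApprovalError::WarningsNotAcknowledged(count));
    }
    Ok(())
}
```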
SQLite Storage
We chose SQLite over PostgreSQL for the discovery cache because:
- Single-tenant - Discovery runs locally per operator
- Portable - No external dependencies for development
- Atomic - Transactions prevent partial state
Schema design separates markets from candidates:
-- Discovered markets (one per platform/id combination)
CREATE TABLE discovered_markets (
    id TEXT PRIMARY KEY,
    platform TEXT NOT NULL,
    platform_id TEXT NOT NULL,
    title TEXT NOT NULL,
    -- ...
    UNIQUE(platform, platform_id)
);
-- Candidate matches (references two markets)
CREATE TABLE candidates (
    id TEXT PRIMARY KEY,
    polymarket_id TEXT NOT NULL,
    kalshi_id TEXT NOT NULL,
    similarity_score REAL NOT NULL,
    status TEXT NOT NULL DEFAULT 'Pending',
    -- ...
);
-- Audit log for compliance
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    action TEXT NOT NULL,
    candidate_id TEXT NOT NULL,
    details TEXT NOT NULL -- Full JSON context
);
Parameterized Queries
Every statement binds its values through the params![] macro instead of formatting them into the SQL string, which prevents injection:
conn.execute(
    "UPDATE candidates SET status = ?1, updated_at = ?2 WHERE id = ?3",
    params![status_str, now, id.to_string()],
)?;
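The `status_str` value bound above has to round-trip between the enum and the `status TEXT` column. A minimal sketch of that mapping follows, assuming the stored strings match the variant names: `'Pending'` appears as the schema default, while `'Approved'` and `'Rejected'` are assumptions, as are the `as_str` method and the `FromStr` implementation.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Pending,
    Approved,
    Rejected,
}

impl CandidateStatus {
    /// String written to the `status` column (assumed spelling).
    pub fn as_str(&self) -> &'static str {
        match self {
            CandidateStatus::Pending => "Pending",
            CandidateStatus::Approved => "Approved",
            CandidateStatus::Rejected => "Rejected",
        }
    }
}

impl FromStr for CandidateStatus {
    type Err = String;

    /// Parses a value read back from the database. Unknown strings are an
    /// error rather than a silent fallback to Pending.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Pending" => Ok(CandidateStatus::Pending),
            "Approved" => Ok(CandidateStatus::Approved),
            "Rejected" => Ok(CandidateStatus::Rejected),
            other => Err(format!("unknown candidate status: {other}")),
        }
    }
}
```

Rejecting unrecognized strings keeps a corrupted or hand-edited row from being read as a valid status.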
Test Coverage
Phase 1 includes 12 tests covering:
| Module | Tests | Focus |
|---|---|---|
| candidate.rs | 5 | Type creation, status transitions, serialization |
| storage.rs | 7 | CRUD operations, filtering, audit logging |
Key safety test:
#[test]
fn test_candidate_status_transitions() {
    // Once approved, cannot transition to any other status
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Pending));
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Rejected));
}
What's Next
Phase 2 will implement the text matching engine:
- TextNormalizer - Lowercase, remove punctuation, tokenize
- SimilarityScorer - Jaccard (0.6 weight) + Levenshtein (0.4 weight)
- Semantic warning detection for settlement differences
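As a rough preview of the weighted score, the sketch below combines the two measures with the 0.6 and 0.4 weights listed above. Everything else is an assumption, since Phase 2 is not yet written: the function names, whitespace tokenization, set-based Jaccard over tokens, and turning Levenshtein distance into a similarity via 1 - distance / max_length.

```rust
use std::collections::HashSet;

/// Lowercase, drop punctuation, and split on whitespace.
pub fn normalize(text: &str) -> Vec<String> {
    let cleaned: String = text
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    cleaned.split_whitespace().map(str::to_string).collect()
}

/// Jaccard similarity over token sets: |A ∩ B| / |A ∪ B|.
pub fn jaccard(a: &[String], b: &[String]) -> f64 {
    let sa: HashSet<&str> = a.iter().map(String::as_str).collect();
    let sb: HashSet<&str> = b.iter().map(String::as_str).collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 1.0; // two empty inputs are treated as identical
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Classic edit distance, keeping only the previous row of the DP table.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Weighted score: 0.6 * Jaccard + 0.4 * normalized Levenshtein similarity.
pub fn similarity(title_a: &str, title_b: &str) -> f64 {
    let (ta, tb) = (normalize(title_a), normalize(title_b));
    let (sa, sb) = (ta.join(" "), tb.join(" "));
    let max_len = sa.chars().count().max(sb.chars().count());
    let lev_sim = if max_len == 0 {
        1.0
    } else {
        1.0 - levenshtein(&sa, &sb) as f64 / max_len as f64
    };
    0.6 * jaccard(&ta, &tb) + 0.4 * lev_sim
}
```

Under these assumptions the score stays in the range 0 to 1, so it could be stored directly in the `similarity_score REAL` column.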
Council Review
Phase 1 passed council verification with confidence 0.88. Key findings:
- ✅ Human-in-the-loop enforced via CandidateStatus state machine
- ✅ Audit logging captures all required fields
- ✅ No SQL injection (all parameterized queries)
- ✅ No unsafe code
Implementation: arbiter-engine/src/discovery/ | Issues: #41, #42 | ADR: 017
