Automated Market Discovery and Matching
2026-01-22 | ADR-017 Implementation
Implementation of automated market discovery between Polymarket and Kalshi while preserving human-in-the-loop safety.
The Problem
Manual market discovery doesn't scale:
- Discovery burden: Operators research markets on both platforms independently
- Missed opportunities: New markets go undetected
- No persistence: Mappings exist only in memory
- Scale limitation: Cannot monitor thousands of markets
Industry context: Research documented $40M+ in arbitrage profits from Polymarket alone (Apr 2024 - Apr 2025). Existing bots watch 10,000+ markets.
The Solution
Text similarity matching with semantic warnings and mandatory human approval.
Architecture
API Clients → Scanner (hourly) → Matcher → Candidates (SQLite) → Human Review → MappingManager
Matching Algorithm
- Pre-filter: Category, expiration (±7 days), outcome count
- Similarity:
  0.6 × Jaccard(tokens) + 0.4 × Levenshtein_normalized
- Threshold: Score ≥ 0.6 creates candidate for review
- Warnings: Flag settlement differences (announcement vs actual event)
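The pre-filter stage above can be sketched as follows. Note this is illustrative only: the field names on `DiscoveredMarket` (`category`, `expiration_ts`, `outcome_count`) are assumptions, not the actual struct definition.

```rust
// Illustrative stub: real DiscoveredMarket has more fields.
struct DiscoveredMarket {
    category: String,
    expiration_ts: i64, // Unix seconds
    outcome_count: usize,
}

const SEVEN_DAYS_SECS: i64 = 7 * 24 * 60 * 60;

// Cheap structural checks before any text scoring:
// same category, expirations within ±7 days, same number of outcomes.
fn pre_filter(a: &DiscoveredMarket, b: &DiscoveredMarket) -> bool {
    a.category == b.category
        && (a.expiration_ts - b.expiration_ts).abs() <= SEVEN_DAYS_SECS
        && a.outcome_count == b.outcome_count
}
```

Pre-filtering keeps the expensive similarity scoring off the vast majority of cross-platform pairs.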
Safety Architecture (FR-MD-003)
The critical constraint: settlement semantics differ across platforms.
Example - 2024 Government Shutdown:
- Polymarket: "OPM issues shutdown announcement"
- Kalshi: "Actual shutdown exceeding 24 hours"
Same event, different resolution criteria, potentially different outcomes.
Safety Gates
pub fn approve(&self, id: Uuid, acknowledge_warnings: bool) -> Result<(), ApprovalError> {
let candidate = self.storage.get_candidate(id)?;
// Safety: Require warning acknowledgment
if !candidate.semantic_warnings.is_empty() && !acknowledge_warnings {
return Err(ApprovalError::WarningsNotAcknowledged);
}
// Use existing safety gate (FR-MD-003)
let mut manager = self.mapping_manager.lock().unwrap();
let mapping_id = manager.propose_mapping(/*...*/);
manager.verify_mapping(mapping_id);
// Audit log for compliance
self.storage.log_decision(/*...*/)?;
Ok(())
}
What This Guarantees
- Human-in-the-loop: Candidates require explicit approval
- FR-MD-003 enforced: Uses existing MappingManager.verify_mapping()
- Semantic warnings block quick approval: Must acknowledge settlement differences
- Audit trail: All approvals/rejections logged
Implementation Highlights
Feature Flag
Discovery is opt-in via Cargo feature:
[features]
discovery = ["dep:strsim"]
CLI Interface
# Discover and match markets
cargo run --features discovery -- --discover-markets
# Review candidates interactively
cargo run --features discovery -- --review-candidates
# List pending candidates
cargo run --features discovery -- --list-candidates --status pending
# Batch operations
cargo run --features discovery -- --approve-candidates --ids "uuid1,uuid2"
cargo run --features discovery -- --reject-candidates --ids "uuid1" --reason "Settlement differs"
Similarity Scorer
pub struct SimilarityScorer {
jaccard_weight: f64, // 0.6
levenshtein_weight: f64, // 0.4
threshold: f64, // 0.6
}
impl SimilarityScorer {
pub fn find_matches(&self, market: &DiscoveredMarket, candidates: &[DiscoveredMarket])
-> Vec<CandidateMatch>
{
candidates.iter()
.filter(|c| self.pre_filter(market, c))
.filter_map(|c| {
let score = self.combined_score(&market.title, &c.title);
if score >= self.threshold {
Some(CandidateMatch::new(market.clone(), c.clone(), score))
} else {
None
}
})
.collect()
}
}
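The `combined_score` helper is not shown above. The shipped code leans on the `strsim` crate (per the feature flag); the dependency-free sketch below implements the same weighted formula for illustration:

```rust
use std::collections::HashSet;

// Jaccard similarity over lowercased whitespace tokens.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<String> = a.to_lowercase().split_whitespace().map(String::from).collect();
    let tb: HashSet<String> = b.to_lowercase().split_whitespace().map(String::from).collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / ta.union(&tb).count() as f64
}

// Classic dynamic-programming Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

// 0.6 × Jaccard + 0.4 × normalized Levenshtein similarity, per the formula above.
fn combined_score(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count()).max(1) as f64;
    let lev_sim = 1.0 - levenshtein(a, b) as f64 / max_len;
    0.6 * jaccard(a, b) + 0.4 * lev_sim
}
```

Identical titles score 1.0; fully disjoint short titles score near 0.0, well under the 0.6 threshold.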
Scanner Actor
impl Actor for DiscoveryScannerActor {
type Message = ScannerMsg;
async fn handle(&mut self, message: Self::Message) -> Result<(), ActorError> {
match message {
ScannerMsg::Scan => {
let poly_markets = self.fetch_all_markets(&*self.polymarket_client).await?;
let kalshi_markets = self.fetch_all_markets(&*self.kalshi_client).await?;
// Store markets
for market in &poly_markets {
self.storage.lock().await.upsert_market(market)?;
}
// Find candidates
for poly_market in &poly_markets {
let matches = self.scorer.find_matches(poly_market, &kalshi_markets);
for candidate in matches {
if !self.is_duplicate_candidate(&candidate).await? {
self.storage.lock().await.insert_candidate(&candidate)?;
}
}
}
}
// ...
}
}
}
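One way the `is_duplicate_candidate` check could work is sketched below. The real implementation queries SQLite; an in-memory set keyed by the market-ID pair stands in for it here, and everything beyond the idea itself is an assumption:

```rust
use std::collections::HashSet;

// Hypothetical duplicate guard: a candidate is a duplicate if the
// (polymarket_id, kalshi_id) pair has already been recorded.
struct CandidateIndex {
    seen: HashSet<(String, String)>,
}

impl CandidateIndex {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    // Returns true the first time a pair is seen, false for duplicates.
    fn insert_if_new(&mut self, poly_id: &str, kalshi_id: &str) -> bool {
        self.seen.insert((poly_id.to_string(), kalshi_id.to_string()))
    }
}
```

Deduplication matters because the scanner runs hourly: without it, every unresolved candidate would reappear in the review queue on every scan.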
Test Coverage
48 tests across 5 phases:
| Module | Tests |
|---|---|
| candidate.rs | 5 |
| storage.rs | 7 |
| normalizer.rs | 3 |
| matcher.rs | 7 |
| polymarket_gamma.rs | 4 |
| kalshi_markets.rs | 4 |
| scanner.rs | 5 |
| approval.rs | 5 |
| CLI integration | 8 |
Council Review
All 5 phases passed LLM Council review with confidence ≥ 0.87.
Final ADR Review:
- Verdict: PASS
- Confidence: 0.88
- Weighted Score: 8.55/10
Safety gates (FR-MD-003) received "PASS (Strong)" verdict.
Why Text Similarity Over LLM/Embeddings
Options considered:
| Approach | Accuracy | Cost | Latency |
|---|---|---|---|
| Text similarity | Moderate | Zero | Sub-ms |
| LLM verification | High | $0.01-0.05/call | +200-500ms |
| Embeddings | Highest | Storage + compute | Batch dependent |
Text similarity was selected because:
- Sufficient for MVP: Catches majority of matches
- Zero dependencies: No external API costs
- Extensible: LLM verification can be added later
- Council compliant: "Suggestion engine only" per Design Review 1
Update: Post-Implementation Learnings (2026-01-23)
Post-implementation testing revealed a critical gap: text similarity is insufficient for production.
The Problem
Real market pairs score only 8-9% similarity despite semantic equivalence:
| Kalshi | Polymarket | Jaccard |
|---|---|---|
| "Will Trump buy Greenland?" | "Will the US acquire part of Greenland in 2026?" | 8.3% |
| "Will Washington win the 2026 Pro Football Championship?" | "Super Bowl Champion 2026" | 9.1% |
Root causes:
- Different vocabulary: "Super Bowl" vs "Pro Football Championship"
- Different framing: Question vs statement
- Different specificity: Team name vs championship event
The Solution: 5-Phase Approach
We've extended ADR-017 with a progressive enhancement roadmap:
Phase 1: Text Similarity ← Current (MVP; scores only 8-9% on hard pairs)
Phase 2: Fingerprint Matching ← Proposed (entity extraction, field-weighted scoring)
Phase 3: Embedding Matching ← Proposed (semantic similarity via vectors)
Phase 4: LLM Verification ← Proposed (human-level reasoning for uncertain cases)
Phase 5: Human Feedback Loop ← Proposed (continuous improvement from decisions)
Phase 3: Embedding-Based Semantic Matching
Embeddings capture semantic similarity that text matching misses:
# "Super Bowl" and "Pro Football Championship" have zero word overlap
# but high embedding similarity
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
emb1 = model.encode("Super Bowl Champion 2026")
emb2 = model.encode("2026 Pro Football Championship winner")
similarity = cosine_similarity(emb1, emb2) # ~0.85
New requirements: FR-MD-018 through FR-MD-023
Phase 4: LLM Verification
For uncertain matches (0.60-0.85 score), invoke LLM for human-level reasoning:
Candidate pair for verification:
Market A (Kalshi): "Will the US acquire part of Greenland in 2026?"
Market B (Polymarket): "Will Trump buy Greenland?"
Analyze: Are these the same underlying event?
Consider: Resolution criteria, timing, specificity
Cost optimization: Haiku screening ($0.001/call), Sonnet escalation ($0.01/call)
Budget: ~$50/day for 5,000 candidates
New requirements: FR-MD-024 through FR-MD-027
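The tiered routing implied above can be sketched as a small decision function. The band boundaries come from the text; the tier names and the exact boundary handling are assumptions, and every surviving path still ends in human review per FR-MD-003:

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Discard,        // below threshold: no candidate created
    LlmVerify,      // uncertain band: cheap screening, escalate if still unsure
    DirectToReview, // high confidence: straight to the human queue
}

// Route a scored candidate pair into the 0.60-0.85 LLM verification band.
fn route(score: f64) -> Route {
    if score < 0.60 {
        Route::Discard
    } else if score <= 0.85 {
        Route::LlmVerify
    } else {
        Route::DirectToReview
    }
}
```

Keeping the LLM off the clear accepts and clear rejects is what makes the per-day budget tractable.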
Phase 5: Learning from Human Feedback (Data Flywheel)
The key innovation: human approval decisions are training data.
┌──────────────────────────────────────────────────────────────┐
│         Data Flywheel: Human Decisions Train Models          │
├──────────────────────────────────────────────────────────────┤
│  Human Approval ──► Entity Alias Learning                    │
│                  ("Super Bowl" = "Pro Football Championship")│
│                                                              │
│  Human Approval ──► Embedding Fine-Tuning                    │
│                     (contrastive learning on approved pairs) │
│                                                              │
│  Human Approval ──► Weight Optimization                      │
│                    (logistic regression on decision history) │
└──────────────────────────────────────────────────────────────┘
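The entity-alias arm of the flywheel can be sketched as below: when a human approves a pair, record that the two phrasings name the same entity so future scans normalize them before scoring. The structure and method names are illustrative, not the shipped design:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical alias table: canonical entity name -> known alternate phrasings.
#[derive(Default)]
struct AliasTable {
    groups: HashMap<String, HashSet<String>>,
}

impl AliasTable {
    // Called when an approval links two phrasings of the same entity.
    fn learn(&mut self, canonical: &str, alias: &str) {
        self.groups
            .entry(canonical.to_lowercase())
            .or_default()
            .insert(alias.to_lowercase());
    }

    // Rewrite a phrase to its canonical form before similarity scoring.
    fn canonicalize(&self, phrase: &str) -> String {
        let p = phrase.to_lowercase();
        for (canonical, aliases) in &self.groups {
            if aliases.contains(&p) {
                return canonical.clone();
            }
        }
        p
    }
}
```

After one approval links "Super Bowl" and "Pro Football Championship", the hard pair from the table above stops scoring near zero on token overlap.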
Weekly improvement cycle:
- Monday: Export new decisions, update golden set
- Tuesday: Retrain embedding model, optimize weights
- Wednesday: Validate on golden set
- Thursday-Saturday: A/B test (10% traffic)
- Sunday: Promote if improved, rollback if degraded
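The "optimize weights" step in the Tuesday retrain can be sketched as logistic regression over the two similarity features, fit to the approve/reject history by gradient descent. The feature layout, learning rate, and epoch count here are illustrative assumptions:

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

// Each sample: ([jaccard, levenshtein_sim], was the pair approved?).
// Returns [bias, w_jaccard, w_levenshtein] after plain SGD.
fn fit_weights(samples: &[([f64; 2], bool)], epochs: usize, lr: f64) -> [f64; 3] {
    let mut w = [0.0f64; 3];
    for _ in 0..epochs {
        for (x, approved) in samples {
            let z = w[0] + w[1] * x[0] + w[2] * x[1];
            let err = sigmoid(z) - if *approved { 1.0 } else { 0.0 };
            // Gradient step on the logistic loss for this sample.
            w[0] -= lr * err;
            w[1] -= lr * err * x[0];
            w[2] -= lr * err * x[1];
        }
    }
    w
}
```

The learned weights replace the hand-tuned 0.6/0.4 split, so the scorer drifts toward whatever mix of features actually predicts human approval.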
New requirements: FR-MD-028 through FR-MD-032
Council Review
The Phase 3-5 extension passed council review:
| Dimension | Score |
|---|---|
| Accuracy | 8.5 |
| Completeness | 9.0 |
| Clarity | 8.5 |
| Conciseness | 7.5 |
| Relevance | 9.0 |
Verdict: PASS (confidence 0.87, weighted score 8.5)
What This Means
The safety architecture remains unchanged: human-in-the-loop is mandatory (FR-MD-003). But now each human decision improves future matching, creating a virtuous cycle where accuracy improves over time with minimal additional effort.
