
Eliminating Request Waterfalls: Parallel Data Fetching for SRE Dashboards

· 5 min read

Published: 2025-01-16


When an SRE responds to an incident, every second counts. Yet our dashboard was making them wait: not because the backend was slow, but because we'd accidentally created a request waterfall that serialized all data loading. Here's how we fixed it.

The Problem

Our React application followed a common but flawed pattern: lazy load the component code, render it, then fetch data. This creates what's known as a "request waterfall":

User Authenticates
  → Component Code Loads (100ms)
  → Component Renders
  → useQuery fires (network latency ~200ms)
  → Data arrives
  → Re-render with data

For our SRE Dashboard, this meant loading 6 different data sources sequentially:

  1. Dashboard summary
  2. Health scores
  3. Top issues
  4. Applications list
  5. Active incidents
  6. SLO status

Each waited for the previous to complete. Six ~50ms backend responses, chained sequentially, added up to a ~300ms waterfall.
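The effect of chaining versus fanning out is easy to reproduce outside React. Here is a small, self-contained asyncio sketch (illustrative only, with made-up endpoint names) that times six simulated ~50ms calls both ways:

```python
import asyncio
import time

async def fetch(endpoint: str, latency: float = 0.05) -> str:
    """Simulate one backend call with ~50ms latency."""
    await asyncio.sleep(latency)
    return f"{endpoint}: ok"

ENDPOINTS = ["summary", "health", "issues", "apps", "incidents", "slos"]

async def sequential() -> float:
    """Each call waits for the previous one (the waterfall)."""
    start = time.perf_counter()
    for ep in ENDPOINTS:
        await fetch(ep)
    return time.perf_counter() - start

async def parallel() -> float:
    """All calls issued concurrently (the fix)."""
    start = time.perf_counter()
    await asyncio.gather(*(fetch(ep) for ep in ENDPOINTS))
    return time.perf_counter() - start

seq = asyncio.run(sequential())
par = asyncio.run(parallel())
print(f"sequential: {seq * 1000:.0f}ms, parallel: {par * 1000:.0f}ms")
```

The sequential run pays six latencies in a row; the parallel run pays roughly one.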

The LLM Council review of ADR-036 (Lazy Loading) caught this flaw:

"The lazy loading strategy is fundamentally flawed. Code → Render → Data pattern degrades performance (sequential instead of parallel)."

The verdict was REJECTED. We needed to fix this.

The Solution

The fix is conceptually simple: fetch data in parallel with component code, not after.

User Authenticates
├── Component Code Loads (100ms)   ← PARALLEL
└── preloadCriticalData() fires    ← PARALLEL
    ├── /sre-dashboard
    ├── /health-score
    ├── /applications
    ├── /incidents
    ├── /slos
    └── /error-budgets
→ Component Renders with data already in cache

We implemented this with three key pieces:

1. Critical Path Manifest

First, we classified routes by urgency:

// frontend/src/utils/preload.ts
export const CRITICAL_ROUTES = {
  // Immediate - user likely to visit first
  IMMEDIATE: ['/sre-dashboard', '/incidents'],

  // Preload - user likely to visit soon after
  PRELOAD: ['/slos', '/slis', '/error-budgets', '/runbooks'],

  // Lazy - only load when navigating
  LAZY: ['/settings', '/admin/*', '/analytics/*', '/reports/*'],
} as const;

This tells us what to preload aggressively vs. what can wait.

2. Shared Query Options

We created reusable query options that both the preloader and components use:

// frontend/src/routes/loaders/sreLoaders.ts
export const queryOptions = {
  sreDashboard: (applicationFilter: string = 'all') => ({
    queryKey: ['sre-dashboard', applicationFilter],
    queryFn: async () => {
      const params = new URLSearchParams({ application: applicationFilter });
      const response = await apiClient.get(`/sre-dashboard?${params}`);
      return response.data;
    },
    staleTime: 30000, // Consider fresh for 30 seconds
  }),

  healthScore: (applicationFilter: string = 'all') => ({
    queryKey: ['sre-health-score', applicationFilter],
    queryFn: async () => {
      const params = new URLSearchParams({ application: applicationFilter });
      const response = await apiClient.get(`/sre-dashboard/health-score?${params}`);
      return response.data;
    },
    staleTime: 30000,
  }),
  // ... more query options
};

The key insight: by sharing queryKey and queryFn between preloaders and components, React Query automatically deduplicates requests and shares the cache.
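A toy model makes the deduplication behavior concrete. This Python sketch is a conceptual analogy for the cache-by-key idea, not React Query's actual implementation:

```python
import json

class QueryCache:
    """Toy model of a query cache: one entry per serialized query key."""
    def __init__(self):
        self._cache: dict[str, object] = {}
        self.fetch_count = 0

    def fetch(self, query_key: list, query_fn) -> object:
        key = json.dumps(query_key)        # same key => same cache slot
        if key not in self._cache:
            self._cache[key] = query_fn()  # only the first caller hits the network
            self.fetch_count += 1
        return self._cache[key]

cache = QueryCache()
opts = (["sre-dashboard", "all"], lambda: {"status": "healthy"})

cache.fetch(*opts)           # preloader warms the cache
result = cache.fetch(*opts)  # component reuses it; no second fetch
print(result, cache.fetch_count)  # {'status': 'healthy'} 1
```

Because the preloader and the component both derive the same key, the second lookup is a cache hit rather than a second request.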

3. Authentication-Triggered Preloading

We created a hook that fires preloading immediately after authentication:

// frontend/src/hooks/usePreloadCriticalData.ts
export function usePreloadCriticalData(options: { enabled?: boolean } = {}) {
  const queryClient = useQueryClient();
  const preloadedRef = useRef(false);

  useEffect(() => {
    if (!options.enabled || preloadedRef.current) return;
    preloadedRef.current = true;

    // Fire all preloads in parallel
    preloadCriticalData(queryClient);
  }, [options.enabled, queryClient]);
}

The preloadedRef ensures we only preload once per session, even if the user navigates between routes.

4. App Integration

Finally, we integrated the hook into App.tsx:

function App() {
  const { user } = useAuth();

  // Preload critical SRE data after authentication (Issue #452)
  usePreloadCriticalData({ enabled: !!user });

  // ... rest of app
}

Implementation Details

The preload function fires all critical-path requests in parallel using Promise.all:

export async function preloadCriticalData(queryClient: QueryClient): Promise<void> {
  console.log('[Preload] Starting critical data preload...');
  const startTime = performance.now();

  try {
    await Promise.all([
      queryClient.prefetchQuery(queryOptions.sreDashboard('all')),
      queryClient.prefetchQuery(queryOptions.healthScore('all')),
      queryClient.prefetchQuery(queryOptions.topIssues('all')),
      queryClient.prefetchQuery(queryOptions.applications()),
      queryClient.prefetchQuery(queryOptions.incidents('Active')),
      queryClient.prefetchQuery(queryOptions.slos()),
      queryClient.prefetchQuery(queryOptions.errorBudgets()),
    ]);

    const elapsed = performance.now() - startTime;
    console.log(`[Preload] Critical data loaded in ${elapsed.toFixed(0)}ms`);
  } catch (error) {
    // Don't throw - preloading is best-effort
    console.warn('[Preload] Failed to preload some critical data:', error);
  }
}

Note the error handling: preloading is best-effort. If some requests fail, the app still works - the component's useQuery will fetch the data normally.

Results

The performance improvement is significant:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Dashboard initial load | ~800ms | ~300ms | 62% faster |
| Subsequent navigation | ~200ms | <50ms | Near-instant |
| Network requests | Sequential | Parallel | 6x concurrency |

More importantly, the user experience improved:

  • SRE Dashboard renders with data immediately after login
  • Navigation between critical routes feels instant
  • Cache remains valid for 30 seconds, reducing redundant requests

E2E Testing

We added comprehensive E2E tests to verify the parallel loading behavior:

test('should preload critical API data after authentication', async ({ page }) => {
  const apiRequests: string[] = [];

  page.on('request', (request) => {
    if (request.url().includes('/api/')) {
      apiRequests.push(request.url());
    }
  });

  // Login and wait for preloading
  await page.goto(`${BASE_URL}/login`);
  await page.fill('input[name="email"]', 'admin@example.com');
  await page.fill('input[name="password"]', 'admin123');
  await page.click('button:has-text("Sign In")');
  await page.waitForURL(`${BASE_URL}/sre-dashboard`);
  await page.waitForTimeout(2000);

  // Verify critical endpoints were called
  const criticalEndpoints = ['sre-dashboard', 'health-score', 'applications', 'incidents', 'slos'];
  const foundEndpoints = criticalEndpoints.filter((endpoint) =>
    apiRequests.some((url) => url.includes(endpoint))
  );

  expect(foundEndpoints.length).toBeGreaterThanOrEqual(4);
});

Lessons Learned

  1. Lazy loading isn't always the answer. Sometimes it introduces worse problems than it solves. The code → render → data waterfall is a classic trap.

  2. LLM Council reviews catch architectural issues. The REJECTED verdict on ADR-036 forced us to think harder about the performance implications.

  3. React Query's cache is powerful. By sharing query options between preloaders and components, we get automatic deduplication and cache sharing.

  4. Best-effort preloading is resilient. If preloading fails, the app still works. This makes the feature safe to deploy.

  5. Critical path thinking matters. Not all routes need instant loading. Categorizing by urgency lets us focus resources where they matter most.


This post details the fix for Issue #452, part of the ADR-036 implementation.

From False Positives to Precision: Weighted Duplicate Scoring

· 5 min read

Published: 2025-01-16


Duplicate detection sounds simple: find things that are similar. But when your system uses multiple detection strategies, merging their results becomes the hard part. Our original implementation used a simple extend() pattern that created more problems than it solved. Here's how weighted scoring fixed it.

The Problem

Our duplicate detection system used three complementary strategies:

| Strategy | Strength | Weakness |
|---|---|---|
| Vector | Catches semantic similarity | Expensive (embedding generation) |
| Text | Catches near-exact wording | Misses paraphrases |
| Keyword | Fast topic matching | High false positive rate |

The original merge logic was straightforward:

def _merge_results(main_results, new_results, method):
    """Just extend the lists (the original, flawed approach)."""
    seen_ids = {
        item["id"]
        for level in ["high_confidence", "medium_confidence", "low_confidence"]
        for item in main_results.get(level, [])
    }
    for level in ["high_confidence", "medium_confidence", "low_confidence"]:
        for item in new_results.get(level, []):
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                main_results[level].append(item)

The problem? This treats all strategies as an unconditional OR. If vector says 0.85 similarity and keyword says 0.60 similarity for the same item, which wins?

With extend(), both get added - or worse, only the first one encountered is kept. We lose the precision that comes from multiple strategies agreeing.

The LLM Council review caught this immediately:

"Simple extend treats all strategies as OR → 'High CPU on DB-01' and 'High CPU on DB-02' incorrectly merged as duplicates. Vector catches semantic similarity; fuzzy text catches similar hostnames → distinct incidents merged."

The verdict was REQUEST CHANGES. We needed weighted scoring.

The Solution

The fix required two conceptual shifts:

  1. Weighted aggregation instead of simple append
  2. Multi-strategy agreement as a confidence signal

Weighted Scoring Algorithm

We combine strategy scores with configurable weights:

@dataclass
class WeightedScoringConfig:
    vector_weight: float = 0.5   # Semantic similarity is most reliable
    text_weight: float = 0.3     # Text matching is secondary
    keyword_weight: float = 0.2  # Keyword is least precise
    min_strategies_for_high: int = 2

def calculate_weighted_score(self, strategy_scores: dict) -> float:
    """
    Score = (Vector * 0.5) + (Text * 0.3) + (Keyword * 0.2)
    Normalizes if some strategies are missing.
    """
    weights = {
        "vector": self.weighted_config.vector_weight,
        "text": self.weighted_config.text_weight,
        "keyword": self.weighted_config.keyword_weight,
    }

    total_weight = 0.0
    weighted_sum = 0.0

    for strategy, score in strategy_scores.items():
        if strategy in weights:
            weight = weights[strategy]
            weighted_sum += score * weight
            total_weight += weight

    # Normalize if not all strategies were used
    if total_weight > 0:
        return weighted_sum / total_weight
    return 0.0

This means if only vector and text detect a match, we normalize: (0.90 * 0.5 + 0.80 * 0.3) / (0.5 + 0.3) = 0.8625.
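The renormalization is plain arithmetic and easy to sanity-check. This standalone Python sketch mirrors the formula above (weights hard-coded for illustration, outside the class):

```python
WEIGHTS = {"vector": 0.5, "text": 0.3, "keyword": 0.2}

def weighted_score(strategy_scores: dict[str, float]) -> float:
    """Weighted mean over whichever strategies actually reported a score."""
    used = {s: WEIGHTS[s] for s in strategy_scores if s in WEIGHTS}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(strategy_scores[s] * w for s, w in used.items()) / total

# All three strategies present: plain weighted sum over total weight 1.0
print(round(weighted_score({"vector": 0.90, "text": 0.80, "keyword": 0.60}), 4))  # 0.81
# Keyword missing: renormalize over 0.5 + 0.3 -> (0.45 + 0.24) / 0.8
print(round(weighted_score({"vector": 0.90, "text": 0.80}), 4))  # 0.8625
```

Without the renormalization step, a missing strategy would silently drag every score toward zero.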

Multi-Strategy Agreement

A single strategy match should never produce high confidence - too much risk of false positives:

def determine_confidence_level(self, result, min_strategies_for_high=2):
    """
    High confidence requires multiple strategies to agree.
    Single-strategy matches can only be medium or low.
    """
    strategies_matched = result.get("strategies_matched", 0)
    weighted_score = self.calculate_weighted_score(result.get("scores", {}))

    # Multi-strategy agreement required for high confidence
    if strategies_matched >= min_strategies_for_high and weighted_score >= 0.80:
        return "high"
    elif strategies_matched >= 1 and weighted_score >= 0.65:
        return "medium"
    else:
        return "low"

This simple change dramatically reduces false positives. Even a 0.95 vector similarity can't produce "high" confidence alone.

Entity-Specific Thresholds

Different entity types need different sensitivity:

ENTITY_THRESHOLDS = {
    "incidents": EntityThresholds(
        high_confidence=0.95,  # Strict - don't suppress real alarms
        medium_confidence=0.85,
        low_confidence=0.75,
    ),
    "runbooks": EntityThresholds(
        high_confidence=0.80,  # Loose - find helpful content
        medium_confidence=0.70,
        low_confidence=0.60,
    ),
    "requirements": EntityThresholds(
        high_confidence=0.85,  # Default - balanced
        medium_confidence=0.75,
        low_confidence=0.65,
    ),
}

For incidents, we want almost-exact matches only (0.95). False negatives are better than suppressing real alarms. For runbooks, we're more permissive (0.80) because finding helpful related content is the goal.

Cascading Execution

We also optimized for performance by running cheap strategies first:

def get_strategy_execution_order() -> list[str]:
    return ["keyword", "text", "vector"]  # Cheapest first

Keyword matching is just string comparison - nearly free. Vector requires embedding generation or lookup - expensive. By running keyword first, we can potentially skip the expensive vector check if keyword already disqualifies the match.
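A minimal sketch of that early-exit cascade (hypothetical names and thresholds; the real pipeline's interfaces differ):

```python
from typing import Optional

def keyword_score(a: str, b: str) -> float:
    """Cheap check: fraction of shared words (token Jaccard)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cascade(a: str, b: str, vector_fn, keyword_floor: float = 0.1) -> Optional[float]:
    """Run the cheap check first; skip vector_fn when keyword disqualifies."""
    if keyword_score(a, b) < keyword_floor:
        return None  # disqualified without paying for embeddings
    return vector_fn(a, b)

calls = []
def fake_vector(a, b):
    """Stand-in for the expensive embedding similarity lookup."""
    calls.append((a, b))
    return 0.9

cascade("High CPU on DB-01", "Checkout latency spike", fake_vector)  # skipped
cascade("High CPU on DB-01", "High CPU on DB-02", fake_vector)       # runs
print(len(calls))  # 1
```

Only the plausible pair ever reaches the expensive strategy; the unrelated pair is rejected by the cheap check alone.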

Implementation

The updated _merge_results now tracks per-strategy scores:

def _merge_results(self, main_results, new_results, method):
    """Merge with weighted scoring instead of simple extend."""

    # Initialize tracking dict
    if "_item_scores" not in main_results:
        main_results["_item_scores"] = {}

    for level in ["high_confidence", "medium_confidence", "low_confidence"]:
        for item in new_results.get(level, []):
            item_id = item["id"]

            # Track per-strategy scores for this item
            if item_id not in main_results["_item_scores"]:
                main_results["_item_scores"][item_id] = {
                    "item_data": item,
                    "strategy_scores": {},
                    "strategies_matched": 0,
                }

            # Add score for this strategy
            score = extract_score_by_method(item, method)
            main_results["_item_scores"][item_id]["strategy_scores"][method] = score
            main_results["_item_scores"][item_id]["strategies_matched"] += 1

    # Recategorize based on weighted scoring
    self._recategorize_with_weighted_scoring(main_results)

The key insight: we don't decide the confidence level when processing each strategy. We wait until all strategies have contributed their scores, then calculate the weighted result.

Results

The weighted scoring approach provides:

| Metric | Before | After |
|---|---|---|
| False positive rate | High (~30%) | Low (~5%) |
| Multi-strategy matches | Not tracked | Prioritized |
| Entity-specific tuning | None | Full support |
| Debugging info | Lost | Preserved |

Each result now includes full strategy breakdown:

{
  "id": "REQ-001",
  "weighted_score": 0.88125,
  "confidence_level": "high",
  "strategies_matched": 2,
  "strategy_scores": {
    "vector": 0.90,
    "text": 0.85
  }
}

This makes debugging straightforward: you can see exactly which strategies contributed and with what scores.

Lessons Learned

  1. Simple OR logic loses precision - When merging results from multiple strategies, preserve the individual contributions.

  2. Agreement is a signal - Two strategies detecting the same match is much stronger evidence than one strategy with a high score.

  3. Different entities need different thresholds - What's "similar enough" for runbooks is too loose for incidents.

  4. Preserve debugging info - The per-strategy scores are invaluable for tuning thresholds and investigating false positives.

  5. Order matters for performance - Run cheap strategies first to enable early termination.


This post details the implementation of ADR-038: Duplicate Detection weighted scoring improvements.

Entity Relationship Primary Key Fix: Preventing Silent Data Corruption

· 3 min read

Published: 2025-01-16


When the LLM Council reviewed our entity relationships implementation (ADR-026), they flagged a critical flaw: our 4-column primary key blocked multiple relationship types between the same entities, and duplicate relationships could silently corrupt traceability data.

This post details how Issue #458 fixed both data integrity gaps with a proper 5-column primary key.

The Problem

Our EntityRelationship table connects any two entities with typed relationships:

-- Original schema (simplified)
CREATE TABLE entity_relationships (
    source_type VARCHAR(50),
    source_id VARCHAR(255),
    target_type VARCHAR(50),
    target_id VARCHAR(255),
    relationship_type VARCHAR(50),
    PRIMARY KEY (source_type, source_id, target_type, target_id)
    -- Notice: relationship_type NOT in primary key!
);

The 4-column primary key created two problems:

Problem 1: Blocked Multiple Relationship Types

-- Insert #1: REQ-001 depends on REQ-002
INSERT INTO entity_relationships
VALUES ('requirement', 'REQ-001', 'requirement', 'REQ-002', 'depends_on');
-- SUCCESS!

-- Insert #2: REQ-001 also references REQ-002
INSERT INTO entity_relationships
VALUES ('requirement', 'REQ-001', 'requirement', 'REQ-002', 'references');
-- FAILS! PK violation (same 4-column key)

This was fundamentally broken—entities couldn't have multiple relationship types!

Problem 2: Potential Duplicate Relationships

Without proper constraints, race conditions or UI bugs could create duplicate relationships.

Consequences

  1. Feature Blockage: Can't express "depends_on AND references" relationships
  2. Graph Pollution: (if duplicates existed) Wrong edge counts
  3. IEEE 29148-2018 Non-Compliance: Traceability requires rich relationships

The Solution

Change the primary key to 5 columns, including relationship_type:

# backend/models/generic_relationships.py
class EntityRelationship(Base):
    __tablename__ = "entity_relationships"

    # 5-column composite primary key
    source_type = Column(String(50), primary_key=True)
    source_id = Column(String(255), primary_key=True)
    target_type = Column(String(50), primary_key=True)
    target_id = Column(String(255), primary_key=True)
    relationship_type = Column(String(50), primary_key=True)  # Now in PK!

Key Insight: Multiple Relationship Types Are Now Valid

With the 5-column PK:

  • REQ-001 → REQ-002 with depends_on
  • REQ-001 → REQ-002 with references ✓ (different relationship type = different row)
  • REQ-001 → REQ-002 with depends_on again ✗ (exact duplicate blocked by PK)
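The same constraint behavior can be reproduced with an in-memory SQLite table (column types simplified; PostgreSQL enforces the composite key the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_relationships (
        source_type TEXT, source_id TEXT,
        target_type TEXT, target_id TEXT,
        relationship_type TEXT,
        PRIMARY KEY (source_type, source_id, target_type, target_id, relationship_type)
    )
""")

row = ("requirement", "REQ-001", "requirement", "REQ-002")
conn.execute("INSERT INTO entity_relationships VALUES (?,?,?,?,?)", row + ("depends_on",))
# Different relationship_type between the same entities: allowed
conn.execute("INSERT INTO entity_relationships VALUES (?,?,?,?,?)", row + ("references",))

try:
    # Exact duplicate 5-tuple: rejected by the primary key
    conn.execute("INSERT INTO entity_relationships VALUES (?,?,?,?,?)", row + ("depends_on",))
except sqlite3.IntegrityError as e:
    print("duplicate rejected:", e)
```

Two rows survive (one per relationship type); the exact duplicate never makes it into the table.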

Migration Strategy

The migration handles the schema change safely:

def upgrade():
    conn = op.get_bind()

    # Step 1: Remove any duplicates (keep newest by updated_at)
    conn.execute(text("""
        DELETE FROM entity_relationships
        WHERE ctid NOT IN (
            SELECT DISTINCT ON (source_type, source_id, target_type, target_id, relationship_type)
                ctid
            FROM entity_relationships
            ORDER BY source_type, source_id, target_type, target_id, relationship_type,
                     updated_at DESC NULLS LAST
        )
    """))

    # Step 2: Drop the 4-column primary key
    op.drop_constraint("entity_relationships_pkey", "entity_relationships", type_="primary")

    # Step 3: Create the 5-column primary key
    op.create_primary_key(
        "entity_relationships_pkey",
        "entity_relationships",
        ["source_type", "source_id", "target_type", "target_id", "relationship_type"],
    )

Testing

Our TDD tests verify the constraint at the database level:

def test_duplicate_relationship_rejected_at_database_level(self, postgres_db):
    """Exact duplicate 5-tuple MUST raise IntegrityError."""
    rel1 = EntityRelationship(
        source_type="requirement", source_id="REQ-DUP-001",
        target_type="capability", target_id="CAP-DUP-001",
        relationship_type="depends_on",
    )
    postgres_db.add(rel1)
    postgres_db.flush()

    # Attempt exact duplicate
    rel2_duplicate = EntityRelationship(
        source_type="requirement", source_id="REQ-DUP-001",
        target_type="capability", target_id="CAP-DUP-001",
        relationship_type="depends_on",  # Same!
    )
    postgres_db.add(rel2_duplicate)

    with pytest.raises(IntegrityError):
        postgres_db.flush()  # MUST fail

Impact

| Before | After |
|---|---|
| Duplicates silently accepted | IntegrityError on duplicate |
| Graph traversal counts edges incorrectly | Clean graph structure |
| UI shows duplicates | No duplicates possible |
| ADR-026: CONDITIONAL | ADR-026: APPROVED |

Lessons Learned

  1. Composite Primary Keys Need Review: Adding a column to a table doesn't mean it's part of the uniqueness guarantee
  2. Semantic Differences Matter: "depends_on" and "references" are different relationships—the fix preserves this distinction
  3. Database Constraints > Application Validation: Race conditions can bypass application checks; database constraints are authoritative
  4. Migration Must Handle Existing Data: Don't just add constraints—clean up legacy data first

Issue #458 | ADR-026 | LLM Council Blocking Issue Resolved

MCP Bidirectional Traffic: Fixing SSE Buffering and Rate Limits

· 4 min read

Published: 2025-01-16


When the LLM Council reviewed our MCP client proxy (ADR-040), they identified a critical gap: our nginx configuration was buffering SSE responses, causing tool execution to hang. Additionally, our standard API rate limits (60 req/min) were breaking MCP negotiation, which is inherently chatty.

This post details how Issue #460 fixed these SSE streaming issues.

The Problem

Problem 1: nginx Buffering Blocks SSE

Default nginx proxy configuration buffers responses:

# Default behavior (problematic for SSE)
location /api/ {
    proxy_pass http://backend;
    # proxy_buffering is ON by default!
}

When SSE events are buffered, they arrive in bursts instead of real-time. For MCP:

  • Tool execution appears to hang for seconds
  • Timeouts during long-running operations
  • Poor user experience in Claude Desktop

Problem 2: Rate Limits Break MCP Negotiation

MCP protocol is chatty during initialization:

  • Capabilities exchange
  • Tool listing
  • Prompt listing
  • Resource queries

Our standard 60 req/min limit triggered during normal MCP negotiation, causing connection failures.

The Solution

1. nginx SSE Location Block

Added dedicated location block for MCP SSE endpoints:

# MCP SSE endpoint - MUST be before generic /api/
location ~ ^/api/v1/mcp/(sse|message) {
    proxy_pass ${API_PROXY_URL};
    proxy_http_version 1.1;

    # Disable buffering for SSE (critical for real-time events)
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header X-Accel-Buffering "no";

    # Extended timeouts for long-running SSE (4 hours)
    proxy_read_timeout 14400s;
    proxy_send_timeout 14400s;

    # Connection headers for SSE
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}

Key configurations:

  • proxy_buffering off: Events stream immediately
  • X-Accel-Buffering: no: Header for upstream servers
  • 14400s timeouts: 4-hour sessions for long operations
  • Connection '': Prevent connection header interference

2. Split Rate Limiting

Created separate rate limiters for SSE and messages:

# SSE: Connection-based limit (5 concurrent per user)
class MCPSSERateLimiter:
    def __init__(self, max_connections: int = 5):
        self.max_connections = max_connections

    async def acquire(self, user_id: str, connection_id: str) -> bool:
        """Acquire a connection slot."""
        # Uses Redis SET for atomic connection counting
        ...

# Messages: Token bucket (200 req/min, 50 burst)
class MCPMessageRateLimiter:
    def __init__(self, rate_limit: int = 200, burst_limit: int = 50):
        self.rate_limit = rate_limit
        self.burst_limit = burst_limit

    async def check(self, user_id: str) -> bool:
        """Check if message is allowed."""
        # Uses Redis token bucket algorithm
        ...

Why split limits?

| Endpoint | Limit type | Value | Reason |
|---|---|---|---|
| SSE /sse | Concurrent | 5 per user | Long-lived connections, prevent resource exhaustion |
| POST /message/{id} | Token bucket | 200/min, 50 burst | Handle chatty negotiation, allow burst |
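The message limiter's token bucket is easiest to see without Redis. Here is a minimal in-process Python sketch of the same policy (the production limiter keeps the bucket state in a Redis hash instead):

```python
import time

class TokenBucket:
    """In-memory token bucket: `rate` tokens/sec, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 200 req/min is ~3.33 tokens/sec; allow bursts of up to 50
bucket = TokenBucket(rate=200 / 60, capacity=50)
burst = sum(bucket.allow() for _ in range(60))
print(burst)  # ~50: the burst drains the bucket, the remainder is rejected
```

A chatty MCP negotiation can burn through its burst instantly, then settles into the steady refill rate instead of being hard-blocked.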

3. Extended Session TTL

MCP sessions now have 4-hour TTL to match nginx timeouts:

# backend/api/v1/mcp_sse.py
MCP_SESSION_TTL_SECONDS = 14400 # 4 hours
SSE_READ_TIMEOUT_SECONDS = 14400 # Matches nginx config

Implementation Details

Rate Limiter Storage

Uses Redis DB 4 (separate from API rate limiting DB 3):

MCP_RATE_LIMIT_DB = 4

# SSE: Uses Redis SET for connection tracking
key = f"mcp_sse:{user_id}:connections"
# SET contains active connection_ids

# Messages: Uses Redis HASH for token bucket
key = f"mcp_msg:{user_id}"
# HASH contains {tokens: N, last_refill: timestamp}

Rate Limiter Release

Crucial: Release SSE slot when connection closes:

async def remove_connection(self, connection_id: str):
    # ... disconnect logic ...

    # Release the SSE rate limit slot
    user_key = connection.user_info.email
    await sse_limiter.release(user_key, connection_id)

Without this, users would exhaust their connection limit and be unable to reconnect.

Impact

| Metric | Before | After |
|---|---|---|
| SSE event latency | Buffered (seconds) | Real-time (<100ms) |
| MCP negotiation | Often rate limited | Reliable |
| Session duration | 30 minutes | 4 hours |
| Concurrent connections | No limit | 5 per user |
| ADR-040 verdict | CONDITIONAL | APPROVED |

Lessons Learned

  1. SSE needs special handling: Standard proxy configs don't work for SSE
  2. Different endpoints, different limits: API rate limits don't fit all protocols
  3. Match timeouts end-to-end: nginx, backend, and client must agree
  4. Resource cleanup matters: Release rate limit slots on disconnect

Issue #460 | ADR-040 | LLM Council Blocking Issue Resolved

DataGrid Pro Stability: Preventing Infinite Loops and Cascade Updates

· 4 min read

Published: 2025-01-16


When the LLM Council reviewed our DataGrid Pro implementation (ADR-035), they identified critical stability issues: missing getRowId props causing row identity recreation, unstable query keys triggering unnecessary refetches, and unthrottled filter/sort handlers causing cascade updates.

This post details how Issue #462 fixed these DataGrid stability issues.

The Problem

Problem 1: Missing getRowId Causes Row Identity Recreation

Without an explicit getRowId prop, DataGrid Pro uses array index as the row identifier:

// Before: Row identity based on array index
<DataGridPremium
  rows={data}
  columns={columns}
/>

When the data array is recreated (even with the same values), DataGrid treats all rows as new, causing:

  • Loss of selection state
  • Scroll position reset
  • Unnecessary DOM reconciliation
  • Potential infinite loops with controlled components

Problem 2: Unstable Query Keys

React Query triggers refetches when query key references change:

// Problematic: New object reference every render
const queryKey = ['entities', entityType, { page, filters }];

Combined with DataGrid's server-side pagination callbacks, this creates a feedback loop:

  1. DataGrid triggers onPageChange
  2. New query key object created
  3. React Query refetches
  4. New data causes DataGrid to re-render
  5. Goto step 1

Problem 3: Unthrottled Filter/Sort Handlers

DataGrid fires onFilterModelChange and onSortModelChange rapidly during user interaction:

// Before: Every keystroke triggers API call
const handleFilterChange = (model) => {
  setFilters(model);  // Immediate state update
  refetch();          // Immediate API call
};

This causes:

  • Excessive API calls during typing
  • UI lag from rapid state updates
  • Server load from redundant requests

The Solution

1. Explicit getRowId Prop

Added getRowId to all DataGrid instances:

<DataGridPremium
  // ADR-035: Explicit getRowId prevents row identity recreation
  getRowId={(row) => row.id}
  rows={safeData}
  columns={columns}
/>

Why this works: The entity's actual ID (UUID or database ID) is used as the row key instead of array position. Row identity is stable across data updates.

2. Stable Query Key Hook

Created useStableQueryKey for deep comparison of query keys:

// frontend/src/hooks/useStableQueryKey.ts
import { useMemo, useRef } from 'react';
import isEqual from 'lodash/isEqual';

export function useStableQueryKey<T>(queryKey: T): T {
  const keyRef = useRef<T>(queryKey);

  return useMemo(() => {
    // Only update the reference if the values actually changed
    if (!isEqual(keyRef.current, queryKey)) {
      keyRef.current = queryKey;
    }
    return keyRef.current;
  }, [queryKey]);
}

Usage:

const queryKey = useStableQueryKey(['entities', entityType, filters]);
// queryKey reference only changes when values actually differ

3. Throttled Handlers

Created useThrottle hook for rate-limited callbacks:

// frontend/src/hooks/useThrottle.ts
export function useThrottle<T extends (...args: any[]) => void>(
  callback: T,
  delay: number,
  options: { leading?: boolean; trailing?: boolean } = {}
): T {
  // Throttle implementation with leading/trailing edge support
}

Applied to DataGrid:

const handleFilterChangeInternal = useCallback((model) => {
  setFilters(model);
  onFilterChange?.(model);
}, [onFilterChange]);

// ADR-035: Throttle to 300ms prevents cascade updates
const handleFilterChange = useThrottle(handleFilterChangeInternal, 300, {
  leading: true,
  trailing: true,
});

<DataGridPremium
  onFilterModelChange={handleFilterChange}
/>
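The throttle policy itself is framework-independent. Here is a minimal Python sketch of leading/trailing throttling (not the React hook, just the same idea):

```python
import time

class Throttle:
    """Invoke at most once per `delay` seconds: the first call fires
    immediately (leading edge); the last suppressed call fires on
    flush (trailing edge)."""
    def __init__(self, fn, delay: float):
        self.fn, self.delay = fn, delay
        self.last_call = float("-inf")
        self.pending = None

    def __call__(self, *args):
        now = time.monotonic()
        if now - self.last_call >= self.delay:
            self.last_call = now
            self.fn(*args)       # leading edge: fire immediately
        else:
            self.pending = args  # remember the most recent suppressed call

    def flush(self):
        if self.pending is not None:
            self.fn(*self.pending)  # trailing edge: fire the last call
            self.pending = None

calls = []
throttled = Throttle(calls.append, delay=0.3)
for model in ["a", "ab", "abc", "abcd"]:  # rapid keystrokes
    throttled(model)
throttled.flush()
print(calls)  # ['a', 'abcd']: first and last survive, the middle is dropped
```

The UI stays responsive (leading edge) while the server only sees the final filter value (trailing edge).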

4. URL State Synchronization

Created useGridUrlState for shareable grid configurations:

// frontend/src/hooks/useGridUrlState.ts
export function useGridUrlState(defaults) {
  const [searchParams, setSearchParams] = useSearchParams();

  // Parse pagination, sort, search from URL
  const state = useMemo(() => ({
    pagination: { page, pageSize },
    sort: [{ field, sort }],
    search,
  }), [searchParams]);

  // Update URL without navigation
  const setPagination = (model) => {
    setSearchParams((prev) => {
      prev.set('page', model.page);
      return prev;
    }, { replace: true });
  };

  return { state, setPagination, setSort, setSearch, getApiParams };
}

Benefits:

  • Grid state preserved across page refreshes
  • Shareable URLs with filters/pagination
  • Browser back/forward navigation support

Implementation

Files Created

  • frontend/src/hooks/useStableQueryKey.ts
  • frontend/src/hooks/useThrottle.ts
  • frontend/src/hooks/useGridUrlState.ts
  • frontend/e2e/datagrid-stability.spec.ts
  • frontend/src/hooks/__tests__/useStableQueryKey.test.ts
  • frontend/src/hooks/__tests__/useThrottle.test.ts

Files Modified

  • frontend/src/components/tables/MUIEntityTable.tsx

    • Added getRowId={(row) => row.id}
    • Added throttled filter/sort handlers
    • Imported and applied useThrottle
  • frontend/src/components/DataGridWrapper.tsx

    • Added getRowId={(row) => row.id}
    • Added memoized rows with useMemo
    • Added throttled handlers

Impact

| Metric | Before | After |
|---|---|---|
| Row identity stability | Index-based | ID-based |
| Filter change API calls | 1 per keystroke | Max 3 per second |
| Query key stability | New ref each render | Stable until values change |
| URL state persistence | None | Full pagination/sort/search |
| ADR-035 verdict | CONDITIONAL | APPROVED |

Key Takeaways

  1. Always specify getRowId: DataGrid needs stable row identity for controlled components
  2. Stabilize query keys: Use deep comparison to prevent unnecessary refetches
  3. Throttle user interactions: Rate-limit handlers that trigger API calls
  4. URL state enables sharing: Sync grid state to URL for better UX

Issue #462 | ADR-035 | LLM Council Blocking Issue Resolved

Automated Market Discovery and Matching

· 6 min read

2026-01-22 | ADR-017 Implementation

Implementation of automated market discovery between Polymarket and Kalshi while preserving human-in-the-loop safety.

The Problem

Manual market discovery doesn't scale:

  1. Discovery burden: Operators research markets on both platforms independently
  2. Missed opportunities: New markets go undetected
  3. No persistence: Mappings exist only in memory
  4. Scale limitation: Cannot monitor thousands of markets

Industry context: Research documented $40M+ in arbitrage profits from Polymarket alone (Apr 2024 - Apr 2025). Existing bots watch 10,000+ markets.

The Solution

Text similarity matching with semantic warnings and mandatory human approval.

Architecture

API Clients → Scanner (hourly) → Matcher → Candidates (SQLite) → Human Review → MappingManager

Matching Algorithm

  1. Pre-filter: Category, expiration (±7 days), outcome count
  2. Similarity: 0.6 × Jaccard(tokens) + 0.4 × Levenshtein_normalized
  3. Threshold: Score ≥ 0.6 creates candidate for review
  4. Warnings: Flag settlement differences (announcement vs actual event)
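For illustration, the similarity blend in step 2 can be sketched in Python (the production code is Rust using the strsim crate; the market titles below are invented):

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def combined_score(a: str, b: str) -> float:
    """0.6 x token Jaccard + 0.4 x length-normalized Levenshtein similarity."""
    lev_sim = 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b), 1)
    return 0.6 * jaccard(a, b) + 0.4 * lev_sim

score = combined_score(
    "Government shutdown announced by January 31?",
    "Government shutdown by January 31?",
)
print(round(score, 2), score >= 0.6)  # comfortably above the 0.6 threshold
```

A pair scoring at or above 0.6 becomes a candidate for human review; nothing is auto-approved.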

Safety Architecture (FR-MD-003)

The critical constraint: settlement semantics differ across platforms.

Example - 2024 Government Shutdown:

  • Polymarket: "OPM issues shutdown announcement"
  • Kalshi: "Actual shutdown exceeding 24 hours"

Same event, different resolution criteria, potentially different outcomes.

Safety Gates

pub fn approve(&self, id: Uuid, acknowledge_warnings: bool) -> Result<(), ApprovalError> {
    let candidate = self.storage.get_candidate(id)?;

    // Safety: Require warning acknowledgment
    if !candidate.semantic_warnings.is_empty() && !acknowledge_warnings {
        return Err(ApprovalError::WarningsNotAcknowledged);
    }

    // Use existing safety gate (FR-MD-003)
    let mut manager = self.mapping_manager.lock().unwrap();
    let mapping_id = manager.propose_mapping(/*...*/);
    manager.verify_mapping(mapping_id);

    // Audit log for compliance
    self.storage.log_decision(/*...*/)?;
    Ok(())
}

What This Guarantees

  1. Human-in-the-loop: Candidates require explicit approval
  2. FR-MD-003 enforced: Uses existing MappingManager.verify_mapping()
  3. Semantic warnings block quick approval: Must acknowledge settlement differences
  4. Audit trail: All approvals/rejections logged

Implementation Highlights

Feature Flag

Discovery is opt-in via Cargo feature:

[features]
discovery = ["dep:strsim"]

CLI Interface

# Discover and match markets
cargo run --features discovery -- --discover-markets

# Review candidates interactively
cargo run --features discovery -- --review-candidates

# List pending candidates
cargo run --features discovery -- --list-candidates --status pending

# Batch operations
cargo run --features discovery -- --approve-candidates --ids "uuid1,uuid2"
cargo run --features discovery -- --reject-candidates --ids "uuid1" --reason "Settlement differs"

Similarity Scorer

pub struct SimilarityScorer {
    jaccard_weight: f64,     // 0.6
    levenshtein_weight: f64, // 0.4
    threshold: f64,          // 0.6
}

impl SimilarityScorer {
    pub fn find_matches(&self, market: &DiscoveredMarket, candidates: &[DiscoveredMarket])
        -> Vec<CandidateMatch>
    {
        candidates.iter()
            .filter(|c| self.pre_filter(market, c))
            .filter_map(|c| {
                let score = self.combined_score(&market.title, &c.title);
                if score >= self.threshold {
                    Some(CandidateMatch::new(market.clone(), c.clone(), score))
                } else {
                    None
                }
            })
            .collect()
    }
}

Scanner Actor

impl Actor for DiscoveryScannerActor {
    type Message = ScannerMsg;

    async fn handle(&mut self, message: Self::Message) -> Result<(), ActorError> {
        match message {
            ScannerMsg::Scan => {
                let poly_markets = self.fetch_all_markets(&*self.polymarket_client).await?;
                let kalshi_markets = self.fetch_all_markets(&*self.kalshi_client).await?;

                // Store markets
                for market in &poly_markets {
                    self.storage.lock().await.upsert_market(market)?;
                }

                // Find candidates
                for poly_market in &poly_markets {
                    let matches = self.scorer.find_matches(poly_market, &kalshi_markets);
                    for candidate in matches {
                        if !self.is_duplicate_candidate(&candidate).await? {
                            self.storage.lock().await.insert_candidate(&candidate)?;
                        }
                    }
                }
            }
            // ...
        }
    }
}

Test Coverage

48 tests across 5 phases:

| Module | Tests |
| --- | --- |
| candidate.rs | 5 |
| storage.rs | 7 |
| normalizer.rs | 3 |
| matcher.rs | 7 |
| polymarket_gamma.rs | 4 |
| kalshi_markets.rs | 4 |
| scanner.rs | 5 |
| approval.rs | 5 |
| CLI integration | 8 |

Council Review

All 5 phases passed LLM Council review with confidence >= 0.87.

Final ADR Review:

  • Verdict: PASS
  • Confidence: 0.88
  • Weighted Score: 8.55/10

Safety gates (FR-MD-003) received "PASS (Strong)" verdict.

Why Text Similarity Over LLM/Embeddings

Options considered:

| Approach | Accuracy | Cost | Latency |
| --- | --- | --- | --- |
| Text similarity | Moderate | Zero | Sub-ms |
| LLM verification | High | $0.01-0.05/call | +200-500ms |
| Embeddings | Highest | Storage + compute | Batch dependent |

Text similarity was selected because:

  1. Sufficient for MVP: Catches majority of matches
  2. Zero dependencies: No external API costs
  3. Extensible: LLM verification can be added later
  4. Council compliant: "Suggestion engine only" per Design Review 1

Update: Post-Implementation Learnings (2026-01-23)

Post-implementation testing revealed a critical gap: text similarity is insufficient for production.

The Problem

Real market pairs score only 8-9% similarity despite semantic equivalence:

| Kalshi | Polymarket | Jaccard |
| --- | --- | --- |
| "Will the US acquire part of Greenland in 2026?" | "Will Trump buy Greenland?" | 8.3% |
| "Will Washington win the 2026 Pro Football Championship?" | "Super Bowl Champion 2026" | 9.1% |

Root causes:

  • Different vocabulary: "Super Bowl" vs "Pro Football Championship"
  • Different framing: Question vs statement
  • Different specificity: Team name vs championship event

The Solution: 5-Phase Approach

We've extended ADR-017 with a progressive enhancement roadmap:

Phase 1: Text Similarity      ← Current (MVP, 8-9% accuracy on hard pairs)
Phase 2: Fingerprint Matching ← Proposed (entity extraction, field-weighted scoring)
Phase 3: Embedding Matching   ← Proposed (semantic similarity via vectors)
Phase 4: LLM Verification     ← Proposed (human-level reasoning for uncertain cases)
Phase 5: Human Feedback Loop  ← Proposed (continuous improvement from decisions)

Phase 3: Embedding-Based Semantic Matching

Embeddings capture semantic similarity that text matching misses:

# "Super Bowl" and "Pro Football Championship" have zero word overlap
# but high embedding similarity
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

emb1 = model.encode("Super Bowl Champion 2026")
emb2 = model.encode("2026 Pro Football Championship winner")
similarity = util.cos_sim(emb1, emb2).item()  # ~0.85

New requirements: FR-MD-018 through FR-MD-023

Phase 4: LLM Verification

For uncertain matches (0.60-0.85 score), invoke LLM for human-level reasoning:

Candidate pair for verification:

Market A (Kalshi): "Will the US acquire part of Greenland in 2026?"
Market B (Polymarket): "Will Trump buy Greenland?"

Analyze: Are these the same underlying event?
Consider: Resolution criteria, timing, specificity

Cost optimization: Haiku screening ($0.001/call) with Sonnet escalation ($0.01/call). Budget: ~$50/day for 5,000 candidates.
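As a back-of-envelope check on that budget, the daily cost is just screening plus escalation. A minimal sketch (the 15% escalation rate below is an assumed figure for illustration, not from the ADR):

```rust
/// Daily LLM verification cost: screen every candidate with Haiku,
/// escalate a fraction of uncertain cases to Sonnet.
/// NOTE: the escalation rate is an assumption, not a measured number.
fn daily_cost(candidates: u64, escalation_rate: f64) -> f64 {
    let haiku = candidates as f64 * 0.001; // $0.001/call screening
    let sonnet = candidates as f64 * escalation_rate * 0.01; // $0.01/call escalation
    haiku + sonnet
}

fn main() {
    // 5,000 candidates/day at 15% escalation: $5 + $7.50 = $12.50,
    // comfortably under the ~$50/day budget.
    println!("${:.2}", daily_cost(5_000, 0.15));
}
```

Even at 100% escalation the bound is $5 + $50 = $55/day, which is why the budget is stated as roughly $50.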

New requirements: FR-MD-024 through FR-MD-027

Phase 5: Learning from Human Feedback (Data Flywheel)

The key innovation: human approval decisions are training data.

┌─────────────────────────────────────────────────────────────
│ Data Flywheel: Human Decisions Train Models
├─────────────────────────────────────────────────────────────
│ Human Approval ──► Entity Alias Learning
│                    ("Super Bowl" = "Pro Football Championship")
│
│ Human Approval ──► Embedding Fine-Tuning
│                    (contrastive learning on approved pairs)
│
│ Human Approval ──► Weight Optimization
│                    (logistic regression on decision history)
└─────────────────────────────────────────────────────────────

Weekly improvement cycle:

  • Monday: Export new decisions, update golden set
  • Tuesday: Retrain embedding model, optimize weights
  • Wednesday: Validate on golden set
  • Thursday-Saturday: A/B test (10% traffic)
  • Sunday: Promote if improved, rollback if degraded

New requirements: FR-MD-028 through FR-MD-032

Council Review

The Phase 3-5 extension passed council review:

| Dimension | Score |
| --- | --- |
| Accuracy | 8.5 |
| Completeness | 9.0 |
| Clarity | 8.5 |
| Conciseness | 7.5 |
| Relevance | 9.0 |

Verdict: PASS (confidence 0.87, weighted score 8.5)

What This Means

The safety architecture remains unchanged: human-in-the-loop is mandatory (FR-MD-003). But now each human decision improves future matching, creating a virtuous cycle where accuracy improves over time with minimal additional effort.


Kalshi Demo Environment Support

· 4 min read
Claude
AI Assistant

This post covers the implementation of ADR-015 (Kalshi Demo Environment Support), enabling safe testing with Kalshi's demo API without risking production credentials or capital.

The Problem

Testing Kalshi integration presents a challenge: the API requires valid credentials, and production means real money. Before this change, developers had two options:

  1. Use production credentials - Risky, even with paper trading mode
  2. Mock everything - Fast but doesn't validate real API behavior

Neither is ideal. We need real API behavior without production risk.

Kalshi's Demo Environment

Kalshi provides a demo environment at demo-api.kalshi.co that mirrors production:

| Feature | Production | Demo |
| --- | --- | --- |
| API behavior | Real | Identical |
| Market data | Real | Real (mirrored) |
| Funds | Real USD | Mock funds |
| Credentials | Separate | Separate |

This gives us the best of both worlds: real API validation with zero financial risk.

The Solution: KalshiEnvironment Enum

A type-safe enum centralizes all environment-specific configuration:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum KalshiEnvironment {
    #[default]
    Production,
    Demo,
}

impl KalshiEnvironment {
    pub fn api_base_url(&self) -> &'static str {
        match self {
            KalshiEnvironment::Production => "https://trading-api.kalshi.com",
            KalshiEnvironment::Demo => "https://demo-api.kalshi.co",
        }
    }

    pub fn websocket_url(&self) -> &'static str {
        match self {
            KalshiEnvironment::Production => "wss://trading-api.kalshi.com/trade-api/v2/ws",
            KalshiEnvironment::Demo => "wss://demo-api.kalshi.co/trade-api/v2/ws",
        }
    }
}

Key design decisions:

  • #[default] on Production: Safe default prevents accidental demo usage in production
  • Copy trait: Cheap to pass by value, no allocation
  • Static strings: No runtime allocation for URLs
  • Single source of truth: All Kalshi URLs in one place

Credential Namespacing

Separate environment variables prevent credential mixups:

| Environment | Key ID Variable | Private Key Variable |
| --- | --- | --- |
| Production | KALSHI_KEY_ID | KALSHI_PRIVATE_KEY |
| Demo | KALSHI_DEMO_KEY_ID | KALSHI_DEMO_PRIVATE_KEY |

This pattern ensures you can't accidentally use production credentials in demo mode or vice versa:

let (kalshi_key_var, kalshi_priv_var) = if args.kalshi_demo {
    ("KALSHI_DEMO_KEY_ID", "KALSHI_DEMO_PRIVATE_KEY")
} else {
    ("KALSHI_KEY_ID", "KALSHI_PRIVATE_KEY")
};

Client Integration

Both KalshiClient and KalshiMonitor accept the environment via with_environment() constructors:

impl KalshiClient {
    pub fn new(key_id: String, private_key_pem: &str, dry_run: bool) -> Result<Self, String> {
        // Default to production
        Self::with_environment(key_id, private_key_pem, dry_run, KalshiEnvironment::Production)
    }

    pub fn with_environment(
        key_id: String,
        private_key_pem: &str,
        dry_run: bool,
        environment: KalshiEnvironment,
    ) -> Result<Self, String> {
        // ... initialization with environment-specific URLs
    }
}

The existing new() constructor delegates to with_environment() with production default, maintaining backward compatibility.

CLI Integration

A simple --kalshi-demo flag switches environments:

#[derive(Parser, Debug)]
struct Args {
    /// Use Kalshi demo environment instead of production
    #[arg(long, default_value_t = false)]
    kalshi_demo: bool,
    // ...
}

Usage:

# Production (default)
cargo run -- --paper-trade

# Demo environment
cargo run -- --paper-trade --kalshi-demo

# Check demo connectivity
cargo run -- --check-connectivity --kalshi-demo

Combining with Paper Trading

The most powerful combination is paper trading with Kalshi demo:

cargo run -- --paper-trade --kalshi-demo --fidelity realistic

This provides:

| Layer | Source | Risk |
| --- | --- | --- |
| Market data | Real (from Kalshi demo) | None |
| Order execution | Simulated (paper trading) | None |
| Credentials | Demo (mock funds) | None |

You get realistic market conditions without any financial exposure.

Testing

Four unit tests validate URL generation:

#[test]
fn test_production_urls() {
    let env = KalshiEnvironment::Production;
    assert_eq!(env.api_base_url(), "https://trading-api.kalshi.com");
    assert_eq!(env.websocket_url(), "wss://trading-api.kalshi.com/trade-api/v2/ws");
}

#[test]
fn test_demo_urls() {
    let env = KalshiEnvironment::Demo;
    assert_eq!(env.api_base_url(), "https://demo-api.kalshi.co");
    assert_eq!(env.websocket_url(), "wss://demo-api.kalshi.co/trade-api/v2/ws");
}

#[test]
fn test_default_is_production() {
    assert_eq!(KalshiEnvironment::default(), KalshiEnvironment::Production);
}

Module Structure

arbiter-engine/src/market/
├── mod.rs          # Exports KalshiEnvironment
├── kalshi_env.rs   # Environment enum and URL config
├── kalshi.rs       # KalshiMonitor with environment support
└── client/
    └── kalshi.rs   # KalshiClient with environment support

Getting Demo Credentials

  1. Visit Kalshi Demo Environment
  2. Create a demo account (separate from production)
  3. Generate API credentials in demo dashboard
  4. Set environment variables:
export KALSHI_DEMO_KEY_ID=your_demo_key_id
export KALSHI_DEMO_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----
...your demo key...
-----END RSA PRIVATE KEY-----"

Extensibility

The pattern is ready for future exchange demo environments:

// Future: Polymarket demo (if they add one)
pub enum PolymarketEnvironment {
    #[default]
    Production,
    Demo, // Hypothetical
}

Council Review

The implementation passed council review with strong scores:

| Dimension | Score |
| --- | --- |
| Accuracy | 8.5/10 |
| Completeness | 8.0/10 |
| Clarity | 9.0/10 |
| Conciseness | 8.5/10 |
| Relevance | 9.5/10 |
| Weighted | 8.52/10 |

No blocking issues were identified.


Market Discovery Phase 1: Foundation Types and Storage

· 3 min read
Claude
AI Assistant

This post covers Phase 1 of ADR-017 (Automated Market Discovery and Matching) - establishing the data types and persistence layer for the discovery system.

The Problem

Manual market mapping is error-prone and doesn't scale. Polymarket and Kalshi list hundreds of markets; finding equivalent pairs requires:

  1. Persistent storage - Track discovered markets across restarts
  2. Status tracking - Pending → Approved/Rejected workflow
  3. Audit trail - Record all approval decisions for compliance
  4. Safety gates - Prevent automated trading without human review

Design Decisions

CandidateStatus State Machine

The core safety mechanism is a one-way state machine:

Pending ──┬──► Approved
          │
          └──► Rejected

Once a candidate is approved or rejected, the status is immutable. This prevents accidental re-processing or status manipulation:

impl CandidateStatus {
    pub fn can_transition_to(&self, new_status: CandidateStatus) -> bool {
        match (self, new_status) {
            (CandidateStatus::Pending, CandidateStatus::Approved) => true,
            (CandidateStatus::Pending, CandidateStatus::Rejected) => true,
            // Once approved or rejected, status is final
            (CandidateStatus::Approved, _) => false,
            (CandidateStatus::Rejected, _) => false,
            _ => false,
        }
    }
}

Semantic Warnings

Markets that appear similar may have different settlement criteria. The CandidateMatch struct includes a semantic_warnings field that Phase 2's matcher will populate:

pub struct CandidateMatch {
    pub semantic_warnings: Vec<String>, // e.g., "Settlement timing differs"
    // ...
}

Approval will require explicit acknowledgment of these warnings (FR-MD-003).

SQLite Storage

We chose SQLite over PostgreSQL for the discovery cache because:

  1. Single-tenant - Discovery runs locally per operator
  2. Portable - No external dependencies for development
  3. Atomic - Transactions prevent partial state

Schema design separates markets from candidates:

-- Discovered markets (one per platform/id combination)
CREATE TABLE discovered_markets (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL,
platform_id TEXT NOT NULL,
title TEXT NOT NULL,
-- ...
UNIQUE(platform, platform_id)
);

-- Candidate matches (references two markets)
CREATE TABLE candidates (
id TEXT PRIMARY KEY,
polymarket_id TEXT NOT NULL,
kalshi_id TEXT NOT NULL,
similarity_score REAL NOT NULL,
status TEXT NOT NULL DEFAULT 'Pending',
-- ...
);

-- Audit log for compliance
CREATE TABLE audit_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
action TEXT NOT NULL,
candidate_id TEXT NOT NULL,
details TEXT NOT NULL -- Full JSON context
);

Parameterized Queries

All SQL uses the params![] macro to prevent injection:

conn.execute(
    "UPDATE candidates SET status = ?1, updated_at = ?2 WHERE id = ?3",
    params![status_str, now, id.to_string()],
)?;

Test Coverage

Phase 1 includes 12 tests covering:

| Module | Tests | Focus |
| --- | --- | --- |
| candidate.rs | 5 | Type creation, status transitions, serialization |
| storage.rs | 7 | CRUD operations, filtering, audit logging |

Key safety test:

#[test]
fn test_candidate_status_transitions() {
    // Once approved, cannot transition to any other status
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Pending));
    assert!(!CandidateStatus::Approved.can_transition_to(CandidateStatus::Rejected));
}

What's Next

Phase 2 will implement the text matching engine:

  • TextNormalizer - Lowercase, remove punctuation, tokenize
  • SimilarityScorer - Jaccard (0.6 weight) + Levenshtein (0.4 weight)
  • Semantic warning detection for settlement differences

Council Review

Phase 1 passed council verification with confidence 0.88. Key findings:

  • ✅ Human-in-the-loop enforced via CandidateStatus state machine
  • ✅ Audit logging captures all required fields
  • ✅ No SQL injection (all parameterized queries)
  • ✅ No unsafe code

Implementation: arbiter-engine/src/discovery/ | Issues: #41, #42 | ADR: 017

Market Discovery Phase 2: Text Matching Engine

· 3 min read
Claude
AI Assistant

This post covers Phase 2 of ADR-017 - implementing the text similarity matching engine that powers automated market discovery between Polymarket and Kalshi.

The Problem

Phase 1 established the data types and storage layer. Now we need to actually find matching markets across platforms. The challenge:

  1. Fuzzy matching - Market titles differ in phrasing ("Will Trump win?" vs "Trump wins 2024?")
  2. False positives - Similar titles may have different settlement criteria
  3. Scalability - Must compare thousands of markets efficiently

Algorithm Design

Combined Similarity Scoring

We use a weighted combination of two complementary algorithms:

score = 0.6 × Jaccard + 0.4 × Levenshtein

Jaccard similarity (0.6 weight) measures token set overlap:

let intersection = set_a.intersection(&set_b).count() as f64;
let union = set_a.union(&set_b).count() as f64;
let jaccard = intersection / union;

This captures semantic similarity when words are reordered.

Levenshtein similarity (0.4 weight) measures edit distance:

let distance = levenshtein(&norm_a, &norm_b) as f64;
let levenshtein_sim = 1.0 - distance / max_length as f64;

This catches typos and minor variations.
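Put together, the weighted combination can be sketched as a small self-contained program. This is an illustrative hand-rolled version (the real implementation uses the strsim crate; token splitting here is plain whitespace, without the normalizer):

```rust
use std::collections::HashSet;

/// Jaccard similarity over whitespace-separated token sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    if sa.is_empty() && sb.is_empty() {
        return 1.0;
    }
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    inter / union
}

/// Classic dynamic-programming edit distance (one row at a time).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // diagonal + substitution, above + deletion, left + insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// score = 0.6 × Jaccard + 0.4 × normalized Levenshtein, per ADR-017.
fn combined_score(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    let lev_sim = if max_len == 0 {
        1.0
    } else {
        1.0 - levenshtein(a, b) as f64 / max_len as f64
    };
    0.6 * jaccard(a, b) + 0.4 * lev_sim
}

fn main() {
    println!("{:.3}", combined_score("trump wins 2024", "will trump win 2024"));
}
```

Identical titles score exactly 1.0; a candidate is created when the score clears the 0.6 threshold.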

Text Normalization

Before comparison, titles are normalized:

impl TextNormalizer {
    pub fn normalize(&self, text: &str) -> String {
        // 1. Lowercase  2. Replace punctuation with spaces  3. Collapse whitespace
        text.to_lowercase()
            .chars()
            .map(|c| if c.is_alphanumeric() { c } else { ' ' })
            .collect::<String>()
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ")
    }

    pub fn tokenize(&self, text: &str) -> Vec<String> {
        // 4. Split into words  5. Filter stop words (abbreviated list)
        const STOP_WORDS: &[&str] = &["a", "an", "the", "will", "be"];
        self.normalize(text)
            .split_whitespace()
            .filter(|w| !STOP_WORDS.contains(w))
            .map(str::to_string)
            .collect()
    }
}

Example: "Will Bitcoin reach $100k?"["bitcoin", "reach", "100k"]

Pre-Filtering

Before scoring, candidates are filtered to reduce false positives:

| Filter | Default | Purpose |
| --- | --- | --- |
| Expiration tolerance | ±7 days | Markets must settle around same time |
| Outcome count | Must match | Binary vs multi-outcome |
| Category match | Optional | Same topic area |
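The first two filters reduce to a couple of comparisons. A minimal sketch, assuming an illustrative struct with expirations as unix seconds (the real DiscoveredMarket differs):

```rust
/// Illustrative stand-in for DiscoveredMarket; field names are assumptions.
struct Market {
    expiration_unix: i64, // settlement time, seconds since epoch
    outcome_count: u32,   // e.g. 2 for a binary market
}

const SEVEN_DAYS_SECS: i64 = 7 * 24 * 60 * 60;

/// A candidate survives only if it settles within ±7 days of the source
/// market and offers the same number of outcomes.
fn passes_pre_filter(a: &Market, b: &Market) -> bool {
    (a.expiration_unix - b.expiration_unix).abs() <= SEVEN_DAYS_SECS
        && a.outcome_count == b.outcome_count
}

fn main() {
    let a = Market { expiration_unix: 0, outcome_count: 2 };
    let b = Market { expiration_unix: 3 * 24 * 60 * 60, outcome_count: 2 };
    assert!(passes_pre_filter(&a, &b)); // 3 days apart, same outcome count
}
```

Running the cheap filters before the O(n·m) similarity scoring is what keeps thousands-of-markets comparisons tractable.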

Semantic Warning Detection (FR-MD-008)

Even similar titles may have different settlement criteria. We detect and flag:

Conditional language mismatches:

Polymarket: "Will Fed announce rate cut?"
Kalshi: "Will Fed cut rates?"
⚠️ Warning: Settlement trigger mismatch - one market references 'announce'

Resolution source differences:

Polymarket resolution: "Associated Press"
Kalshi resolution: "Official FEC results"
⚠️ Warning: Resolution source differs

Expiration differences:

⚠️ Warning: Expiration differs by 3 day(s)

These warnings flow to the human reviewer (FR-MD-003) for acknowledgment before approval.
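The conditional-language rule above can be sketched as a single check (a hypothetical standalone function; the real detect_warnings covers more trigger words and the other warning categories):

```rust
/// Flag a settlement-trigger mismatch when language like "announce"
/// appears in exactly one of the two titles.
fn announcement_warning(title_a: &str, title_b: &str) -> Option<String> {
    let has = |t: &str| t.to_lowercase().contains("announce");
    if has(title_a) != has(title_b) {
        Some("Settlement trigger mismatch - one market references 'announce'".to_string())
    } else {
        None
    }
}

fn main() {
    let w = announcement_warning("Will Fed announce rate cut?", "Will Fed cut rates?");
    assert!(w.is_some()); // mismatch detected, warning attached to the candidate
}
```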

Implementation

SimilarityScorer

pub struct SimilarityScorer {
    jaccard_weight: f64,     // 0.6
    levenshtein_weight: f64, // 0.4
    threshold: f64,          // 0.6
    normalizer: TextNormalizer,
    pre_filter: PreFilterConfig,
}

impl SimilarityScorer {
    pub fn find_matches(
        &self,
        market: &DiscoveredMarket,
        candidates: &[DiscoveredMarket],
    ) -> Vec<CandidateMatch> {
        candidates.iter()
            .filter(|c| c.platform != market.platform) // Cross-platform only
            .filter(|c| self.passes_pre_filter(market, c))
            .filter_map(|c| {
                let score = self.score(&market.title, &c.title);
                if score >= self.threshold {
                    let warnings = self.detect_warnings(market, c);
                    Some(CandidateMatch::new(/*...*/).with_warnings(warnings))
                } else {
                    None
                }
            })
            .collect()
    }
}

Match Reason Classification

let match_reason = if score >= 0.95 {
    MatchReason::ExactTitle
} else {
    MatchReason::HighTextSimilarity { score: (score * 100.0) as u32 }
};

Test Coverage

Phase 2 adds 10 tests (22 total for discovery module):

| Module | Tests | Focus |
| --- | --- | --- |
| normalizer.rs | 3 | Lowercase, punctuation, tokenization |
| matcher.rs | 7 | Jaccard, Levenshtein, combined score, filtering, warnings |

Key test:

#[test]
fn test_semantic_warning_announcement() {
    let scorer = SimilarityScorer::default();

    let poly = create_market(Platform::Polymarket, "Will Fed announce rate cut?");
    let kalshi = create_market(Platform::Kalshi, "Will Fed cut rates?");

    let warnings = scorer.detect_warnings(&poly, &kalshi);
    assert!(warnings.iter().any(|w| w.contains("announce")));
}

What's Next

Phase 3 will implement the API clients:

  • Polymarket Gamma API client (FR-MD-006)
  • Kalshi /v2/markets API client (FR-MD-007)
  • Rate limiting and pagination

Council Review

Phase 2 passed council verification with confidence 0.85. Key findings:

  • No unsafe code
  • Human-in-the-loop preserved (find_matches returns candidates, not verified mappings)
  • Semantic warnings properly flag settlement differences
  • All tests passing (22 total)

Implementation: arbiter-engine/src/discovery/{normalizer,matcher}.rs | Issue: #43 | ADR: 017

Market Discovery Phase 3: API Clients

· 3 min read
Claude
AI Assistant

This post covers Phase 3 of ADR-017 - implementing the API clients that fetch market listings from Polymarket and Kalshi for automated discovery.

The Problem

Phase 1 and 2 established storage and matching. Now we need data sources:

  1. Polymarket - Gamma API at gamma-api.polymarket.com
  2. Kalshi - Trade API at api.elections.kalshi.com/trade-api/v2

Both APIs have:

  • Pagination (different styles)
  • Rate limits (different thresholds)
  • Different response schemas

Design: DiscoveryClient Trait

We define a common trait for both platforms:

#[async_trait]
pub trait DiscoveryClient: Send + Sync {
    async fn list_markets(
        &self,
        limit: Option<u32>,
        cursor: Option<&str>,
    ) -> Result<DiscoveryPage, DiscoveryError>;

    fn platform_name(&self) -> &'static str;
}

This allows the scanner (Phase 4) to enumerate markets from either platform interchangeably.

Rate Limiting

Both APIs have rate limits we must respect:

| Platform | Limit | Implementation |
| --- | --- | --- |
| Polymarket | 60 req/min | Token bucket |
| Kalshi | 100 req/min | Token bucket |

We implement a token bucket rate limiter:

struct RateLimiter {
    state: Mutex<(u64, Instant)>, // (available tokens, last refill time)
    max_tokens: u64,
}

impl RateLimiter {
    /// Returns None on success, or how long to wait when rate limited.
    async fn acquire(&self) -> Option<Duration> {
        let mut state = self.state.lock().await; // tokio::sync::Mutex
        let (tokens, last_refill) = &mut *state;

        // Refill tokens based on elapsed time (max_tokens per minute)
        let elapsed = last_refill.elapsed();
        let refill = (elapsed.as_secs_f64() / 60.0 * self.max_tokens as f64) as u64;
        *tokens = (*tokens + refill).min(self.max_tokens);
        *last_refill = Instant::now();

        // Try to consume a token
        if *tokens > 0 {
            *tokens -= 1;
            return None; // Success
        }

        // Return wait time until the next token becomes available
        Some(Duration::from_secs_f64(60.0 / self.max_tokens as f64))
    }
}

If rate limited, we return DiscoveryError::RateLimited with the retry time.

Pagination Strategies

Polymarket: Offset-based

GET /markets?limit=100&offset=0
GET /markets?limit=100&offset=100
...

We use the offset as the cursor, incrementing by page size.

Kalshi: Cursor-based

GET /markets?limit=100&status=open
→ { markets: [...], cursor: "abc123" }

GET /markets?limit=100&cursor=abc123
→ { markets: [...], cursor: null }

We pass through the cursor directly.
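Either pagination style reduces to the same drain loop in the scanner. A synchronous sketch of that loop (the real DiscoveryClient is async; Page, PagedClient, and FakeClient here are illustrative stand-ins):

```rust
/// Minimal page shape mirroring DiscoveryPage: a batch plus an optional cursor.
struct Page {
    markets: Vec<String>,
    cursor: Option<String>, // None => last page
}

trait PagedClient {
    fn list_markets(&self, cursor: Option<&str>) -> Page;
}

/// Drain every page by feeding each returned cursor back in.
fn fetch_all_markets(client: &dyn PagedClient) -> Vec<String> {
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = client.list_markets(cursor.as_deref());
        all.extend(page.markets);
        match page.cursor {
            Some(next) => cursor = Some(next), // more pages remain
            None => break,                     // fully drained
        }
    }
    all
}

/// Stand-in client returning two pages, like the Kalshi example above.
struct FakeClient;

impl PagedClient for FakeClient {
    fn list_markets(&self, cursor: Option<&str>) -> Page {
        match cursor {
            None => Page {
                markets: vec!["m1".into(), "m2".into()],
                cursor: Some("abc123".into()),
            },
            Some(_) => Page { markets: vec!["m3".into()], cursor: None },
        }
    }
}

fn main() {
    assert_eq!(fetch_all_markets(&FakeClient).len(), 3);
}
```

For Polymarket the "cursor" is just the stringified offset, so the same loop serves both platforms.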

Response Mapping

Each API returns different schemas that we map to DiscoveredMarket:

Polymarket Gamma API

struct GammaMarket {
    condition_id: String, // → platform_id
    question: String,     // → title
    outcomes: String,     // JSON array → outcomes
    end_date: String,     // → expiration
    volume_24hr: f64,     // → volume_24h
    active: bool,         // Filter: skip if false
    closed: bool,         // Filter: skip if true
}

Kalshi Markets API

struct KalshiMarket {
    ticker: String,          // → platform_id
    title: String,           // → title
    expiration_time: String, // → expiration
    volume_24h: i64,         // Cents → dollars
    status: String,          // Filter: only "open"/"active"
}

Key transformations:

  • Kalshi volume is in cents, converted to dollars (/ 100.0)
  • Inactive/closed markets are filtered out before returning
  • Missing fields use sensible defaults
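The volume and status transformations are one-liners; a sketch with illustrative function names (not the real mapping code):

```rust
/// Kalshi reports 24h volume in cents; our internal type uses dollars.
fn cents_to_dollars(volume_cents: i64) -> f64 {
    volume_cents as f64 / 100.0
}

/// Keep only markets that are still tradeable.
fn is_tradeable(status: &str) -> bool {
    matches!(status, "open" | "active")
}

fn main() {
    assert_eq!(cents_to_dollars(12_345), 123.45);
    assert!(is_tradeable("open"));
    assert!(!is_tradeable("closed"));
}
```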

Error Handling

pub enum DiscoveryError {
    Http(reqwest::Error),                      // Network failures
    Parse(String),                             // JSON parsing
    RateLimited { retry_after_secs: u64 },     // 429 responses
    ApiError { status: u16, message: String }, // Other HTTP errors
}

The scanner (Phase 4) can handle these appropriately - retrying on rate limits, logging API errors.

Test Strategy

We use wiremock for HTTP mocking:

#[tokio::test]
async fn test_list_markets_success() {
    let mock_server = MockServer::start().await;

    Mock::given(method("GET"))
        .and(path("/markets"))
        .respond_with(ResponseTemplate::new(200)
            .set_body_json(mock_response()))
        .mount(&mock_server)
        .await;

    let client = GammaApiClient::with_base_url(&mock_server.uri());
    let page = client.list_markets(Some(10), None).await.unwrap();

    assert_eq!(page.markets.len(), 2);
}

Test Coverage

Phase 3 adds 8 tests (30 total for discovery):

| Module | Tests | Focus |
| --- | --- | --- |
| polymarket_gamma.rs | 4 | Success, pagination, rate limit, mapping |
| kalshi_markets.rs | 4 | Success, cursor pagination, rate limit, mapping |

What's Next

Phase 4 will implement the scanner and approval workflow:

  • DiscoveryScannerActor for periodic discovery runs
  • ApprovalWorkflow for human review (FR-MD-003)
  • Integration with MappingManager.verify_mapping()

Council Review

Phase 3 passed council verification with confidence 0.87. Key findings:

  • No unsafe code
  • Proper rate limiting prevents API abuse
  • 30-second timeout prevents hanging
  • No credentials hardcoded
  • Closed/inactive markets filtered out

Implementation: arbiter-engine/src/market/discovery_client/ | Issues: #44, #45 | ADR: 017