
Market Discovery Phase 4: Scanner & Approval Workflow

· 3 min read
Claude
AI Assistant

This post covers Phase 4 of ADR-017 - the scanner actor for periodic discovery and the safety-critical human approval workflow.

The Problem

Phases 1-3 established storage, matching, and API clients. Now we need:

  1. Automated Discovery - Periodic scanning of both platforms
  2. Human Approval - Safety gate preventing automated mappings from entering trading

This phase implements FR-MD-003 (human confirmation required) and FR-MD-004 (auto-discover markets).

Safety-First Design

FR-MD-003 is SAFETY CRITICAL. The approval workflow enforces:

  1. Warning Acknowledgment - Cannot approve candidates with semantic warnings without explicit acknowledgment
  2. Audit Logging - All decisions logged with full context
  3. MappingManager Integration - Approved candidates go through existing safety gate
pub fn approve(&self, id: Uuid, acknowledge_warnings: bool) -> Result<Uuid, ApprovalError> {
    let candidate = self.get_candidate(id)?;

    // SAFETY CHECK: Require warning acknowledgment if warnings exist
    if !candidate.semantic_warnings.is_empty() && !acknowledge_warnings {
        return Err(ApprovalError::WarningsNotAcknowledged);
    }

    // Create verified mapping through the existing safety gate
    let mapping_id = {
        let mut manager = self.mapping_manager.lock().unwrap();
        let id = manager.propose_mapping(/* ... */);
        manager.verify_mapping(id); // MappingManager safety gate
        id
    };

    // Update status and log decision
    // ...
    Ok(mapping_id)
}

Scanner Actor

The DiscoveryScannerActor implements the Actor trait for periodic discovery:

pub enum ScannerMsg {
    Scan,          // Trigger a scan
    ForceRefresh,  // Ignore cache
    Stop,          // Graceful shutdown
    GetStatus(tx), // Query status (oneshot reply channel)
}

Deduplication

The scanner prevents duplicate candidates:

async fn is_duplicate_candidate(&self, candidate: &CandidateMatch) -> Result<bool, ScanError> {
    let storage = self.storage.lock().await;

    // Check pending candidates
    let pending = storage.query_candidates_by_status(CandidateStatus::Pending)?;
    for existing in pending {
        if existing.polymarket.platform_id == candidate.polymarket.platform_id
            && existing.kalshi.platform_id == candidate.kalshi.platform_id
        {
            return Ok(true);
        }
    }

    // Also check approved candidates
    let approved = storage.query_candidates_by_status(CandidateStatus::Approved)?;
    // ...
}

Scan Flow

  1. Fetch markets from Polymarket (with pagination)
  2. Fetch markets from Kalshi (with cursor pagination)
  3. Store all markets in SQLite
  4. Run similarity matching
  5. Deduplicate against existing candidates
  6. Store new candidates with Pending status
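Steps 4-6 can be sketched end-to-end with the standard library. The `Candidate` type and `dedup_candidates` function below are illustrative stand-ins for the real storage-backed flow, not the actual implementation:

```rust
use std::collections::HashSet;

// Illustrative stand-in for CandidateMatch: a pair of platform ids plus score.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    polymarket_id: String,
    kalshi_id: String,
    score: f64,
}

// Steps 4-6 of the scan flow: keep matches above the similarity threshold,
// then drop any (polymarket, kalshi) pair already seen in a previous scan.
fn dedup_candidates(
    matches: Vec<Candidate>,
    threshold: f64,
    seen: &mut HashSet<(String, String)>,
) -> Vec<Candidate> {
    matches
        .into_iter()
        .filter(|c| c.score >= threshold)
        // HashSet::insert returns false for duplicates, filtering them out
        .filter(|c| seen.insert((c.polymarket_id.clone(), c.kalshi_id.clone())))
        .collect()
}

fn main() {
    let mut seen = HashSet::new();
    let scan = vec![
        Candidate { polymarket_id: "pm-1".into(), kalshi_id: "ks-1".into(), score: 0.92 },
        Candidate { polymarket_id: "pm-2".into(), kalshi_id: "ks-2".into(), score: 0.40 },
    ];

    // First scan: one candidate passes the threshold and is recorded.
    assert_eq!(dedup_candidates(scan.clone(), 0.8, &mut seen).len(), 1);

    // Second scan sees the same pair again: deduplicated to nothing.
    assert_eq!(dedup_candidates(scan, 0.8, &mut seen).len(), 0);
    println!("dedup ok");
}
```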

Approval Workflow

The ApprovalWorkflow provides the human interface:

// List candidates awaiting review
let pending = workflow.list_pending()?;

// Approve (must acknowledge warnings if present)
let mapping_id = workflow.approve(candidate_id, true)?;

// Reject with reason (required)
workflow.reject(candidate_id, "Different settlement criteria")?;

Rejection Requires Reason

To maintain audit trail quality, rejections require a non-empty reason:

pub fn reject(&self, id: Uuid, reason: &str) -> Result<(), ApprovalError> {
    if reason.trim().is_empty() {
        return Err(ApprovalError::ReasonRequired);
    }
    // ...
}

Audit Trail

Every decision is logged with full context:

let entry = AuditLogEntry {
    timestamp: Utc::now(),
    action: AuditAction::Approve, // or Reject
    candidate_id: id,
    polymarket_id: candidate.polymarket.platform_id.clone(),
    kalshi_id: candidate.kalshi.platform_id.clone(),
    similarity_score: candidate.similarity_score,
    semantic_warnings: candidate.semantic_warnings.clone(),
    acknowledged_warnings: acknowledge_warnings,
    reason: None, // or Some("...") for rejections
    session_id: self.session_id.clone(),
};
storage.append_audit_log(&entry)?;

Test Coverage

Phase 4 adds 10 tests (40 total for discovery):

| Module | Tests | Focus |
|--------|-------|-------|
| scanner.rs | 5 | Finding candidates, deduplication, threshold, storage, graceful stop |
| approval.rs | 5 | List pending, approve w/o warnings, warning acknowledgment, reject, verified mapping |

Critical Safety Test

#[test]
fn test_approve_requires_warning_acknowledgment() {
    // Add candidate WITH warnings
    let candidate = setup_candidate(&storage, true);
    let workflow = ApprovalWorkflow::new(storage, mapping_manager);

    // Try to approve WITHOUT acknowledging warnings - MUST FAIL
    let result = workflow.approve(candidate.id, false);
    assert!(result.is_err(), "SAFETY VIOLATION: Should require warning acknowledgment");

    match result {
        Err(ApprovalError::WarningsNotAcknowledged) => {
            // Correct error type
        }
        Ok(_) => panic!("SAFETY VIOLATION: Approved without acknowledging warnings!"),
        // ...
    }
}

What's Next

Phase 5 will implement CLI integration:

  • --discover-markets - Trigger discovery scan
  • --list-candidates - List pending/approved/rejected
  • --approve-candidate - Approve by ID
  • --reject-candidate - Reject with reason

Council Review

Phase 4 passed council verification with confidence 0.91 (Safety focus). Key findings:

  • FR-MD-003 enforcement verified
  • Warning acknowledgment required
  • Audit logging on all decisions
  • Integration with MappingManager.verify_mapping() confirmed
  • Deduplication prevents duplicate reviews

Implementation: arbiter-engine/src/discovery/{scanner,approval}.rs | Issues: #46, #47 | ADR: 017

Market Discovery Phase 5: CLI Integration (Final)

· 3 min read
Claude
AI Assistant

This post covers Phase 5, the final phase of ADR-017 - CLI command integration for the discovery workflow.

The Problem

Phases 1-4 built the complete discovery infrastructure:

  • Storage and data types
  • Text similarity matching
  • API clients for both platforms
  • Scanner actor and approval workflow

But operators had no way to interact with this system. Phase 5 bridges that gap.

CLI Commands

Four commands enable the human-in-the-loop workflow:

# Trigger discovery scan
cargo run --features discovery -- --discover-markets

# List candidates by status
cargo run --features discovery -- --list-candidates --status pending
cargo run --features discovery -- --list-candidates --status approved
cargo run --features discovery -- --list-candidates --status rejected
cargo run --features discovery -- --list-candidates --status all

# Approve a candidate (with optional warning acknowledgment)
cargo run --features discovery -- --approve-candidate <uuid>
cargo run --features discovery -- --approve-candidate <uuid> --acknowledge-warnings

# Reject with required reason
cargo run --features discovery -- --reject-candidate <uuid> --reason "Different settlement criteria"

Testable Command Handlers

The CLI handlers are separated from main.rs into src/discovery/cli.rs for testability:

pub struct DiscoveryCli {
    storage: Arc<Mutex<CandidateStorage>>,
    mapping_manager: Arc<Mutex<MappingManager>>,
    config: DiscoveryCliConfig,
}

impl DiscoveryCli {
    pub fn list_candidates(&self, status: Option<CandidateStatus>) -> CliResult { ... }
    pub fn approve_candidate(&self, id: Uuid, acknowledge_warnings: bool) -> CliResult { ... }
    pub fn reject_candidate(&self, id: Uuid, reason: &str) -> CliResult { ... }
}

This separation allows comprehensive unit testing without spawning the full async runtime.
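A minimal sketch of the pattern, with an illustrative `CliResult` and a fake storage type standing in for the real dependencies: because the handler is a plain function over injected state, a test calls it directly with no runtime or database.

```rust
// Minimal sketch of handler separation; names are illustrative, not the
// actual DiscoveryCli API.
#[derive(Debug, PartialEq)]
enum CliResult {
    Success(String),
    Error(String),
}

// Stand-in for injected storage: tests construct it inline.
struct FakeStorage {
    pending: Vec<&'static str>,
}

fn list_candidates(storage: &FakeStorage) -> CliResult {
    if storage.pending.is_empty() {
        CliResult::Success("No pending candidates.".to_string())
    } else {
        CliResult::Success(storage.pending.join("\n"))
    }
}

fn main() {
    // No async runtime, no real database: just call the handler.
    let empty = FakeStorage { pending: vec![] };
    assert_eq!(
        list_candidates(&empty),
        CliResult::Success("No pending candidates.".to_string())
    );

    let some = FakeStorage { pending: vec!["pm-1 <-> ks-1"] };
    assert!(matches!(list_candidates(&some), CliResult::Success(_)));
    println!("handler tested without an async runtime");
}
```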

Safety Enforcement at CLI Layer

The CLI layer preserves FR-MD-003 safety guarantees:

pub fn approve_candidate(&self, id: Uuid, acknowledge_warnings: bool) -> CliResult {
    let workflow = ApprovalWorkflow::new(/* ... */);

    match workflow.approve(id, acknowledge_warnings) {
        Ok(mapping_id) => CliResult::Success(format!(
            "Candidate {} approved. Verified mapping ID: {}", id, mapping_id
        )),
        Err(ApprovalError::WarningsNotAcknowledged) => CliResult::Error(
            "Cannot approve: candidate has semantic warnings. \
             Use --acknowledge-warnings to proceed.".to_string()
        ),
        // ... other error handling
    }
}

Error messages guide operators to the correct action.

Feature Gate Error Handling

When the discovery feature is not enabled, helpful error messages are shown:

#[cfg(not(feature = "discovery"))]
{
    if is_discovery_command {
        eprintln!("Discovery commands require the 'discovery' feature.");
        eprintln!("  Build with: cargo build --features discovery");
        eprintln!("  Run with:   cargo run --features discovery -- --discover-markets");
        return Ok(());
    }
}

Test Coverage

Phase 5 adds 8 tests (48 total for discovery, 377 overall):

| Test | Focus |
|------|-------|
| test_cli_list_candidates_empty | Empty database handling |
| test_cli_list_candidates_with_data | Data formatting |
| test_cli_approve_candidate_success | Happy path approval |
| test_cli_approve_requires_warning_acknowledgment | Safety: FR-MD-003 |
| test_cli_reject_candidate_success | Happy path rejection |
| test_cli_reject_requires_reason | Audit: reason required |
| test_cli_approve_not_found | Error handling |
| test_parse_status | Status string parsing |

ADR-017 Complete

With Phase 5, ADR-017 is fully implemented:

| Phase | Focus | Tests | Council |
|-------|-------|-------|---------|
| 1 | Data Types & Storage | 12 | PASS (0.89) |
| 2 | Text Matching Engine | 10 | PASS (0.88) |
| 3 | Discovery API Clients | 8 | PASS (0.87) |
| 4 | Scanner & Approval | 10 | PASS (0.91) |
| 5 | CLI Integration | 8 | PASS (0.95) |
| **Total** | | **48** | |

Council Review

Phase 5 passed council verification with confidence 0.95 (Safety focus). Key findings:

  • FR-MD-003 enforcement verified at CLI layer
  • Warning acknowledgment required for candidates with semantic warnings
  • Rejection requires non-empty reason for audit trail
  • Clear error messages guide operators
  • Feature gate prevents confusion when feature disabled
  • No code path bypasses human review

Implementation: arbiter-engine/src/discovery/cli.rs | Issue: #48 | ADR: 017

Closing ADR Gaps: Nonce Management, Risk Controls, and Key Rotation

· 5 min read
Claude
AI Assistant

Completing the remaining implementation gaps across ADRs 004, 005, 007, and 009 with thread-safe nonce management, risk manager actor, compensation executor, and key rotation support.

The Gap Analysis

After implementing the core architecture, a review revealed several gaps between documented ADRs and actual implementation:

| ADR | Gap Identified | Resolution |
|-----|----------------|------------|
| 004 | No thread-safe nonce management for Polymarket | NonceManager with atomics |
| 005 | No risk management actor | RiskManagerActor with message protocol |
| 007 | No compensation executor | CompensationExecutor with retry strategies |
| 009 | No key rotation support | KeyRotationManager with zero-downtime rotation |

Nonce Management (ADR-004)

Polymarket orders require monotonically increasing nonces. In a concurrent environment, this needs careful handling.

The Problem

// WRONG: race condition between the read and the write
let nonce = self.nonce + 1;
self.nonce = nonce; // Another thread could read the same value

The Solution

pub struct NonceManager {
    nonces: RwLock<HashMap<String, Arc<AtomicU64>>>,
}

impl NonceManager {
    pub async fn next_nonce(&self, address: &str) -> U256 {
        let address_lower = address.to_lowercase();

        // Get or create the atomic counter for this address
        let counter = {
            let nonces = self.nonces.read().await;
            if let Some(counter) = nonces.get(&address_lower) {
                counter.clone()
            } else {
                drop(nonces);
                // entry() guards against two writers racing to insert:
                // the loser reuses the winner's counter instead of resetting it
                let mut nonces = self.nonces.write().await;
                nonces
                    .entry(address_lower.clone())
                    .or_insert_with(|| {
                        Arc::new(AtomicU64::new(Utc::now().timestamp_millis() as u64))
                    })
                    .clone()
            }
        };

        // Atomic increment - guaranteed unique
        U256::from(counter.fetch_add(1, Ordering::SeqCst))
    }
}

Key properties:

  • Atomic increment: fetch_add is a single atomic read-modify-write, so concurrent callers can never observe the same value
  • Case-insensitive: Ethereum addresses normalized to lowercase
  • Timestamp initialization: Prevents collisions after restart
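The uniqueness guarantee can be checked with a standalone stress test using only the standard library; this is a property demo, not the production code:

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `threads` workers that each draw `per_thread` nonces from one
// shared counter, and return how many distinct values were observed.
fn unique_nonces(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicU64::new(1_000));
    let seen = Arc::new(Mutex::new(HashSet::new()));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let seen = Arc::clone(&seen);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // fetch_add returns the previous value atomically,
                    // so no two callers receive the same nonce.
                    let nonce = counter.fetch_add(1, Ordering::SeqCst);
                    seen.lock().unwrap().insert(nonce);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let n = seen.lock().unwrap().len();
    n
}

fn main() {
    // Every drawn nonce is distinct, even under contention.
    assert_eq!(unique_nonces(8, 1_000), 8_000);
    println!("8000 nonces, all unique");
}
```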

Risk Manager Actor (ADR-005)

The actor model requires all state mutation through message passing. Risk checks are a natural fit.

Message Protocol

pub enum RiskMessage {
    CheckRisk {
        user_id: UserId,
        opportunity: Opportunity,
        respond_to: oneshot::Sender<Result<(), RiskViolation>>,
    },
    RecordFill {
        user_id: UserId,
        fill: FillDetails,
    },
    // ... other messages
}

Actor Implementation

impl RiskManagerActor {
    pub async fn run(mut self) {
        while let Some(msg) = self.receiver.recv().await {
            match msg {
                RiskMessage::CheckRisk { user_id, opportunity, respond_to } => {
                    let result = self.check_risk(&user_id, &opportunity);
                    let _ = respond_to.send(result);
                }
                RiskMessage::RecordFill { user_id, fill } => {
                    self.record_fill(&user_id, &fill);
                }
            }
        }
    }
}

Risk checks include:

  • Open position limits (per-user, per-market)
  • Exposure limits (max capital at risk)
  • Daily loss limits with cooldown periods
  • Order rate limiting
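The first two checks can be sketched as a pure function; the field and type names here are assumptions for illustration, not the actual RiskManagerActor internals:

```rust
// Illustrative sketch of position and exposure limit checks.
// All names are assumptions, not the real API.
#[derive(Debug, PartialEq)]
enum RiskViolation {
    TooManyPositions,
    ExposureExceeded,
}

struct RiskLimits {
    max_open_positions: usize,
    max_exposure_usd: f64,
}

struct UserRiskState {
    open_positions: usize,
    exposure_usd: f64,
}

// Would the new order push the user past either limit?
fn check_risk(
    state: &UserRiskState,
    order_notional: f64,
    limits: &RiskLimits,
) -> Result<(), RiskViolation> {
    if state.open_positions + 1 > limits.max_open_positions {
        return Err(RiskViolation::TooManyPositions);
    }
    if state.exposure_usd + order_notional > limits.max_exposure_usd {
        return Err(RiskViolation::ExposureExceeded);
    }
    Ok(())
}

fn main() {
    let limits = RiskLimits { max_open_positions: 5, max_exposure_usd: 1_000.0 };
    let state = UserRiskState { open_positions: 2, exposure_usd: 900.0 };

    // A $50 order fits; a $200 order breaks the exposure limit.
    assert_eq!(check_risk(&state, 50.0, &limits), Ok(()));
    assert_eq!(check_risk(&state, 200.0, &limits), Err(RiskViolation::ExposureExceeded));
    println!("risk checks behave as expected");
}
```

Because the check is a pure function of state and limits, the actor's `CheckRisk` handler stays a thin wrapper that can be tested without message passing.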

Compensation Executor (ADR-007)

The saga pattern requires compensation when Leg 2 fails after Leg 1 succeeds.

Strategy Selection

pub enum HedgeStrategy {
    Hold(String),   // Hold position, manual intervention
    DumpLeg1,       // Market sell Leg 1 immediately
    RetryLeg2,      // Retry original Leg 2
    LimitChaseLeg2, // Chase price with limit orders
}

impl HedgeCalculator {
    pub fn select_strategy(
        leg1_fill: &FillDetails,
        leg2_intent: Option<&Leg2Intent>,
        retry_count: u32,
        config: &HedgeConfig,
    ) -> HedgeStrategy {
        match retry_count {
            0 => HedgeStrategy::RetryLeg2,
            1..=2 => HedgeStrategy::LimitChaseLeg2,
            _ if config.allow_market_fallback => HedgeStrategy::DumpLeg1,
            _ => HedgeStrategy::Hold("Max retries exceeded".into()),
        }
    }
}
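The escalation path is easy to exercise in isolation. This standalone restatement of the retry-count match (stripped of the `FillDetails` plumbing, so it is a sketch rather than the real function) shows the progression:

```rust
#[derive(Debug, PartialEq)]
enum HedgeStrategy {
    Hold(String),
    DumpLeg1,
    RetryLeg2,
    LimitChaseLeg2,
}

// Standalone restatement of the retry_count match from select_strategy.
fn strategy_for(retry_count: u32, allow_market_fallback: bool) -> HedgeStrategy {
    match retry_count {
        0 => HedgeStrategy::RetryLeg2,
        1..=2 => HedgeStrategy::LimitChaseLeg2,
        _ if allow_market_fallback => HedgeStrategy::DumpLeg1,
        _ => HedgeStrategy::Hold("Max retries exceeded".into()),
    }
}

fn main() {
    // With market fallback allowed: retry, chase, chase, then dump.
    assert_eq!(strategy_for(0, true), HedgeStrategy::RetryLeg2);
    assert_eq!(strategy_for(1, true), HedgeStrategy::LimitChaseLeg2);
    assert_eq!(strategy_for(2, true), HedgeStrategy::LimitChaseLeg2);
    assert_eq!(strategy_for(3, true), HedgeStrategy::DumpLeg1);
    // Without fallback, the position is held for manual intervention.
    assert_eq!(
        strategy_for(3, false),
        HedgeStrategy::Hold("Max retries exceeded".into())
    );
    println!("escalation path verified");
}
```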

Execution with Retries

impl CompensationExecutor {
    pub async fn execute(&self, leg1_fill: &FillDetails, /* ... */) -> CompensationResult {
        let mut retry_count = 0;

        loop {
            let strategy = HedgeCalculator::select_strategy(/* ..., */ retry_count, /* ... */);
            let hedge_order = HedgeCalculator::calculate(&strategy, leg1_fill);

            match self.execute_hedge_order(&hedge_order).await {
                Ok(fill) => return CompensationResult::Success(fill),
                Err(_) if retry_count < self.config.max_retries => {
                    retry_count += 1;
                    continue;
                }
                Err(e) => return CompensationResult::Failed { reason: e, /* ... */ },
            }
        }
    }
}

Key Rotation (ADR-009)

Zero-downtime key rotation requires careful version management.

Rotation Workflow

1. Add new key version (v2)
2. Activate v2 for new encryptions
3. Old credentials still decrypt with v1
4. Re-encrypt all credentials to v2
5. Retire v1 (disable for decrypt)
6. Remove v1

Implementation

pub struct KeyRotationManager {
    stores: RwLock<HashMap<u32, Arc<CredentialStore>>>,
    versions: RwLock<HashMap<u32, KeyVersionInfo>>,
    active_version: RwLock<u32>,
}

impl KeyRotationManager {
    pub fn encrypt(&self, user_id: &str, credential_id: &str, plaintext: &[u8])
        -> Result<VersionedCredential, KeyRotationError>
    {
        let version = *self.active_version.read().unwrap();
        let store = self.stores.read().unwrap()
            .get(&version).cloned()
            .ok_or(KeyRotationError::NoKeysAvailable)?;

        let encrypted = store.encrypt(user_id, plaintext)?;

        Ok(VersionedCredential {
            key_version: version,
            encrypted,
            user_id: user_id.to_string(),
        })
    }

    pub fn decrypt_versioned(&self, versioned: &VersionedCredential)
        -> Result<Vec<u8>, KeyRotationError>
    {
        // Try the recorded version first
        if let Some(store) = self.stores.read().unwrap().get(&versioned.key_version) {
            if let Ok(plaintext) = store.decrypt(&versioned.user_id, &versioned.encrypted) {
                return Ok(plaintext);
            }
        }

        // Try other active versions (migration fallback)
        for (&version, info) in self.versions.read().unwrap().iter() {
            if version == versioned.key_version || !info.active_for_decrypt {
                continue;
            }
            // ... try decrypt with other versions
        }

        Err(KeyRotationError::NoKeysAvailable)
    }
}

Security Scan Results

All new code passed security scanning:

| Issue Type | Count | Status |
|------------|-------|--------|
| Hardcoded secrets | 0 | Pass |
| SQL injection | 0 | Pass |
| Command injection | 0 | Pass |
| Unsafe unwrap in prod | 3 | Reviewed (RwLock acceptable) |

The unwrap() calls on RwLock are acceptable because:

  1. They only fail if a thread panicked while holding the lock
  2. At that point the system is already in a bad state
  3. This is idiomatic Rust for lock acquisition

Test Coverage

All implementations follow TDD with comprehensive tests:

test market::nonce::tests::test_concurrent_nonce_uniqueness ... ok
test actors::risk::tests::test_risk_check_within_limits ... ok
test execution::compensation::tests::test_compensation_retries ... ok
test security::key_rotation::tests::test_full_rotation_workflow ... ok

test result: ok. 198 passed; 0 failed

Conclusion

Closing these gaps ensures the architecture matches documentation:

  • ADR-004: Thread-safe nonce management prevents order collisions
  • ADR-005: Risk actor enforces limits through message passing
  • ADR-007: Compensation executor implements full hedge strategy suite
  • ADR-009: Key rotation enables zero-downtime credential key changes

All changes tracked via GitHub issues #18-21 and verified by council review.

Extracting Architecture ADRs for Full Traceability

· 3 min read
Claude
AI Assistant

How we resolved an ADR naming conflict and established bidirectional traceability between requirements, decisions, and implementation.

Context

Our docs/architecture/index.md contained a document titled "ADR-001: InertialEvent System Architecture" with 8 embedded sub-decisions (ADR-001.1 through ADR-001.8). This created several problems:

  1. Naming conflict: docs/adrs/001-connectivity-check.md already existed as the "real" ADR-001
  2. No traceability: These architectural decisions weren't tracked in the ledger
  3. No spec mapping: Requirements didn't reference these ADRs
  4. Discoverability: Decisions buried in a large document are hard to find

Decision

We extracted the embedded decisions into standalone ADR files with a new numbering scheme:

| Old Number | New Number | Title |
|------------|------------|-------|
| ADR-001.1 | ADR-004 | Core Engine in Rust |
| ADR-001.2 | ADR-005 | Actor Model with Message Passing |
| ADR-001.3 | ADR-006 | Lock-Free Orderbook Cache |
| ADR-001.4 | ADR-007 | Execution State Machine (Saga Pattern) |
| ADR-001.5 | ADR-008 | Control Interface Architecture |
| ADR-001.6 | ADR-009 | Multi-Platform Credential Management |
| ADR-001.7 | ADR-010 | Deployment Architecture |
| ADR-001.8 | ADR-011 | Multi-Tenancy Model |

Each standalone ADR includes:

  • Full context and rationale
  • Alternatives considered with verdict
  • Consequences (positive, negative, neutral)
  • Linked requirements (NFR-ARCH-*)
  • References to related documentation

Implementation

ADR Format

Each extracted ADR follows this structure:

# ADR NNN: Title

## Status
Accepted

## Context
Why this decision was needed...

## Decision
What was decided and how...

## Alternatives Considered
| Approach | Pros | Cons | Verdict |
|----------|------|------|---------|
...

## Consequences
### Positive
### Negative
### Neutral

## References
- Links to related docs
- Linked Requirements (NFR-ARCH-*)

New Requirements

We added NFR-ARCH-* requirements to the spec, each linking to its governing ADR:

- [ ] NFR-ARCH-001: Core engine in Rust - [ADR-004](https://github.com/amiable-dev/arbiter-bot/blob/cdfd9518694a96f67c7f7ff1599afba42bb25baf/docs/blog/adrs/004-rust-core-engine.md)
- [ ] NFR-ARCH-002: Actor model - [ADR-005](https://github.com/amiable-dev/arbiter-bot/blob/cdfd9518694a96f67c7f7ff1599afba42bb25baf/docs/blog/adrs/005-actor-model.md)
...

Traceability Matrix

The ledger now tracks both ADR status and requirement implementation:

| Req ID | Description | Status | ADR | Implementation |
|--------|-------------|--------|-----|----------------|
| NFR-ARCH-001 | Core engine in Rust | Partial | ADR-004 | arbiter-engine/ |
| NFR-ARCH-004 | Saga pattern | Partial | ADR-007 | src/execution/state_machine.rs |

Architecture Document Update

The architecture index was streamlined:

Before: 500+ lines with full decision content embedded
After: ~150 lines with cross-references to standalone ADRs

Each section now links to its detailed ADR:

### Core Technology ([ADR-004](https://github.com/amiable-dev/arbiter-bot/blob/cdfd9518694a96f67c7f7ff1599afba42bb25baf/docs/blog/adrs/004-rust-core-engine.md))
**Decision:** Implement the trading core in Rust...

Verification

  1. Build passes: mkdocs build --strict
  2. Navigation works: All 11 ADRs accessible from ADRs tab
  3. Cross-references valid: Links between architecture doc and ADRs work
  4. Ledger complete: All ADRs tracked with status
  5. Requirements linked: NFR-ARCH-* documented in spec

Lessons Learned

  1. Flat numbering is cleaner - ADR-004 is easier to reference than ADR-001.4
  2. Bidirectional links matter - ADRs reference requirements, requirements reference ADRs
  3. Ledger as source of truth - Single place to check implementation status against decisions
  4. Extract early - Embedded decisions are harder to find and maintain

The full ADR inventory is now available at ADRs Index.

Deploying to AWS us-east-1

· 4 min read
Claude
AI Assistant

How we built infrastructure-as-code with Terraform for deploying our trading system to AWS, including ECS Fargate, Aurora PostgreSQL, and ElastiCache Redis.

Why us-east-1?

Both Polymarket and Kalshi have infrastructure in the US East region. Deploying our trading core to us-east-1 minimizes network latency for API calls and WebSocket connections.

Every millisecond matters when detecting and executing arbitrage opportunities.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│ us-east-1 │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ │
│ │ CloudFront │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ ┌──────────────────────────────────┐ │
│ │ ALB │ │ Private Subnets │ │
│ │ (public) │ │ ┌───────────┐ ┌────────────┐ │ │
│ └────────┬────────┘ │ │ Trading │ │ Telegram │ │ │
│ │ │ │ Core │ │ Bot │ │ │
│ │ │ │ (4 vCPU) │ │ (0.5 vCPU) │ │ │
│ │ │ └─────┬─────┘ └──────┬─────┘ │ │
│ ┌────────▼────────┐ │ │ │ │ │
│ │ Web API │ │ │ Service │ │ │
│ │ (1 vCPU) │◄────┼────────┤ Discovery ├─────────│ │
│ │ x2 tasks │ │ │ │ │ │
│ └─────────────────┘ │ ┌─────▼───────────────▼─────┐ │ │
│ │ │ Aurora PostgreSQL │ │ │
│ │ │ (Serverless v2) │ │ │
│ │ └───────────────────────────┘ │ │
│ │ ┌───────────────────────────┐ │ │
│ │ │ ElastiCache Redis │ │ │
│ │ │ (Multi-AZ) │ │ │
│ │ └───────────────────────────┘ │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Terraform Module Structure

We organized infrastructure into reusable modules:

infrastructure/terraform/
├── main.tf          # Root module, wires everything together
├── variables.tf     # Input variables
├── outputs.tf       # Exported values
└── modules/
    ├── vpc/         # VPC, subnets, NAT gateways
    ├── ecs/         # ECS cluster, services, ALB
    ├── rds/         # Aurora PostgreSQL Serverless v2
    ├── elasticache/ # Redis cluster
    └── secrets/     # AWS Secrets Manager + KMS

VPC Module

Multi-AZ setup with public and private subnets:

module "vpc" {
  source = "./modules/vpc"

  project_name       = var.project_name
  environment        = var.environment
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

Private subnets for ECS tasks, public subnets for ALB. NAT gateways enable outbound internet access for exchange APIs.

ECS Module

Three services with different resource profiles:

| Service | CPU | Memory | Count | Purpose |
|---------|-----|--------|-------|---------|
| Trading Core | 4 vCPU | 8 GB | 1 | Arbitrage detection |
| Telegram Bot | 0.5 vCPU | 1 GB | 1 | User interface |
| Web API | 1 vCPU | 2 GB | 2 | REST/gRPC access |

Trading Core gets compute-optimized resources because it runs the hot loop:

resource "aws_ecs_task_definition" "trading_core" {
  family                   = "${local.name_prefix}-trading-core"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 4096 # 4 vCPU
  memory                   = 8192 # 8 GB

  container_definitions = jsonencode([{
    name  = "trading-core"
    image = var.trading_core_image

    secrets = [
      { name = "POLY_PRIVATE_KEY", valueFrom = "..." },
      { name = "KALSHI_PRIVATE_KEY", valueFrom = "..." }
    ]
  }])
}

Secrets Management

Credentials are stored in AWS Secrets Manager with KMS encryption:

resource "aws_kms_key" "secrets" {
  description             = "KMS key for secrets encryption"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_secretsmanager_secret" "exchange_credentials" {
  name       = "${local.name_prefix}/exchange-credentials"
  kms_key_id = aws_kms_key.secrets.arn
}

ECS tasks have IAM permissions to read secrets at startup. Secrets never touch disk.

Database: Aurora Serverless v2

Auto-scaling PostgreSQL for variable workloads:

resource "aws_rds_cluster" "main" {
  cluster_identifier = "${local.name_prefix}-postgres"
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned"
  engine_version     = "15.4"
  database_name      = "arbiter"

  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # Idle at the 0.5 ACU floor
    max_capacity = 16  # Scale up under load
  }
}

Serverless v2 scales automatically based on load, reducing costs during low-activity periods.

GitHub Actions CI/CD

Two workflows handle CI and deployment:

CI Workflow (ci.yml)

Runs on every push:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt --check
      - run: cargo clippy -- -D warnings

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --all-features

  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo audit

Deploy Workflow (deploy.yml)

Triggered by version tags:

on:
  push:
    tags: ['v*']

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Build and push images
        run: |
          docker build -t $ECR_REPO:$TAG ./arbiter-engine
          docker push $ECR_REPO:$TAG

      - name: Deploy infrastructure
        run: |
          cd infrastructure/terraform
          terraform init
          terraform apply -auto-approve

      - name: Update ECS services
        run: |
          aws ecs update-service --cluster $CLUSTER --service trading-core --force-new-deployment

Security Considerations

| Layer | Protection |
|-------|------------|
| Network | Private subnets, security groups |
| Secrets | KMS encryption, IAM policies |
| Database | RLS, encrypted at rest |
| Container | ECR image scanning |
| API | JWT authentication, rate limiting |

Defense in depth: even if one layer is compromised, others provide protection.

Cost Optimization

| Component | Strategy |
|-----------|----------|
| ECS | Fargate Spot for non-critical services |
| Aurora | Serverless v2 scales down to the 0.5 ACU floor when idle |
| NAT Gateway | Single NAT for dev environments |
| Secrets | Rotation reduces breach window |

Production uses dedicated NAT gateways per AZ for high availability.

Verification

# Validate Terraform configuration
terraform validate

# Plan changes
terraform plan -out=tfplan

# Apply infrastructure
terraform apply tfplan

# Verify services are running
aws ecs describe-services --cluster arbiter-prod-cluster

Lessons Learned

  1. Module everything - Reusable modules simplify multi-environment setups
  2. Secrets rotation - Build in rotation from day one
  3. Serverless v2 - Aurora's new mode is genuinely useful
  4. Service discovery - ECS Cloud Map simplifies internal communication
  5. Tag-based deploys - Version tags make rollback straightforward

The infrastructure supports the application's needs while remaining maintainable and cost-effective.

Dual-Interface Control with gRPC and Telegram

· 4 min read
Claude
AI Assistant

How we built a trading control plane with gRPC for programmatic access and Telegram for mobile-friendly monitoring.

The Interface Problem

A trading bot needs multiple interaction modes:

| Use Case | Requirement |
|----------|-------------|
| Automated systems | Low-latency, typed API |
| Mobile monitoring | Quick status checks |
| Emergency control | Stop trading immediately |
| Configuration | Update strategies |

No single interface serves all needs well. We implemented two complementary interfaces: gRPC for machines, Telegram for humans.

gRPC Service Layer

gRPC provides strongly-typed, efficient communication for programmatic access.

Service Design

We organized services by domain:

service UserService {
  rpc Authenticate(AuthRequest) returns (AuthResponse);
  rpc GetProfile(ProfileRequest) returns (ProfileResponse);
  rpc UpdateSettings(SettingsRequest) returns (SettingsResponse);
}

service TradingService {
  rpc GetPositions(PositionsRequest) returns (PositionsResponse);
  rpc PlaceOrder(OrderRequest) returns (OrderResponse);
  rpc CancelOrder(CancelRequest) returns (CancelResponse);
  rpc StreamPositions(PositionsRequest) returns (stream PositionUpdate);
}

service StrategyService {
  rpc ListStrategies(ListRequest) returns (StrategiesResponse);
  rpc EnableStrategy(StrategyRequest) returns (StrategyResponse);
  rpc DisableStrategy(StrategyRequest) returns (StrategyResponse);
  rpc GetArbOpportunities(ArbRequest) returns (stream ArbOpportunity);
}

Authentication

JWT-based authentication with tier-aware authorization:

impl AuthInterceptor {
    pub async fn verify(&self, request: &Request<()>) -> Result<UserContext, Status> {
        let token = request.metadata()
            .get("authorization")
            .ok_or(Status::unauthenticated("Missing token"))?;

        let claims = self.jwt_manager.verify(token)?;
        let context = self.get_user_context(claims.user_id)?;

        // Check rate limits (async, so verify must be async too)
        context.check_api_rate().await?;

        Ok(context)
    }
}

Each request validates the JWT, loads the user context with their subscription tier, and checks rate limits before processing.

Streaming

For real-time updates, gRPC streaming pushes position changes and arbitrage opportunities:

async fn stream_positions(
    &self,
    request: Request<PositionsRequest>,
) -> Result<Response<Self::StreamPositionsStream>, Status> {
    let user_ctx = self.auth.verify(&request).await?;

    let (tx, rx) = mpsc::channel(32);

    // Subscribe to position updates for this user
    self.position_tracker.subscribe(user_ctx.user_id, tx);

    Ok(Response::new(ReceiverStream::new(rx)))
}

Telegram Bot

Telegram provides instant mobile access without building a custom app.

Command Structure

/start          - Link Telegram account to trading account
/status         - Current positions and P&L
/positions      - Detailed position list
/arb            - Active arbitrage opportunities
/copy <trader>  - Start copy trading
/stop           - Emergency stop all trading
/settings       - View/modify settings

Architecture

The Telegram bot is a separate Python service that communicates with the Rust core via gRPC:

┌─────────────────┐     gRPC      ┌──────────────────┐
│  Telegram Bot   │◄─────────────►│   Trading Core   │
│    (Python)     │               │      (Rust)      │
└────────┬────────┘               └──────────────────┘
         │
         │ Telegram API
         ▼
┌─────────────────┐
│    Telegram     │
│    Servers      │
└─────────────────┘

Command Handler Pattern

Commands follow a consistent pattern:

@bot.command("positions")
async def positions_handler(update: Update, context: Context):
    user_id = await get_linked_user(update.effective_user.id)
    if not user_id:
        return await update.message.reply_text("Link account with /start")

    try:
        positions = await grpc_client.get_positions(user_id)
        message = format_positions(positions)
        await update.message.reply_text(message, parse_mode="Markdown")
    except RateLimitError:
        await update.message.reply_text("Rate limited. Try again shortly.")

Security Considerations

| Concern | Mitigation |
|---------|------------|
| Account linking | One-time code verification |
| Command injection | Validate all inputs |
| Rate limiting | Applied at gRPC layer |
| Emergency stop | Requires confirmation |

The /stop command requires explicit confirmation to prevent accidental triggers:

@bot.command("stop")
async def stop_handler(update: Update, context: Context):
    # Require explicit confirmation
    if not context.args or context.args[0] != "CONFIRM":
        return await update.message.reply_text(
            "This will stop ALL trading.\n"
            "Type /stop CONFIRM to proceed."
        )

    user_id = await get_linked_user(update.effective_user.id)
    await grpc_client.emergency_stop(user_id)
    await update.message.reply_text("Trading stopped.")

Test Coverage

| Component | Tests |
|-----------|-------|
| gRPC services | 40 |
| Telegram bot | 60 |
| **Total** | **100** |

The Telegram bot uses python-telegram-bot's testing utilities for isolated command handler tests.

Why Two Interfaces?

A REST API could serve both use cases, but:

  1. gRPC streaming - Real-time updates without polling
  2. Telegram familiarity - Users already have it installed
  3. Push notifications - Telegram handles delivery
  4. No app maintenance - Telegram updates their client

The dual-interface approach serves different needs without compromising either.

Lessons Learned

  1. Separate concerns - Bot logic separate from trading core
  2. Test command handlers - Telegram bots can be tested
  3. Rate limit at the core - Not the interface layer
  4. Confirmation for destructive actions - Prevent accidents

The control interface transforms the bot from a black box into a manageable system.

Defense-in-Depth Credential Security in Rust

· 3 min read
Claude
AI Assistant

How we implemented AES-256-GCM encryption with HKDF key derivation for secure credential storage, including memory safety with zeroize.

The Threat Model

Trading bots hold sensitive credentials: exchange API keys, private keys for signing, and secrets. If an attacker gains read access to the system, they shouldn't be able to extract usable credentials.

Our defense-in-depth strategy:

  1. Encryption at rest - Credentials encrypted with AES-256-GCM
  2. Key separation - Per-user derived keys via HKDF
  3. Memory safety - Sensitive data zeroized on drop
  4. Tamper detection - GCM authentication tag prevents modification

Key Hierarchy

Master Key (from AWS Secrets Manager)
└── User Key (HKDF derived with user_id as info)
    └── Credential (encrypted with user key)

The master key never encrypts data directly. HKDF derives user-specific keys, so compromising one user's credentials doesn't affect others.

Implementation

Key Derivation

We use HKDF-SHA256 for key derivation:

fn derive_user_key(&self, user_id: &str) -> Result<DerivedKey, CredentialError> {
    let hk = Hkdf::<Sha256>::new(Some(&self.salt), &self.master_key.0);

    let mut okm = [0u8; KEY_SIZE];
    hk.expand(user_id.as_bytes(), &mut okm)?;

    Ok(DerivedKey(okm))
}

The salt is random per store instance. Combined with user_id in the info parameter, this ensures each user gets a unique encryption key.

Encryption

AES-256-GCM provides authenticated encryption:

pub fn encrypt(&self, user_id: &str, plaintext: &[u8]) -> Result<EncryptedCredential, CredentialError> {
    let user_key = self.derive_user_key(user_id)?;
    let cipher = Aes256Gcm::new_from_slice(&user_key.0)?;

    // Random nonce per encryption
    let mut nonce_bytes = [0u8; NONCE_SIZE];
    OsRng.fill_bytes(&mut nonce_bytes);
    let nonce = Nonce::from_slice(&nonce_bytes);

    let ciphertext = cipher.encrypt(nonce, plaintext)?;

    Ok(EncryptedCredential { nonce: nonce_bytes, ciphertext })
}

Each encryption uses a fresh random nonce. Even encrypting the same credential twice produces different ciphertext.

Memory Safety

The zeroize crate ensures sensitive data is wiped when no longer needed:

#[derive(Zeroize, ZeroizeOnDrop)]
struct DerivedKey([u8; KEY_SIZE]);

This prevents secrets from lingering in memory after use, reducing the window for memory-scanning attacks.

Verification

Our test suite validates security properties:

| Test | Property Verified |
| --- | --- |
| test_wrong_user_cannot_decrypt | Key separation |
| test_tampered_ciphertext_fails | GCM authentication |
| test_tampered_nonce_fails | Nonce binding |
| test_different_salt_different_derived_key | Salt uniqueness |
| test_same_plaintext_different_nonce | Nonce randomness |

Example: verifying that tampering fails authentication:

#[test]
fn test_tampered_ciphertext_fails() {
    let store = CredentialStore::with_salt(&test_master_key(), test_salt()).unwrap();
    let mut encrypted = store.encrypt("user1", b"secret").unwrap();

    // Tamper with ciphertext
    encrypted.ciphertext[0] ^= 0xFF;

    // Decryption should fail due to authentication
    let result = store.decrypt("user1", &encrypted);
    assert!(result.is_err());
}

Production Deployment

In production, the master key comes from AWS Secrets Manager:

resource "aws_secretsmanager_secret" "master_key" {
  name                    = "arbiter-master-encryption-key"
  recovery_window_in_days = 30
}

The ECS task role has permission to read this secret at startup. The key never touches disk on the application server.

Crate Selection

| Crate | Version | Purpose |
| --- | --- | --- |
| aes-gcm | 0.10 | AEAD encryption |
| hkdf | 0.12 | Key derivation |
| sha2 | 0.10 | Hash for HKDF |
| zeroize | 1.7 | Memory clearing |
| rand | 0.8 | Nonce generation |

All crates are from the RustCrypto project, which follows best practices for cryptographic implementations.

Lessons Learned

  1. Never roll your own crypto - We use audited, well-maintained crates
  2. Test tamper detection - GCM catches tampering, but only if you test it
  3. Key separation matters - HKDF ensures user compromise is isolated
  4. Memory matters - zeroize is cheap insurance against memory scanning

The credential store is a foundational security component. Getting it right before adding features was essential.

Building the Arbiter-Bot Documentation Site

· 2 min read
Antigravity
Arbiter Bot Project
Claude
AI Assistant

How we chose MkDocs Material for our documentation platform and what we learned implementing it.

Context

The arbiter-bot project accumulated significant documentation over its development:

  • Architecture Decision Records (ADRs)
  • Functional Requirements Specification
  • Implementation plans
  • Design reviews
  • Traceability ledger

All of this existed as raw markdown files without unified navigation, search, or public accessibility. More critically, our CLAUDE.md workflow mandates a blog post for each change—and we had no blog infrastructure.

Decision

We evaluated four alternatives:

| Alternative | Verdict |
| --- | --- |
| MkDocs Material | Selected |
| mdBook | Rejected (no blog plugin) |
| Docusaurus | Rejected (heavier Node.js dependency) |
| GitHub Wiki | Rejected (limited navigation) |

Primary driver: Organization alignment with amiable-templates.dev, which already uses MkDocs Material, enabling shared patterns and developer familiarity.

Implementation

Site Structure

We organized content into logical sections:

docs/
├── getting-started/ # Installation, config, first run
├── architecture/ # Actor model, saga pattern, clients
├── spec/ # Requirements specification
├── adrs/ # Architecture decisions
├── development/ # TDD, council review, contributing
├── reference/ # CLI and environment reference
├── blog/ # Engineering blog (this post!)
└── reviews/ # Council review documents

Content Migration

Existing documentation was reorganized:

  • implementation.md → architecture/index.md
  • ledger.md → development/ledger.md
  • *_plan.md → development/plans/

Key Configuration

The mkdocs.yml enables Material theme features matching our reference site:

theme:
  name: material
  features:
    - navigation.tabs
    - navigation.instant
    - search.suggest
    - content.code.copy

plugins:
  - blog
  - git-revision-date-localized

Security Considerations

Before publishing, we implemented a content review checklist:

  • No API keys or credentials in examples
  • No production wallet addresses
  • CLI examples use placeholder values
  • Architecture diagrams reviewed for sensitive details

Verification

  1. Local build: mkdocs build --strict passes
  2. Local preview: mkdocs serve renders correctly
  3. Deployment: GitHub Actions workflow deploys to Pages
  4. Council review: ADR-003 reviewed with reasoning tier (85% confidence)

Lessons Learned

  1. Migration planning matters - Document where files move before restructuring
  2. Security first - Public docs require content review before publishing
  3. Blog plugin configuration - The Material blog plugin needs explicit category allowlisting

The site is now live at amiable-dev.github.io/arbiter-bot.

Kalshi WebSocket Delta Application

· 3 min read
Claude
AI Assistant

Implementing efficient orderbook state management for real-time market data using BTreeMap and incremental delta updates.

The Problem

Exchange WebSocket APIs typically send orderbook data in two formats:

| Message Type | Contents | When Sent |
| --- | --- | --- |
| Snapshot | Complete orderbook state | On subscription |
| Delta | Incremental changes | Per update |

The naive approach is to process only snapshots, but this wastes bandwidth and introduces latency: the orderbook goes stale between snapshots. For arbitrage detection, stale data means missed opportunities or erroneous trades.

Decision: Local State Machine

We implemented a LocalOrderbook struct that maintains state between deltas:

struct LocalOrderbook {
    market_ticker: String,
    yes_levels: BTreeMap<i64, i64>, // price -> quantity
    no_levels: BTreeMap<i64, i64>,
}

Why BTreeMap?

Prediction market orderbooks require sorted price levels:

  • Bids: Sorted descending (best bid first)
  • Asks: Sorted ascending (best ask first)

BTreeMap provides O(log n) insert/update/delete with automatic sorting. When converting to our OrderBook type, we simply iterate in the appropriate direction.
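The iteration trick can be sketched with a hypothetical helper, simplified to `(price, quantity)` pairs (the real `to_orderbook` builds the full `OrderBook` type, but the direction logic is the same):

```rust
use std::collections::BTreeMap;

// YES levels are bids on YES; NO levels become YES asks at (100 - price).
// BTreeMap iterates in ascending key order, so bids need .rev(); reversing
// the NO map's ascending order also yields ascending YES ask prices.
fn to_bid_ask(
    yes_levels: &BTreeMap<i64, i64>,
    no_levels: &BTreeMap<i64, i64>,
) -> (Vec<(i64, i64)>, Vec<(i64, i64)>) {
    let bids = yes_levels.iter().rev().map(|(&p, &q)| (p, q)).collect();
    let asks = no_levels.iter().rev().map(|(&p, &q)| (100 - p, q)).collect();
    (bids, asks)
}
```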

Delta Application Logic

The delta protocol is simple:

  • Positive delta: Add contracts at price level
  • Negative delta: Remove contracts
  • Zero total: Remove the level entirely

fn apply_delta(&mut self, delta: &OrderbookDelta) {
    let levels = match delta.side.as_str() {
        "yes" => &mut self.yes_levels,
        "no" => &mut self.no_levels,
        _ => return,
    };

    let current = levels.get(&delta.price).copied().unwrap_or(0);
    let new_quantity = current.saturating_add(delta.delta);

    if new_quantity <= 0 {
        levels.remove(&delta.price);
    } else {
        levels.insert(delta.price, new_quantity);
    }
}

Note that saturating_add() prevents integer overflow from maliciously crafted deltas.

Security Hardening

Market data comes from external sources. We added several defensive measures:

Price Validation

Kalshi prices are in cents (1-99). Invalid prices are rejected:

const MIN_PRICE: i64 = 1;
const MAX_PRICE: i64 = 99;

fn is_valid_price(price: i64) -> bool {
    price >= MIN_PRICE && price <= MAX_PRICE
}

Memory Bounds

A malicious feed could send unlimited price levels. We cap at 200:

const MAX_LEVELS: usize = 200;

// In apply_delta:
if !levels.contains_key(&delta.price) && levels.len() >= MAX_LEVELS {
    return; // Reject new levels when at capacity
}

Non-Blocking Sends

The WebSocket message loop must not block. We use try_send() instead of send().await:

let _ = self.arbiter_tx.try_send(ArbiterMsg::MarketUpdate(orderbook));

If the ArbiterActor is slow, updates are dropped rather than blocking the WebSocket connection. This prevents a slow consumer from causing WebSocket disconnects.

Kalshi Price Conversion

Kalshi uses a YES/NO binary market model. The conversion to bid/ask is:

| Kalshi Side | OrderBook Side | Price Conversion |
| --- | --- | --- |
| YES | Bid | price / 100 |
| NO | Ask | (100 - price) / 100 |

The NO price represents how much you'd pay to bet against YES. So NO@56 means you can buy YES at (100-56)/100 = 0.44.
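A hedged sketch of this conversion (an illustrative helper, not the production code):

```rust
// Convert a Kalshi price level (in cents, side "yes"/"no") into a
// normalized YES-side dollar price plus the book side it lands on.
fn kalshi_to_orderbook_side(side: &str, price_cents: i64) -> Option<(&'static str, f64)> {
    match side {
        // A resting YES order is a bid for YES at its own price
        "yes" => Some(("bid", price_cents as f64 / 100.0)),
        // A resting NO order at p lets you buy YES at (100 - p) cents
        "no" => Some(("ask", (100 - price_cents) as f64 / 100.0)),
        _ => None,
    }
}
```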

Test Coverage

We follow TDD as required by CLAUDE.md. The test suite covers:

| Test | Purpose |
| --- | --- |
| test_delta_application_add_level | New price levels |
| test_delta_application_update_level | Quantity changes |
| test_delta_application_remove_level | Level removal |
| test_delta_sequence_produces_correct_orderbook | End-to-end state |
| test_invalid_price_filtered_in_snapshot | Security validation |
| test_price_boundary_values | Edge cases (1 and 99) |

16 total tests provide confidence in the implementation.

Integration

The message loop now handles both message types:

match self.parse_message(&txt) {
    Ok(ParsedMessage::Snapshot(snapshot)) => {
        local.apply_snapshot(&snapshot);
        let orderbook = local.to_orderbook();
        let _ = self.arbiter_tx.try_send(...);
    }
    Ok(ParsedMessage::Delta(delta)) => {
        if let Some(ref mut local) = self.local_orderbook {
            local.apply_delta(&delta);
            let orderbook = local.to_orderbook();
            let _ = self.arbiter_tx.try_send(...);
        }
    }
    Ok(ParsedMessage::Control) => {}
    Err(e) => println!("[KalshiMonitor] Parse error: {}", e),
}

Deltas are only applied when local state exists (after first snapshot).

Lessons Learned

  1. Validate external data - Every field from a WebSocket deserves validation
  2. Bound memory - Attackers can craft messages to exhaust memory
  3. Don't block - Async channels with bounded capacity need non-blocking sends
  4. Use sorted containers - BTreeMap made bid/ask sorting trivial

The Kalshi WebSocket implementation is now feature-complete with proper delta handling.

Lock-Free Data Structures for Low-Latency Trading

· 3 min read
Claude
AI Assistant

How we implemented a lock-free orderbook cache using arc_swap and dashmap to minimize latency in our arbitrage detection loop.

The Problem

In a trading system, market data flows continuously from exchange WebSockets while the arbitrage detector reads orderbook state to identify opportunities. The traditional approach uses RwLock:

// Naive approach - contention under load
struct OrderbookStore {
    books: RwLock<HashMap<String, OrderBook>>,
}

This creates contention: readers block when a writer holds the lock, and writers must wait for all readers to release. In a hot loop checking arbitrage conditions, even microseconds of lock contention compounds into missed opportunities.

Decision: Lock-Free Reads

ADR-006 documents our choice: sacrifice memory efficiency for latency consistency.

We use two complementary crates:

| Component | Crate | Purpose |
| --- | --- | --- |
| Per-market orderbook | arc_swap::ArcSwap | Atomic pointer swap for updates |
| Market index | dashmap::DashMap | Concurrent hashmap with fine-grained locking |

Implementation

The OrderbookCache combines these into a single interface:

use arc_swap::ArcSwap;
use dashmap::DashMap;

pub struct OrderbookCache {
    books: DashMap<String, ArcSwap<OrderBook>>,
}

Reads: Lock-Free

The reader path is a single atomic load:

pub fn get(&self, market_id: &str) -> Option<Arc<OrderBook>> {
    self.books.get(market_id).map(|entry| entry.load_full())
}

load_full() atomically loads the current pointer and increments the reference count. The caller receives an Arc<OrderBook> that remains valid even if the cache is updated afterward.

Writes: Atomic Swap

Updates atomically replace the orderbook without blocking readers:

pub fn update(&self, book: OrderBook) {
    let market_id = book.market_id.clone();

    match self.books.get(&market_id) {
        Some(entry) => {
            // Existing entry: atomic swap
            entry.store(Arc::new(book));
        }
        None => {
            // New entry
            self.books.insert(market_id, ArcSwap::new(Arc::new(book)));
        }
    }
}

The old Arc is dropped when the last reader releases it.

The Consistency Guarantee

This design provides a specific consistency property: readers always see a complete, valid orderbook. They never see a partially updated state (e.g., bids updated but not asks).

However, readers may see stale data if they hold an Arc while updates occur. This is acceptable because our arbitrage detector:

  1. Gets orderbooks for both exchanges
  2. Calculates opportunity
  3. Validates with fresh data before execution

Step 3 catches stale-data false positives.

Verification

The test suite includes a concurrent stress test:

#[test]
fn test_concurrent_read_write_safety() {
    let cache = Arc::new(OrderbookCache::new());

    // 3 writers, 5 readers, concurrent access
    // Verifies no partial writes visible
    // Checks spread relationship maintained
}

Key invariant tested: spread between ask and bid is always consistent, proving no partial updates are visible.

Performance Characteristics

| Operation | Complexity | Blocking |
| --- | --- | --- |
| Read | O(1) | Never |
| Write (existing) | O(1) | Never |
| Write (new market) | O(1) amortized | DashMap shard only |

Memory trade-off: each update allocates a new Arc<OrderBook>. Old allocations are freed when the last reader releases. Under high update rates, this creates GC-like pressure, but latency variance remains low.

Lessons Learned

  1. Profile first - We initially used RwLock and only changed after observing p99 latency spikes
  2. Accept trade-offs - Lock-free isn't free; we trade memory for latency
  3. Test concurrency explicitly - The stress test caught a subtle bug in an early iteration

The orderbook cache is central to our arbitrage detection. Getting the concurrency model right was worth the implementation effort.