ADR-025: Future Integration Capabilities

Status: APPROVED WITH MODIFICATIONS
Date: 2025-12-23
Decision Makers: Engineering, Architecture Council
Review: Completed - Reasoning Tier (4/4 models: GPT-4o, Gemini-3-Pro, Claude Opus 4.5, Grok-4)


Context

Industry Landscape Analysis (December 2025)

The AI/LLM industry has undergone significant shifts in 2025. This ADR assesses whether LLM Council's current architecture aligns with these developments and proposes a roadmap for future integrations.

1. Agentic AI is the Dominant Paradigm

2025 has been declared the "Year of the Agent" by industry analysts:

| Metric | Value |
|---|---|
| Market size (2024) | $5.1 billion |
| Projected market (2030) | $47 billion |
| Annual growth rate | 44% |
| Enterprise adoption (2025) | 25% deploying AI agents |

Key Frameworks Emerged:

  • LangChain - Modular LLM application framework
  • AutoGen (Microsoft) - Multi-agent conversation framework
  • OpenAI Agents SDK - Native agent development
  • n8n - Workflow automation with LLM integration
  • Claude Agent SDK - Anthropic's agent framework

Implications for LLM Council:

  • Council deliberation is a form of multi-agent consensus
  • Our 3-stage process (generate → review → synthesize) maps to agent workflows
  • Opportunity to position as "agent council" for high-stakes decisions

2. MCP Has Become the Industry Standard

The Model Context Protocol (MCP) has achieved widespread adoption:

| Milestone | Date |
|---|---|
| Anthropic announces MCP | November 2024 |
| OpenAI adopts MCP | March 2025 |
| Google confirms Gemini support | April 2025 |
| Donated to Linux Foundation | December 2025 |

November 2025 Spec Features:

  • Parallel tool calls
  • Server-side agent loops
  • Task abstraction for long-running work
  • Enhanced capability declarations

LLM Council's Current MCP Status:

  • ✅ MCP server implemented (mcp_server.py)
  • ✅ Tools: consult_council, council_health_check
  • ✅ Progress reporting during deliberation
  • ❓ Missing: Parallel tool call support, task abstraction

3. Local LLM Adoption is Accelerating

Privacy and compliance requirements are driving on-premises LLM deployment:

Drivers:

  • GDPR, HIPAA compliance requirements
  • Data sovereignty concerns
  • Reduced latency for real-time applications
  • Cost optimization for high-volume usage

Standard Tools:

  • Ollama: De facto standard for local LLM hosting

    • Simple API: http://localhost:11434/v1/chat/completions (see the call sketch after this list)
    • OpenAI-compatible format
    • Supports Llama, Mistral, Mixtral, Qwen, etc.
  • LiteLLM: Unified gateway for 100+ providers

    • Acts as AI Gateway/Proxy
    • Includes Ollama support
    • Cost tracking, guardrails, load balancing
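
To make the Ollama surface concrete, here is a minimal sketch of a chat call against its OpenAI-compatible endpoint (assumes a local Ollama daemon with llama3.2 pulled; the prompt is illustrative):

```python
# Minimal sketch: call a local Ollama server through its
# OpenAI-compatible endpoint. Assumes `ollama pull llama3.2` was run.
import httpx

response = httpx.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Summarize MCP in one sentence."}],
    },
    timeout=120.0,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, the same request shape works against LiteLLM or any other OpenAI-compatible gateway by swapping the base URL.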

LLM Council's Current Local LLM Status:

  • ❌ No native Ollama support
  • ❌ No LiteLLM integration
  • ✅ Gateway abstraction exists (could add OllamaGateway)

4. Workflow Automation Integrates LLMs Natively

Workflow tools now treat LLMs as first-class citizens:

n8n Capabilities (2025):

  • Direct Ollama node for local LLMs
  • AI Agent node for autonomous workflows
  • 422+ app integrations
  • RAG pipeline templates
  • MCP server connections

Integration Patterns:

Trigger → LLM Decision → Action → Webhook Callback

LLM Council's Current Workflow Status:

  • ✅ HTTP REST API (POST /v1/council/run)
  • ✅ Health endpoint (GET /health)
  • ❌ No webhook callbacks (async notifications)
  • ❌ No streaming API for real-time progress

Current Capabilities Assessment

Gateway Layer (ADR-023)

| Gateway | Status | Description |
|---|---|---|
| OpenRouterGateway | ✅ Complete | 100+ models via single key |
| RequestyGateway | ✅ Complete | BYOK with analytics |
| DirectGateway | ✅ Complete | Anthropic, OpenAI, Google direct |
| OllamaGateway | ✅ Complete | Local LLM support via LiteLLM (ADR-025a) |
| LiteLLMGateway | ❌ Deferred | Integrated into OllamaGateway per council recommendation |

External Integrations

| Integration | Status | Gap |
|---|---|---|
| MCP Server | ✅ Complete | Consider task abstraction |
| HTTP API | ✅ Complete | Webhooks and SSE added (ADR-025a) |
| CLI | ✅ Complete | None |
| Python SDK | ✅ Complete | None |
| Webhooks | ✅ Complete | Event-based with HMAC (ADR-025a) |
| SSE Streaming | ✅ Complete | Real-time events (ADR-025a) |
| n8n | ⚠️ Indirect | Example template needed |
| NotebookLM | ❌ N/A | Third-party tool |

Agentic Capabilities

| Capability | Status | Notes |
|---|---|---|
| Multi-model deliberation | ✅ Core feature | Our primary value |
| Peer review (bias reduction) | ✅ Stage 2 | Anonymized review |
| Consensus synthesis | ✅ Stage 3 | Chairman model |
| Fast-path routing | ✅ ADR-020 | Single-model optimization |
| Local execution | ✅ Complete | OllamaGateway via LiteLLM (ADR-025a) |

Proposed Integration Roadmap

Priority Assessment

| Integration | Priority | Effort | Impact | Rationale |
|---|---|---|---|---|
| OllamaGateway | HIGH | Medium | High | Privacy/compliance demand |
| Webhook callbacks | MEDIUM | Low | Medium | Workflow tool integration |
| Streaming API | MEDIUM | Medium | Medium | Real-time UX |
| LiteLLM integration | LOW | Low | Medium | Alternative to native gateway |
| Enhanced MCP | LOW | Medium | Low | Spec still evolving |

Phase 1: Local LLM Support (OllamaGateway)

Objective: Enable fully local council execution

Implementation:

```python
# src/llm_council/gateway/ollama.py
class OllamaGateway(BaseRouter):
    """Gateway for local Ollama models."""

    def __init__(
        self,
        base_url: str = "http://localhost:11434",
        default_timeout: float = 120.0,
    ):
        self.base_url = base_url
        self.default_timeout = default_timeout

    async def complete(self, request: GatewayRequest) -> GatewayResponse:
        # Ollama uses OpenAI-compatible format
        endpoint = f"{self.base_url}/v1/chat/completions"
        # ... implementation
```
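
For orientation, the elided body might look like the following sketch; the field names on GatewayRequest and GatewayResponse are assumptions for illustration, not the project's actual types:

```python
# Hypothetical completion of the elided body above; GatewayRequest.messages
# and GatewayResponse(content=..., model=...) are illustrative assumptions.
import httpx

async def complete(self, request: GatewayRequest) -> GatewayResponse:
    endpoint = f"{self.base_url}/v1/chat/completions"
    async with httpx.AsyncClient(timeout=self.default_timeout) as client:
        resp = await client.post(
            endpoint,
            json={
                # Strip the routing prefix: ollama/llama3.2 -> llama3.2
                "model": request.model.removeprefix("ollama/"),
                "messages": request.messages,
            },
        )
        resp.raise_for_status()
    data = resp.json()
    return GatewayResponse(
        content=data["choices"][0]["message"]["content"],
        model=request.model,
    )
```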

Model Identifier Format:

```
ollama/llama3.2
ollama/mistral
ollama/mixtral
ollama/qwen2.5
```

Configuration:

```bash
# Use Ollama for all council models
LLM_COUNCIL_DEFAULT_GATEWAY=ollama
LLM_COUNCIL_OLLAMA_BASE_URL=http://localhost:11434

# Or mix cloud and local
LLM_COUNCIL_MODEL_ROUTING='{"ollama/*": "ollama", "anthropic/*": "direct"}'
```

Fully Local Council Example:

```yaml
# llm_council.yaml
council:
  tiers:
    pools:
      local:
        models:
          - ollama/llama3.2
          - ollama/mistral
          - ollama/qwen2.5
        timeout_seconds: 300
        peer_review: standard

  chairman: ollama/mixtral

  gateways:
    default: ollama
    providers:
      ollama:
        enabled: true
        base_url: http://localhost:11434
```

Phase 2: Workflow Integration (Webhooks)

Objective: Enable async notifications for n8n and similar tools

API Extension:

```python
from typing import List, Optional

from pydantic import BaseModel


class CouncilRequest(BaseModel):
    prompt: str
    models: Optional[List[str]] = None
    # New fields
    webhook_url: Optional[str] = None
    webhook_events: List[str] = ["complete", "error"]
    async_mode: bool = False  # Return immediately, notify via webhook
```
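
A workflow tool would then submit a fire-and-forget request like this (URL and values illustrative):

```
POST /v1/council/run
{
  "prompt": "Should this deployment proceed?",
  "webhook_url": "https://n8n.example.com/webhook/council",
  "webhook_events": ["complete", "error"],
  "async_mode": true
}
```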

Webhook Payload:

```json
{
  "event": "council.complete",
  "request_id": "uuid",
  "timestamp": "2025-12-23T10:00:00Z",
  "result": {
    "stage1": [...],
    "stage2": [...],
    "stage3": {...}
  }
}
```

Events:

  • council.started - Deliberation begins
  • council.stage1.complete - Individual responses collected
  • council.stage2.complete - Peer review complete
  • council.complete - Final synthesis ready
  • council.error - Execution failed

Phase 3: LiteLLM Alternative Path

Objective: Leverage existing gateway ecosystem instead of building native

Approach: Instead of building OllamaGateway, point DirectGateway at a LiteLLM proxy:

```bash
# LiteLLM acts as unified gateway
export LITELLM_PROXY_URL=http://localhost:4000

# DirectGateway routes through LiteLLM
LLM_COUNCIL_DIRECT_ENDPOINT=http://localhost:4000/v1/chat/completions
```

Trade-offs:

| Approach | Pros | Cons |
|---|---|---|
| Native OllamaGateway | Simpler, no dependencies | Only supports Ollama |
| LiteLLM integration | 100+ providers, cost tracking | External dependency |

Recommendation: Implement OllamaGateway first (simpler), document LiteLLM as alternative.


Open Questions for Council Review

1. Local LLM Priority

Should OllamaGateway be the top priority given the industry trend toward local/private LLM deployment?

Context: Privacy regulations (GDPR, HIPAA) and data sovereignty concerns are driving enterprises to on-premises LLM deployment. Ollama has become the de facto standard.

2. LiteLLM vs Native Gateway

Should we integrate with LiteLLM (100+ provider support) or build a native Ollama gateway?

Trade-offs:

  • LiteLLM: Instant access to 100+ providers, maintained by external team, adds dependency
  • Native: Simpler, no dependencies, but only supports Ollama initially

3. Webhook Architecture

What webhook patterns best support n8n and similar workflow tools?

Options:

  • A) Simple POST callback with full result
  • B) Event-based with granular stage notifications
  • C) WebSocket for real-time streaming
  • D) Server-Sent Events (SSE) for progressive updates

4. Fully Local Council Feasibility

Is there demand for running the entire council locally (all models + chairman via Ollama)?

Considerations:

  • Hardware requirements (multiple concurrent models)
  • Quality trade-offs (local vs cloud models)
  • Use cases (air-gapped environments, development/testing)

5. Agentic Positioning

Should LLM Council position itself as an "agent council" for high-stakes agentic decisions?

Opportunity: Multi-agent systems need consensus mechanisms. LLM Council's deliberation could serve as a "jury" for agent decisions requiring human-level judgment.


Implementation Timeline

| Phase | Scope | Duration | Dependencies |
|---|---|---|---|
| Phase 1a | OllamaGateway basic | 1 sprint | None |
| Phase 1b | Fully local council | 1 sprint | Phase 1a |
| Phase 2 | Webhook callbacks | 1 sprint | None |
| Phase 3 | LiteLLM docs | 0.5 sprint | None |

Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| Local council execution | Works with Ollama | Integration tests pass |
| Webhook delivery | <1s latency | P95 latency measurement |
| n8n integration | Documented workflow | Example template works |
| Council quality (local) | >80% agreement with cloud | A/B comparison |

Council Review

Status: APPROVED WITH ARCHITECTURAL MODIFICATIONS
Date: 2025-12-23
Tier: High (Reasoning)
Sessions: 3 deliberation sessions conducted
Final Session: Full council (4/4 models responded)

| Model | Status | Latency |
|---|---|---|
| GPT-4o | ✓ ok | 15.8s |
| Gemini-3-Pro-Preview | ✓ ok | 31.6s |
| Claude Opus 4.5 | ✓ ok | 48.5s |
| Grok-4 | ✓ ok | 72.6s |

Executive Summary

The full Council APPROVES the strategic direction of ADR-025, specifically the shift toward Privacy-First (Local) and Agentic Orchestration. However, significant architectural modifications are required:

  1. "Jury Mode" is the Killer Feature - Every reviewer identified this as the primary differentiator
  2. Unified Gateway Approach - Strong dissent (Gemini/Claude) against building proprietary native gateway
  3. Scope Reduction Required - Split into ADR-025a (Committed) and ADR-025b (Exploratory)
  4. Quality Degradation Notices - Required for local model usage

Council Verdicts by Question

1. Local LLM Priority: YES - TOP PRIORITY (Unanimous)

The council unanimously agrees that OllamaGateway must be the top priority.

Rationale:

  • Addresses immediate enterprise requirements for privacy (GDPR/HIPAA)
  • Avoids cloud costs and API rate limits
  • The $5.1B → $47B market growth in agentic AI relies heavily on secure, offline capabilities
  • This is a foundational feature for regulated sectors (healthcare, finance)

Council Recommendation: Proceed immediately with OllamaGateway implementation.

2. Integration Strategy: UNIFIED GATEWAY (Split Decision)

Significant Dissent on Implementation Approach:

| Model | Position |
|---|---|
| GPT-4o | Native gateway for control |
| Grok-4 | Native gateway, LiteLLM as optional module |
| Gemini | DISSENT: Use LiteLLM as engine, not custom build |
| Claude | DISSENT: Start with LiteLLM as bridge, build native when hitting limitations |

Gemini's Argument (Strong Dissent):

"Do not build a proprietary Native Gateway. In late 2025, maintaining a custom adapter layer for the fragmenting model market is a waste of resources. Use LiteLLM as the engine for your 'OllamaGateway' - it already standardizes headers for Ollama, vLLM, OpenAI, and Anthropic."

Claude's Analysis:

| Factor | Native Gateway | LiteLLM |
|---|---|---|
| Maintenance burden | Higher | Lower |
| Dependency risk | None | Medium |
| Feature velocity | Self-controlled | Dependent |
| Initial dev time | ~40 hours | ~8 hours |

Chairman's Synthesis: Adopt a Unified Gateway approach:

  • Wrap LiteLLM inside the Council's "OllamaGateway" interface
  • Satisfies user need for "native" experience without maintenance burden
  • Move LiteLLM from LOW to CORE priority

3. Webhook Architecture: HYBRID B + D (EVENT-BASED + SSE) (Unanimous)

Strong agreement that Event-based granular notifications combined with SSE for streaming is the superior choice.

Reasoning:

  • Simple POSTs (Option A) lack flexibility for multi-stage processes
  • WebSockets (Option C) are resource-heavy (persistent connections)
  • Event-based (B): Enables granular lifecycle tracking
  • SSE (D): Lightweight unidirectional streaming, perfect for text generation

Chairman's Decision: Implement Event-Based Webhooks as default, with optional SSE for real-time token streaming.

Recommended Webhook Events:

```
council.deliberation_start
council.stage1.complete
model.vote_cast
council.stage2.complete
consensus.reached
council.complete
council.error
```

Payload Requirements: Include timestamps, error codes, and metadata for n8n integration.

4. Fully Local Council: YES, WITH HARDWARE DOCUMENTATION (Unanimous)

Both models support this but urge caution regarding hardware realities.

Assessment: High-value feature for regulated industries (healthcare/finance).

Hardware Requirements (Council Consensus):

| Profile | Hardware | Models Supported | Use Case |
|---|---|---|---|
| Minimum | 8+ core CPU, 16GB RAM, SSD | Quantized 7B (Llama 3.X, Mistral) | Development/testing |
| Recommended | Apple M-series Pro/Max, 32GB unified | Quantized 7B-13B models | Small local council |
| Professional | 2x NVIDIA RTX 4090/5090, 64GB+ RAM | 70B models via offloading | Full production council |
| Enterprise | Mac Studio 64GB+ or multi-GPU server | Multiple concurrent 70B | Air-gapped deployments |

Chairman's Note: Documentation must clearly state that a "Local Council" implies quantization (4-bit or 8-bit) for most users.

Recommendation: Document as an "Advanced" deployment scenario. Make "Local Mode" optional/configurable with cloud fallbacks.

5. Agentic Positioning: YES - "JURY" CONCEPT (Unanimous)

All four models enthusiastically support positioning LLM Council as a consensus mechanism for agents.

Strategy:

  • Differentiate from single-agent tools (like Auto-GPT)
  • Offer "auditable consensus" for high-stakes tasks
  • Position as "ethical decision-making" layer
  • Integrate with MCP for standardized context sharing

Unique Value Proposition: Multi-agent systems need reliable consensus mechanisms. Council deliberation can serve as a "jury" for decisions requiring human-level judgment.

Jury Mode Verdict Types (Gemini's Framework):

| Verdict Type | Use Case | Output |
|---|---|---|
| Binary | Go/no-go decisions | Single approved/rejected verdict |
| Constructive Dissent | Complex tradeoffs | Majority + minority opinion recorded |
| Tie-Breaker | Deadlocked decisions | Chairman casts deciding vote with rationale |

6. Quality Degradation Notices: REQUIRED (Claude)

Claude (Evelyn) emphasized that local model usage requires explicit quality warnings:

Requirement: When using local models (Ollama), the council MUST:

  1. Detect when local models are in use
  2. Display quality degradation warning in output
  3. Offer cloud fallback option when quality thresholds not met
  4. Log quality metrics for comparison

Example Warning:

⚠️ LOCAL COUNCIL MODE
Using quantized local models. Response quality may be degraded compared to cloud models.
Models: ollama/llama3.2 (4-bit), ollama/mistral (4-bit)
For higher quality, set LLM_COUNCIL_DEFAULT_GATEWAY=openrouter

7. Scope Split Recommendation: ADR-025a vs ADR-025b

The council recommends splitting this ADR to separate committed work from exploratory:

ADR-025a (Committed) - Ship in v0.13.x:

  • OllamaGateway implementation
  • Event-based webhooks
  • Hardware documentation
  • Basic local council support

ADR-025b (Exploratory) - Research/RFC Phase:

  • Full "Jury Mode" agentic framework
  • MCP task abstraction
  • LiteLLM alternative path
  • Multi-council federation

Rationale: Prevents scope creep while maintaining ambitious vision. Core functionality ships fast; advanced features get proper RFC process.


Council-Revised Implementation Order

The models align on the following critical path:

| Phase | Scope | Duration | Priority |
|---|---|---|---|
| Phase 1 | Native OllamaGateway | 4-6 weeks | IMMEDIATE |
| Phase 2 | Event-Based Webhooks + SSE | 3-4 weeks | HIGH |
| Phase 3 | MCP Server Enhancement | 2-3 weeks | MEDIUM-HIGH |
| Phase 4 | Streaming API | 2-3 weeks | MEDIUM |
| Phase 5 | Fully Local Council Mode | 3-4 weeks | MEDIUM |
| Phase 6 | LiteLLM (optional) | 4-6 weeks | LOW |

Total Timeline: 3-6 months depending on team size.

Chairman's Detailed Roadmap (12-Week Plan)

Phase 1 (Weeks 1-4): core-native-gateway

  • Build OllamaGateway adapter with OpenAI-compatible API
  • Define Council Hardware Profiles (Low/Mid/High)
  • Risk Mitigation: Pin Ollama API versions for stability

Phase 2 (Weeks 5-8): connectivity-layer

  • Implement Event-based Webhooks with granular lifecycle events
  • Implement SSE for token streaming (lighter than WebSockets)
  • Risk Mitigation: API keys + localhost binding by default

Phase 3 (Weeks 9-12): interoperability

  • Implement basic MCP Server capability (Council as callable tool)
  • Release "Jury Mode" marketing and templates
  • Agentic positioning materials

Risks & Considerations Identified

Security Risks (Grok-4)

  • Webhooks: Introduce injection risks; implement HMAC signatures and rate limiting immediately (verification sketch below)
  • Local Models: Must be sandboxed to prevent poisoning attacks
  • Authentication: Webhook endpoints need token validation
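
As a reference point for the HMAC requirement, a receiver could verify signatures like this; the X-Council-Signature header name and hex-digest scheme are assumptions, not a finalized spec:

```python
# Minimal sketch of receiver-side HMAC verification. The header name
# and hex-digest signing scheme are illustrative assumptions.
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest performs a constant-time comparison (thwarts timing attacks)
    return hmac.compare_digest(expected, signature_header)
```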

Performance Risks (Grok-4)

  • A fully local council may crush consumer hardware
  • "Local Mode" needs to be optional/configurable
  • Consider model sharding or async processing for large councils

Compliance Risks (GPT-4o)

  • Ensure data protection standards maintained even in local deployments
  • Document compliance certifications (SOC 2) for enterprise users

Scope Creep (GPT-4o)

  • Do not let "Agentic" features distract from core Gateway stability
  • Maintain iterative development with MVPs

Ecosystem Risks (Grok-4)

  • MCP is Linux Foundation-managed; monitor for breaking changes
  • Ollama's rapid evolution might require frequent updates
  • Add integration tests for n8n/Ollama to catch regressions

Ethical/Legal Risks (Grok-4)

  • Agentic positioning could enable misuse in sensitive areas
  • Include human-in-the-loop options as safeguards
  • Ensure compliance with evolving AI transparency regulations

Council Recommendations Summary

| Decision | Verdict | Confidence | Dissent |
|---|---|---|---|
| OllamaGateway priority | TOP PRIORITY | High | None |
| Native vs LiteLLM | Unified Gateway (wrap LiteLLM) | Medium | Gemini/Claude favor LiteLLM-first |
| Webhook architecture | Hybrid B+D (Event + SSE) | High | None |
| MCP Enhancement | MEDIUM-HIGH (new) | High | None |
| Fully local council | Yes, with hardware docs | High | None |
| Agentic positioning | Yes, as "Jury Mode" | High | None |
| Quality degradation notices | Required for local | High | None |
| Scope split (025a/025b) | Recommended | High | None |

Chairman's Closing Ruling: Proceed with ADR-025 utilizing the Unified Gateway approach (LiteLLM wrapped in native interface). Revise specifications to include:

  1. Strict webhook payload definitions with HMAC authentication
  2. Dedicated workstream for hardware benchmarking
  3. Quality degradation notices for local model usage
  4. Scope split into ADR-025a (committed) and ADR-025b (exploratory)

Architectural Principles Established

  1. Privacy First: Local deployment is a foundational capability, not an afterthought
  2. Lean Dependencies: Prefer native implementations over external dependencies
  3. Progressive Enhancement: Start with event-based webhooks, add streaming later
  4. Hardware Transparency: Document requirements clearly for local deployments
  5. Agentic Differentiation: Position as consensus mechanism for multi-agent systems

Action Items

Based on council feedback (3 deliberation sessions, 4 models):

ADR-025a (Committed - v0.13.x):

  • P0: Implement OllamaGateway with OpenAI-compatible API format (wrap LiteLLM) ✅ Completed 2025-12-23
  • P0: Add model identifier format ollama/model-name ✅ Completed 2025-12-23
  • P0: Implement quality degradation notices for local model usage ✅ Completed 2025-12-23
  • P0: Define Council Hardware Profiles (Minimum/Recommended/Professional/Enterprise) ✅ Completed 2025-12-23
  • P1: Implement event-based webhook system with HMAC authentication ✅ Completed 2025-12-23
  • P1: Implement SSE for real-time token streaming ✅ Completed 2025-12-23
  • P1: Document hardware requirements for fully local council ✅ Completed 2025-12-23
  • P2: Create n8n integration example/template

ADR-025b (Exploratory - RFC Phase):

  • P1: Enhance MCP Server capability (Council as callable tool by other agents)
  • P2: Add streaming API support
  • P2: Design "Jury Mode" verdict types (Binary, Constructive Dissent, Tie-Breaker)
  • P2: Release "Jury Mode" positioning materials and templates
  • P3: Document LiteLLM as alternative deployment path
  • P3: Prototype "agent jury" governance layer concept
  • P3: Investigate multi-council federation architecture

Supplementary Council Review: Configuration Alignment

Date: 2025-12-23
Status: APPROVED WITH MODIFICATIONS
Council: Reasoning Tier (3/4 models: Claude Opus 4.5, Gemini-3-Pro, Grok-4)
Issue: #81

Problem Statement

The initial ADR-025a implementation added Ollama and Webhook configuration to config.py (module-level constants) but did NOT integrate with unified_config.py (ADR-024's YAML-first configuration system). This created architectural inconsistency:

  • Users cannot configure Ollama/Webhooks via YAML
  • Configuration priority chain (YAML > ENV > Defaults) is broken for ADR-025a features
  • GatewayConfig.validate_gateway_name() rejects "ollama" as invalid

Council Decisions

| Question | Decision | Rationale |
|---|---|---|
| Schema Design | Consolidate under gateways.providers.ollama | Ollama is a gateway provider, not a top-level entity |
| Duplication | Single location only | Reject top-level OllamaConfig to avoid split-brain configuration |
| Backwards Compat | Deprecate config.py immediately | Use __getattr__ bridge with DeprecationWarning |
| Webhook Scope | Policy in config, routing in runtime | timeout/retries in YAML; url/secret runtime-only (security) |
| Feature Flags | Explicit enabled flags | Follow ADR-020 pattern for clarity |

Approved Schema

```python
# unified_config.py additions
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class OllamaProviderConfig(BaseModel):
    """Ollama provider config - lives inside gateways.providers."""
    enabled: bool = True
    base_url: str = Field(default="http://localhost:11434")
    timeout_seconds: float = Field(default=120.0, ge=1.0, le=3600.0)
    hardware_profile: Optional[Literal[
        "minimum", "recommended", "professional", "enterprise"
    ]] = None


class WebhookConfig(BaseModel):
    """Webhook system config - top-level like ObservabilityConfig."""
    enabled: bool = False  # Opt-in
    timeout_seconds: float = Field(default=5.0, ge=0.1, le=60.0)
    max_retries: int = Field(default=3, ge=0, le=10)
    https_only: bool = True
    default_events: List[str] = Field(
        default_factory=lambda: ["council.complete", "council.error"]
    )
```

Approved YAML Structure

```yaml
council:
  gateways:
    default: openrouter
    fallback_chain: [openrouter, ollama]
    providers:
      ollama:
        enabled: true
        base_url: http://localhost:11434
        timeout_seconds: 120.0
        hardware_profile: recommended

  webhooks:
    enabled: false
    timeout_seconds: 5.0
    max_retries: 3
    https_only: true
    default_events:
      - council.complete
      - council.error
```
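
To illustrate the YAML > ENV > Defaults priority chain, an override pass might look like the sketch below; the environment variable names are hypothetical pending the actual _apply_env_overrides() implementation:

```python
# Hypothetical sketch of env-var overrides applied after YAML loading.
# Variable names (LLM_COUNCIL_WEBHOOKS_ENABLED, ...) are illustrative.
import os

def _apply_env_overrides(webhooks: WebhookConfig) -> WebhookConfig:
    if (enabled := os.getenv("LLM_COUNCIL_WEBHOOKS_ENABLED")) is not None:
        webhooks.enabled = enabled.lower() in ("1", "true", "yes")
    if (timeout := os.getenv("LLM_COUNCIL_WEBHOOK_TIMEOUT")) is not None:
        webhooks.timeout_seconds = float(timeout)
    return webhooks
```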

Implementation Tasks

  • Create GitHub issue #81 for tracking
  • Add OllamaProviderConfig to unified_config.py
  • Add WebhookConfig to unified_config.py
  • Update GatewayConfig validator to include "ollama"
  • Add env var overrides in _apply_env_overrides()
  • Add deprecation bridge to config.py with __getattr__
  • Update OllamaGateway to accept config object
  • Write TDD tests for new config models

Architectural Principles Reinforced

  1. Single Source of Truth: unified_config.py is the authoritative configuration layer (ADR-024)
  2. YAML-First: Environment variables are overrides, not primary configuration
  3. No Secrets in Config: Webhook url and secret remain runtime-only
  4. Explicit over Implicit: Feature flags use explicit enabled fields

Gap Remediation: EventBridge and Webhook Integration

Date: 2025-12-23
Status: COMPLETED
Issues: #82, #83

Problem Statement

Peer review (LLM Antigravity) identified critical gaps in the ADR-025a implementation at 60% readiness:

| Gap | Severity | Status | Description |
|---|---|---|---|
| Gap 1: Webhook Integration | Critical | ✅ Fixed | WebhookDispatcher existed but council.py never called it |
| Gap 2: SSE Streaming | Critical | ✅ Fixed | Real implementation replaces placeholder |
| Gap 3: Event Bridge | Critical | ✅ Fixed | No bridge connected LayerEvents to WebhookDispatcher |

Council-Approved Solution

The reasoning tier council approved the following approach:

| Decision | Choice | Rationale |
|---|---|---|
| Event Bridge Design | Hybrid Pub/Sub | Async queue for production, sync mode for testing |
| Webhook Triggering | Hierarchical Config | Request-level > Global-level > Defaults |
| SSE Streaming Scope | Stage-Level Events | Map-Reduce pattern prevents token streaming until Judge phase |
| Breaking Changes | Backward Compatible | Optional kwargs to existing functions |

Implementation

1. EventBridge Class (src/llm_council/webhooks/event_bridge.py)

```python
@dataclass
class EventBridge:
    webhook_config: Optional[WebhookConfig] = None
    mode: DispatchMode = DispatchMode.SYNC
    request_id: Optional[str] = None

    async def start(self) -> None: ...
    async def emit(self, event: LayerEvent) -> None: ...
    async def shutdown(self) -> None: ...
```

2. Event Mapping (LayerEvent → WebhookPayload)

| LayerEventType | WebhookEventType |
|---|---|
| L3_COUNCIL_START | council.deliberation_start |
| L3_STAGE_COMPLETE (stage=1) | council.stage1.complete |
| L3_STAGE_COMPLETE (stage=2) | council.stage2.complete |
| L3_COUNCIL_COMPLETE | council.complete |
| L3_MODEL_TIMEOUT | council.error |

3. Council Integration

```python
async def run_council_with_fallback(
    user_query: str,
    ...
    *,
    webhook_config: Optional[WebhookConfig] = None,  # NEW
) -> Dict[str, Any]:
    # EventBridge lifecycle
    event_bridge = EventBridge(webhook_config=webhook_config)
    try:
        await event_bridge.start()
        await event_bridge.emit(LayerEvent(L3_COUNCIL_START, ...))
        # ... stages emit their events ...
        await event_bridge.emit(LayerEvent(L3_COUNCIL_COMPLETE, ...))
    finally:
        await event_bridge.shutdown()
```

Test Coverage

  • 24 unit tests for EventBridge (tests/test_event_bridge.py)
  • 11 integration tests for webhook integration (tests/test_webhook_integration.py)
  • 931 total tests passing after integration

Files Modified

| File | Action | Description |
|---|---|---|
| src/llm_council/webhooks/event_bridge.py | CREATE | EventBridge class with async queue |
| src/llm_council/webhooks/__init__.py | MODIFY | Export EventBridge, DispatchMode |
| src/llm_council/council.py | MODIFY | Add webhook_config param, emit events |
| tests/test_event_bridge.py | CREATE | Unit tests for EventBridge |
| tests/test_webhook_integration.py | CREATE | Integration tests |

GitHub Issues

  • #82: EventBridge implementation ✅
  • #83: Webhook integration ✅
  • #84: SSE streaming ✅

Phase 3: SSE Streaming (Completed)

SSE streaming implementation completed with:

  • Real _council_runner.py implementation using EventBridge
  • /v1/council/stream SSE endpoint in http_server.py (client sketch below)
  • 18 TDD tests (14 pass, 4 skip when FastAPI not installed)
  • Stage-level events: deliberation_start → stage1.complete → stage2.complete → complete
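
Any SSE-capable HTTP client can consume the stream. A minimal sketch with httpx (the port and request body are illustrative):

```python
# Minimal sketch: consume stage-level SSE events from /v1/council/stream.
# Host/port and request body are illustrative.
import httpx

with httpx.stream(
    "POST",
    "http://localhost:8000/v1/council/stream",
    json={"prompt": "Review this deployment plan."},
    timeout=None,  # deliberation can take minutes
) as response:
    for line in response.iter_lines():
        if line.startswith("data:"):
            print(line.removeprefix("data:").strip())
```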

Implementation Status Summary

| Feature | Status | Issue |
|---|---|---|
| OllamaGateway | ✅ Complete | - |
| Quality degradation notices | ✅ Complete | - |
| Hardware profiles | ✅ Complete | - |
| Webhook infrastructure | ✅ Complete | - |
| EventBridge | ✅ Complete | #82 |
| Council webhook integration | ✅ Complete | #83 |
| SSE streaming | ✅ Complete | #84 |
| n8n integration example | ❌ Pending | - |

ADR-025a Readiness: 100% (was 60% before gap remediation)


ADR-025b Council Validation: Jury Mode Features

Date: 2025-12-23
Status: VALIDATED WITH SCOPE MODIFICATIONS
Council: Reasoning Tier (4/4 models: Claude Opus 4.5, Gemini-3-Pro, GPT-4o, Grok-4)
Consensus Level: High
Primary Author: Claude Opus 4.5 (ranked #1)

Executive Summary

"The core value of ADR-025b is transforming the system from a 'Summary Generator' to a 'Decision Engine.' Prioritize features that enforce structured, programmatic outcomes (Binary Verdicts, MCP Schemas) and cut features that add architectural noise (Federation)."

Council Verdicts by Feature

| Original Feature | Original Priority | Council Verdict | New Priority |
|---|---|---|---|
| MCP Enhancement | P1 | DEPRIORITIZE | P3 |
| Streaming API | P2 | REMOVE | N/A |
| Jury Mode Design (Binary) | P2 | COMMIT | P1 |
| Jury Mode Design (Tie-Breaker) | P2 | COMMIT | P1 |
| Jury Mode Design (Constructive Dissent) | P2 | COMMIT (minimal) | P2 |
| Jury Mode Materials | P2 | COMMIT | P2 |
| LiteLLM Documentation | P3 | COMMIT | P2 |
| Federation RFC | P3 | REMOVE | N/A |

Key Architectural Findings

1. Streaming API: Architecturally Impossible (Unanimous)

Token-level streaming is fundamentally incompatible with the Map-Reduce deliberation pattern:

```
User Request
  → Stage 1: N models generate (parallel, 10-30s) → no tokens yet
  → Stage 2: N models review (parallel, 15-40s) → no tokens yet
  → Stage 3: Chairman synthesizes → tokens ONLY here
```

Resolution: Existing SSE stage-level events (council.stage1.complete, etc.) are the honest representation. Do not implement token streaming.

2. Constructive Dissent: Option B (Extract from Stage 2)

The council evaluated four approaches:

| Option | Description | Verdict |
|---|---|---|
| A | Separate synthesis from lowest-ranked | REJECT (cherry-picking) |
| B | Extract dissenting points from Stage 2 | ACCEPT |
| C | Additional synthesis pass | REJECT (latency cost) |
| D | Not worth implementing | REJECT (real demand exists) |

Implementation: Extract outlier evaluations (score < median - 1 std) from existing Stage 2 data. Only surface when Borda spread > 2 points.
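
A minimal sketch of that rule over Stage 2 scores (data shapes are illustrative and may differ from the shipped dissent.py):

```python
# Minimal sketch of the dissent rule: evaluations scoring below
# (median - 1 standard deviation) are treated as outliers.
# Data shapes are illustrative, not the shipped dissent.py API.
from statistics import median, stdev

def dissenting_models(scores: dict[str, float]) -> list[str]:
    if len(scores) < 3:
        return []  # too few evaluations to define a meaningful spread
    cutoff = median(scores.values()) - stdev(scores.values())
    return [model for model, score in scores.items() if score < cutoff]

# Example: one reviewer sharply disagrees with the majority.
print(dissenting_models({"gpt": 8.5, "gemini": 8.0, "claude": 8.2, "grok": 4.0}))
# -> ['grok']
```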

3. Federation: Removed for Scope Discipline

IssueImpact
Latency explosion3-9x (45-90 seconds per call)
Governance complexityDebugging nested decisions impossible
Scope creepDiverts from core positioning

4. MCP Enhancement: Existing Tools Sufficient

The existing consult_council tool adequately supports agent-to-council delegation. The proposed enhancement solves a theoretical problem (context passing) that is better addressed through documentation.

Jury Mode Implementation

Binary Verdict Mode

Transforms chairman synthesis into structured decision output:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VerdictType(Enum):
    SYNTHESIS = "synthesis"      # Default (current behavior)
    BINARY = "binary"            # approved/rejected
    TIE_BREAKER = "tie_breaker"  # Chairman decides on deadlock


@dataclass
class VerdictResult:
    verdict_type: VerdictType
    verdict: str                   # "approved" | "rejected" | synthesis text
    confidence: float              # 0.0-1.0
    rationale: str
    dissent: Optional[str] = None  # Minority opinion
    deadlocked: bool = False
    borda_spread: float = 0.0
```

Use Cases Enabled

| Verdict Type | Use Case | Output |
|---|---|---|
| Binary | CI/CD gates, policy enforcement, compliance checks | {verdict: "approved"/"rejected", confidence, rationale} |
| Tie-Breaker | Deadlocked decisions, edge cases | Chairman decision with explicit rationale |
| Constructive Dissent | Architecture reviews, strategy decisions | Majority + minority opinion |

Revised ADR-025b Action Items

Committed (v0.14.x):

  • P1: Implement Binary verdict mode (VerdictType enum, VerdictResult dataclass) ✅ Completed 2025-12-23
  • P1: Implement Tie-Breaker mode with deadlock detection ✅ Completed 2025-12-23
  • P2: Implement Constructive Dissent extraction from Stage 2 ✅ Completed 2025-12-23
  • P2: Create Jury Mode positioning materials and examples ✅ README updated 2025-12-23
  • P2: Document LiteLLM as alternative deployment path

Exploratory (RFC):

  • P3: Document MCP context-rich invocation patterns

Removed:

  • P2: Streaming API (architecturally impossible)
  • P3: Federation RFC (scope creep)

Architectural Principles Established

  1. Decision Engine > Summary Generator: Jury Mode enforces structured outputs
  2. Honest Representation: Stage-level events reflect true system state
  3. Minimal Complexity: Extract dissent from existing data, don't generate
  4. Scope Discipline: Remove features that add noise without value
  5. Backward Compatibility: Verdict typing is opt-in, synthesis remains default

Council Evidence

| Model | Latency | Key Contribution |
|---|---|---|
| Claude Opus 4.5 | 58.5s | 3-analyst deliberation framework, Option B recommendation |
| Gemini-3-Pro | 31.2s | "Decision Engineering" framing, verdict schema |
| Grok-4 | 74.8s | Scope assessment, MCP demotion validation |
| GPT-4o | 18.0s | Selective feature promotion |

Consensus Points:

  1. Streaming is architecturally impossible (4/4)
  2. Federation is scope creep (4/4)
  3. Binary + Tie-Breaker are high-value/low-effort (4/4)
  4. MCP enhancement is overprioritized (3/4)
  5. Constructive Dissent via extraction is correct approach (3/4)

ADR-025b Implementation Status

Date: 2025-12-23
Status: COMPLETE (Core Features)
Tests: 1021 passing

Files Created/Modified

| File | Action | Description |
|---|---|---|
| src/llm_council/verdict.py | CREATE | VerdictType enum, VerdictResult dataclass |
| src/llm_council/dissent.py | CREATE | Constructive Dissent extraction from Stage 2 |
| src/llm_council/council.py | MODIFY | verdict_type, include_dissent parameters |
| src/llm_council/mcp_server.py | MODIFY | verdict_type, include_dissent in consult_council |
| src/llm_council/http_server.py | MODIFY | verdict_type, include_dissent in CouncilRequest |
| tests/test_verdict.py | CREATE | TDD tests for verdict functionality |
| tests/test_dissent.py | CREATE | TDD tests for dissent extraction |
| README.md | MODIFY | Jury Mode section added |

Feature Summary

| Feature | Status | GitHub Issue |
|---|---|---|
| Binary Verdict Mode | ✅ Complete | #85 (closed) |
| Tie-Breaker Mode | ✅ Complete | #86 (closed) |
| Constructive Dissent | ✅ Complete | #87 (closed) |
| Jury Mode Documentation | ✅ Complete | #88 (closed) |

API Changes

New Parameters:

  • verdict_type: "synthesis" | "binary" | "tie_breaker"
  • include_dissent: boolean (extract minority opinions from Stage 2)

New Response Fields:

  • metadata.verdict: VerdictResult object (when verdict_type != synthesis)
  • metadata.dissent: string (when include_dissent=True in synthesis mode)
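
Putting the new parameters together, a binary-verdict call might look like this (values illustrative); the response then carries a metadata.verdict object with the VerdictResult fields shown earlier:

```
POST /v1/council/run
{
  "prompt": "Approve rolling out the new auth service to production?",
  "verdict_type": "binary"
}
```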

Test Coverage

  • verdict.py: 8 tests
  • dissent.py: 15 tests
  • Total test suite: 1021 tests passing

ADR-025 Overall Status

| Phase | Scope | Status |
|---|---|---|
| ADR-025a | Local LLM + Webhooks + SSE | ✅ Complete (100%) |
| ADR-025b | Jury Mode Features | ✅ Complete (Core Features) |

Remaining Work:

  • P2: Document LiteLLM as alternative deployment path
  • P3: Document MCP context-rich invocation patterns