
ADR-027: Frontier Tier

Status: ACCEPTED (Revised per Council Review 2025-12-24)
Date: 2025-12-24
Decision Makers: Engineering, Architecture
Extends: ADR-022 (Tier System)
Council Review: Reasoning tier (gpt-5.2-pro, claude-opus-4.5, gemini-3-pro-preview, grok-4.1-fast)


Context

The current tier system (ADR-022) defines four confidence tiers:

  • quick - Fast responses, low latency priority
  • balanced - General use, balanced priorities
  • high - Quality deliberation, proven stable models
  • reasoning - Deep analysis with extended thinking

Gap Identified: There is no tier for evaluating cutting-edge, preview, or beta models before they are promoted to production use in high tier.

Problem: New models (e.g., GPT-5.2-pro, Gemini 3 Pro Preview) cannot be safely tested in council deliberations without risking production stability. The high tier explicitly requires proven stable models (30+ days), creating a chicken-and-egg problem.


Decision

Introduce a new confidence tier called frontier for cutting-edge/preview model evaluation.

Tier Definition

| Attribute | high | frontier |
|---|---|---|
| Purpose | Production deliberation | Max-capability evaluation |
| Stability | Proven (30+ days) | New/beta accepted |
| Preview models | Prohibited | Allowed |
| Rate limits | Standard | May be restricted |
| Pricing | Known/stable | May fluctuate |
| Risk tolerance | Low | High |
| Voting authority | Full | Advisory only (Shadow Mode) |

Shadow Mode (Council Recommendation)

Critical Design Decision: Frontier models operate in Shadow Mode by default.

```python
class VotingAuthority(Enum):
    FULL = "full"          # Vote counts in consensus
    ADVISORY = "advisory"  # Logged/evaluated, vote weight = 0.0
    EXCLUDED = "excluded"  # Not included in deliberation

# Default voting authority by tier
TIER_VOTING_AUTHORITY = {
    "quick": VotingAuthority.FULL,
    "balanced": VotingAuthority.FULL,
    "high": VotingAuthority.FULL,
    "reasoning": VotingAuthority.FULL,
    "frontier": VotingAuthority.ADVISORY,  # Shadow mode by default
}
```

Rationale: An experimental, hallucinating model could break a tie or poison the context of a production workflow. Shadow Mode ensures frontier models can be evaluated without affecting council decisions.
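As an illustration (not the production implementation), a minimal tally can map voting authority to a weight so that advisory votes are recorded but cannot swing the outcome. The enum is restated locally so the snippet runs standalone:

```python
from enum import Enum

class VotingAuthority(Enum):
    FULL = "full"
    ADVISORY = "advisory"
    EXCLUDED = "excluded"

# Illustrative weight mapping: advisory (shadow) votes carry zero weight
AUTHORITY_WEIGHT = {
    VotingAuthority.FULL: 1.0,
    VotingAuthority.ADVISORY: 0.0,
}

def tally(votes):
    """Sum weighted votes per option; excluded members never appear in `votes`."""
    totals = {}
    for option, authority in votes:
        weight = AUTHORITY_WEIGHT.get(authority, 0.0)
        totals[option] = totals.get(option, 0.0) + weight
    return totals

votes = [
    ("A", VotingAuthority.FULL),      # production council member
    ("B", VotingAuthority.FULL),      # production council member
    ("A", VotingAuthority.ADVISORY),  # frontier model in shadow mode
]
# tally(votes) -> {"A": 1.0, "B": 1.0}; the shadow vote cannot break the tie
```

The frontier model's advisory vote for "A" leaves the tally tied at 1.0 each, so consensus logic proceeds exactly as if only full-authority members had voted, while the shadow vote is still available for offline evaluation.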

Override: Operators may explicitly enable full voting for frontier models via configuration:

```yaml
council:
  tiers:
    frontier:
      voting_authority: full  # Override shadow mode
```

Tier Intersection: Reasoning vs Frontier

Conflict Resolution: Models can belong to multiple conceptual categories (e.g., o1-preview is both "reasoning" and "frontier").

Precedence Rule:

  1. If user requests frontier, reasoning models ARE included (frontier is capability-focused)
  2. If user requests reasoning, preview/beta models ARE excluded unless allow_preview: true
  3. frontier acts as an override flag that permits preview models within other tier requests

```python
def resolve_tier_intersection(
    requested_tier: str,
    model_info: ModelInfo,
    allow_preview: bool = False,
) -> bool:
    """Determine if model qualifies for requested tier."""
    if requested_tier == "frontier":
        # Frontier accepts all capable models including previews
        return model_info.quality_tier == QualityTier.FRONTIER

    if requested_tier == "reasoning":
        # Reasoning excludes previews by default
        if model_info.is_preview and not allow_preview:
            return False
        return model_info.supports_reasoning

    # Other tiers: standard logic
    return _standard_tier_qualification(requested_tier, model_info)
```

Tier Weights (Revised per Council)

```python
TIER_WEIGHTS = {
    # ... existing tiers ...
    "frontier": {
        "quality": 0.85,       # INCREASED: Intelligence is the primary driver
        "diversity": 0.05,     # DECREASED: Don't rotate for rotation's sake
        "availability": 0.05,  # DECREASED: Accept instability in beta
        "latency": 0.00,       # Irrelevant for capability testing
        "cost": 0.05,          # Minor guardrail against extreme pricing
    },
}
```

Rationale (Council Feedback):

  • Quality 85%: When testing the frontier, you want the absolute smartest model available
  • Diversity 5%: You often want to test one specific breakthrough model, not load-balance
  • Availability 5%: Preview APIs often have aggressive rate limits or outages
  • Latency 0%: Willing to wait for cutting-edge responses
  • Cost 5%: Minor guardrail to prevent extreme cost surprises
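A sketch of how these weights might combine into a selection score, assuming each candidate exposes normalized 0-1 metrics where higher is better (the metric values below are invented for illustration):

```python
# Frontier weights, as specified in TIER_WEIGHTS
FRONTIER_WEIGHTS = {
    "quality": 0.85,
    "diversity": 0.05,
    "availability": 0.05,
    "latency": 0.00,
    "cost": 0.05,
}

def composite_score(metrics: dict, weights: dict) -> float:
    """Weighted sum of normalized scores in [0, 1], higher is better."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

# Hypothetical candidates: a smarter but flakier preview vs a stable model
preview = {"quality": 0.95, "diversity": 0.2, "availability": 0.5, "latency": 0.1, "cost": 0.4}
stable = {"quality": 0.80, "diversity": 0.9, "availability": 0.99, "latency": 0.9, "cost": 0.9}

# With quality at 85%, the preview wins despite poor availability and cost:
# composite_score(preview, FRONTIER_WEIGHTS) -> 0.8625
# composite_score(stable, FRONTIER_WEIGHTS)  -> 0.8195
```

This is the intended effect of the revised weights: in the frontier tier, raw capability dominates even when every operational metric favors the stable model.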

Graduation Criteria: Frontier → High

Council Requirement: Explicit metrics for model promotion.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GraduationCriteria:
    """Criteria for promoting model from frontier to high tier."""
    min_age_days: int = 30
    min_completed_sessions: int = 100
    max_error_rate: float = 0.02           # < 2% errors
    min_quality_percentile: float = 0.75   # >= 75th percentile vs high-tier baseline
    api_stability: bool = True             # No breaking changes in evaluation period
    provider_ga_status: bool = True        # Provider removed "preview/beta" label

def should_graduate(
    model_id: str,
    tracker: PerformanceTracker,
    criteria: GraduationCriteria,
) -> Tuple[bool, List[str]]:
    """Check if model meets graduation criteria."""
    stats = tracker.get_model_stats(model_id)
    failures = []

    if stats.days_tracked < criteria.min_age_days:
        failures.append(f"age: {stats.days_tracked} < {criteria.min_age_days} days")

    if stats.completed_sessions < criteria.min_completed_sessions:
        failures.append(f"sessions: {stats.completed_sessions} < {criteria.min_completed_sessions}")

    if stats.error_rate > criteria.max_error_rate:
        failures.append(f"error_rate: {stats.error_rate:.1%} > {criteria.max_error_rate:.1%}")

    if stats.quality_percentile < criteria.min_quality_percentile:
        failures.append(f"quality: {stats.quality_percentile:.0%} < {criteria.min_quality_percentile:.0%}")

    return (len(failures) == 0, failures)
```
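A usage sketch with a stubbed stats object (the numbers are invented), inlining the same checks to show how a near-miss candidate is reported:

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class GraduationCriteria:
    min_age_days: int = 30
    min_completed_sessions: int = 100
    max_error_rate: float = 0.02
    min_quality_percentile: float = 0.75

# Hypothetical model: old enough and smart enough, but too flaky
stats = SimpleNamespace(
    days_tracked=45,
    completed_sessions=120,
    error_rate=0.035,
    quality_percentile=0.81,
)

criteria = GraduationCriteria()
failures = []
if stats.days_tracked < criteria.min_age_days:
    failures.append(f"age: {stats.days_tracked} < {criteria.min_age_days} days")
if stats.completed_sessions < criteria.min_completed_sessions:
    failures.append(f"sessions: {stats.completed_sessions} < {criteria.min_completed_sessions}")
if stats.error_rate > criteria.max_error_rate:
    failures.append(f"error_rate: {stats.error_rate:.1%} > {criteria.max_error_rate:.1%}")
if stats.quality_percentile < criteria.min_quality_percentile:
    failures.append(f"quality: {stats.quality_percentile:.0%} < {criteria.min_quality_percentile:.0%}")

# failures -> ["error_rate: 3.5% > 2.0%"]; the model stays in frontier for now
```

Because all criteria must pass, a single failing dimension blocks promotion, and the returned message pinpoints which threshold was missed.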

Cost Ceiling Protection

Council Requirement: Prevent runaway costs from volatile preview pricing.

```python
from typing import Optional, Tuple

def apply_cost_ceiling(
    model_id: str,
    model_cost: float,
    tier: str,
    high_tier_avg_cost: float,
) -> Tuple[bool, Optional[str]]:
    """Check if model cost exceeds tier ceiling."""
    if tier != "frontier":
        return (True, None)

    # Frontier allows up to 5x high-tier average
    FRONTIER_COST_MULTIPLIER = 5.0
    ceiling = high_tier_avg_cost * FRONTIER_COST_MULTIPLIER

    if model_cost > ceiling:
        return (False, f"cost ${model_cost:.4f} exceeds ceiling ${ceiling:.4f}")

    return (True, None)
```
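A worked example of the ceiling, restated as a standalone helper (the $0.01 high-tier average is an assumed figure, not a measured one):

```python
FRONTIER_COST_MULTIPLIER = 5.0  # same constant used by apply_cost_ceiling

def within_ceiling(model_cost: float, high_tier_avg_cost: float) -> bool:
    """True when a frontier model's cost is at or below 5x the high-tier average."""
    return model_cost <= high_tier_avg_cost * FRONTIER_COST_MULTIPLIER

# An assumed high-tier average of $0.01 per 1K tokens gives a $0.05 ceiling:
# within_ceiling(0.03, 0.01) -> True   (pricey, but acceptable for evaluation)
# within_ceiling(0.08, 0.01) -> False  (preview pricing spike, model skipped)
```

Because the ceiling is relative to the high-tier average rather than a fixed dollar amount, it tracks overall market pricing without requiring manual updates.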

Hard Fallback

Council Requirement: Define behavior when frontier model fails.

```python
async def execute_with_fallback(
    query: str,
    frontier_model: str,
    fallback_tier: str = "high",
) -> ModelResponse:
    """Execute frontier model with automatic fallback."""
    try:
        response = await query_model(frontier_model, query, timeout=300)
        return response
    except (RateLimitError, TimeoutError, APIError) as e:
        logger.warning(f"Frontier model {frontier_model} failed: {e}. Falling back to {fallback_tier}")

        # Automatic degradation to high tier
        fallback_models = get_tier_models(fallback_tier)
        return await query_model(fallback_models[0], query)
```
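The same pattern in a self-contained sketch, with stub coroutines standing in for real provider calls (`RateLimitError` here is a local stand-in, not a provider SDK class):

```python
import asyncio

class RateLimitError(Exception):
    """Local stand-in for a provider rate-limit exception."""

async def frontier_query(query: str) -> str:
    # Simulate an overloaded preview API
    raise RateLimitError("preview API throttled")

async def high_tier_query(query: str) -> str:
    return f"high-tier answer to: {query}"

async def query_with_fallback(query: str) -> str:
    try:
        return await frontier_query(query)
    except (RateLimitError, TimeoutError):
        # Degrade to the high tier instead of failing the whole session
        return await high_tier_query(query)

# asyncio.run(query_with_fallback("ping")) -> "high-tier answer to: ping"
```

The caller never sees the frontier failure; it only observes a response sourced from the fallback tier, which is why fallback events must be logged for observability.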

Privacy & Compliance Warning

Council Requirement: Document data handling differences for preview models.

**Privacy Notice:** Preview and beta models may have different data retention
policies than production models. Providers often use beta API inputs for
model training.

**Requirement:** PII must be scrubbed before sending prompts to frontier tier
unless the operator has verified the provider's data handling policy.
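One possible minimal gate, as a sketch only; a real deployment should use a dedicated PII-detection library rather than hand-rolled regexes:

```python
import re

# Illustrative patterns only: email addresses and US-style phone numbers
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
]

def scrub_pii(prompt: str) -> str:
    """Replace matched PII spans with a marker before any frontier-tier call."""
    for pattern in PII_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt

# scrub_pii("Contact alice@example.com or 555-123-4567")
# -> "Contact [REDACTED] or [REDACTED]"
```

The scrub would run in the request path before model selection, so that even a fallback from frontier to high tier never re-sends the unscrubbed prompt.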

Static Pool (Fallback)

```python
DEFAULT_TIER_MODEL_POOLS = {
    # ... existing tiers ...
    "frontier": [
        "openai/gpt-5.2-pro",
        "anthropic/claude-opus-4.5",
        "google/gemini-3-pro-preview",
        "x-ai/grok-4",
        "deepseek/deepseek-r1",
    ],
}
```

Configuration

```yaml
council:
  tiers:
    pools:
      frontier:
        models:
          - openai/gpt-5.2-pro
          - anthropic/claude-opus-4.5
          - google/gemini-3-pro-preview
        timeout_seconds: 300
        allow_preview: true
        allow_beta: true
        voting_authority: advisory  # Shadow mode default
        cost_ceiling_multiplier: 5.0
        fallback_tier: high

    graduation:
      min_age_days: 30
      min_completed_sessions: 100
      max_error_rate: 0.02
      min_quality_percentile: 0.75
```

Consequences

Positive

  • Safe environment for evaluating new models before production use
  • Clear promotion path: frontier → high with explicit criteria
  • Enables early adoption of cutting-edge capabilities
  • Separates experimentation from production
  • Shadow Mode protects council consensus from experimental failures

Negative

  • Additional tier to maintain
  • Frontier results may be less reliable
  • Users must understand tier semantics
  • Shadow Mode means frontier responses don't influence final decisions

Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Hallucinating model poisons consensus | Shadow Mode (advisory only) |
| Cost overruns from volatile pricing | Cost ceiling (5x high-tier avg) |
| Preview model deprecation mid-evaluation | Hard fallback to high tier |
| Data privacy with beta APIs | PII scrubbing requirement |
| Reasoning/frontier tier confusion | Explicit precedence rules |

Implementation

Files to Modify

  1. src/llm_council/config.py - Add frontier to DEFAULT_TIER_MODEL_POOLS
  2. src/llm_council/metadata/selection.py - Add frontier to TIER_WEIGHTS (revised values)
  3. src/llm_council/tier_contract.py - Support frontier tier contracts
  4. src/llm_council/council.py - Implement Shadow Mode voting authority
  5. src/llm_council/metadata/intersection.py - NEW: Tier intersection logic
  6. src/llm_council/metadata/types.py - Add is_preview, supports_reasoning fields
  7. src/llm_council/frontier_fallback.py - Add event emission for fallbacks

Validation

  • Tests for select_tier_models(tier="frontier")
  • Tests for frontier tier weights
  • Tests for frontier tier contract creation
  • Tests for Shadow Mode voting (Issue #110, #111)
  • Tests for graduation criteria (Issue #112)
  • Tests for cost ceiling (Issue #113)
  • Tests for hard fallback (Issue #114)
  • Document frontier tier in CLAUDE.md

Gap Remediation (Peer Review 2025-12-24)

  • Tier intersection logic (Issue #119) - resolve_tier_intersection() in metadata/intersection.py
  • Shadow votes integration (Issue #117) - Wired into run_council_with_fallback, events emitted
  • Fallback wrapper integration (Issue #118) - Event emission in execute_with_fallback_detailed

Observability

```
# Metrics to emit
frontier.model.selected{model_id}
frontier.model.shadow_vote{model_id, agreed_with_consensus}
frontier.model.fallback_triggered{model_id, reason}
frontier.model.cost_ceiling_exceeded{model_id}
frontier.graduation.candidate{model_id}
frontier.graduation.promoted{model_id}
```

References