ADR-039: MCP Workflow Tools

Status

Implemented

Date

2025-01-16 (Retrospective)

Decision Makers

MCP Team - Tool design
Architecture Team - Workflow patterns

Layer

MCP

Related ADRs

ADR-020: MCP Server v2 with FastMCP

Supersedes

None

Depends On

ADR-020: MCP Server v2 with FastMCP

Context

Complex operations span multiple entities:

Multi-Step Workflows: Creating an SLO requires its SLI to exist first
Relationship Management: Link entities together
Bulk Operations: Process multiple items
Validation: Cross-entity constraints
AI Assistance: LLMs need orchestration tools

Requirements:

11 workflow tools for complex operations
Transaction-like behavior
Rollback on failure
Progress reporting
Audit trail

Decision

We implement MCP workflow tools for multi-entity orchestration.

Key Design Decisions

11 Workflow Tools: Cover common multi-step operations
Transactional: All-or-nothing where possible
Progress Callbacks: Report progress during execution
Validation First: Check constraints before execution
Audit Logging: Record all workflow executions
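
The "Validation First" and "Progress Callbacks" decisions above can be sketched as follows. This is a minimal illustration, not the project's implementation: `ValidationError`, `validate_slo_inputs`, `run_steps`, and the `(done, total)` callback signature are all hypothetical names chosen for the example.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Hypothetical callback signature: (steps_done, steps_total) -> awaitable.
ProgressCallback = Callable[[int, int], Awaitable[None]]


class ValidationError(Exception):
    """Raised before any step runs when an input constraint is violated."""


def validate_slo_inputs(target_percentage: float, window_days: int) -> None:
    """Check constraints up front so a bad request never starts the workflow."""
    if not 0 < target_percentage <= 100:
        raise ValidationError("target_percentage must be in (0, 100]")
    if window_days <= 0:
        raise ValidationError("window_days must be positive")


async def run_steps(
    steps: List[Callable[[], Awaitable[str]]],
    on_progress: ProgressCallback,
) -> List[str]:
    """Run steps in order, reporting progress after each one completes."""
    results = []
    for done, step in enumerate(steps, start=1):
        results.append(await step())
        await on_progress(done, len(steps))
    return results


async def demo() -> Tuple[List[str], List[Tuple[int, int]]]:
    seen: List[Tuple[int, int]] = []

    async def record(done: int, total: int) -> None:
        seen.append((done, total))

    async def create_sli() -> str:
        return "sli-created"

    async def create_slo() -> str:
        return "slo-created"

    # Validation happens before the first step is executed.
    validate_slo_inputs(target_percentage=99.9, window_days=30)
    results = await run_steps([create_sli, create_slo], record)
    return results, seen
```

Inside an MCP tool, the callback would presumably forward to whatever progress-reporting facility the server context offers; that wiring is left out here because it depends on the FastMCP version in use.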

Workflow Tools

Tool	Purpose	Entities
`create_slo_with_sli`	Create SLO and its SLI	SLO, SLI
`create_requirement_with_test_cases`	Requirement + tests	Requirement
`link_entities`	Create relationship	Any
`bulk_categorize`	Categorize multiple	Any
`analyze_capability_gaps`	Gap analysis	Capability, Requirement
`generate_runbook_steps`	AI runbook gen	Runbook
`incident_postmortem`	Create postmortem	Incident
`calculate_error_budgets`	Recalculate budgets	ErrorBudget
`migrate_requirements`	Bulk migrate	Requirement
`sync_relationships`	Sync IEEE 29148	Relationship
`validate_traceability`	Check trace matrix	All

Tool Implementation Example

@mcp.tool()
async def create_slo_with_sli(
    name: str,
    description: str,
    sli_metric: str,
    target_percentage: float,
    window_days: int = 30,
    ctx: Context = None,
) -> dict:
    """Create an SLO along with its associated SLI.

    This workflow tool creates both entities and links them together.

    Args:
        name: SLO name
        description: SLO description
        sli_metric: The metric the SLI measures
        target_percentage: Target (e.g., 99.9)
        window_days: Measurement window

    Returns:
        Created SLO with linked SLI
    """
    async with db_transaction() as db:
        # Create SLI first
        sli = await sli_service.create(db, SLICreate(
            title=f"SLI: {sli_metric}",
            metric_name=sli_metric,
        ))

        # Create SLO referencing SLI
        slo = await slo_service.create(db, SLOCreate(
            title=name,
            description=description,
            sli_id=sli.id,
            target_percentage=target_percentage,
            window_days=window_days,
        ))

        # Create relationship
        await relationship_service.link(db,
            source_id=slo.id,
            target_id=sli.id,
            relationship_type="measures"
        )

        return {
            "slo": slo.to_dict(),
            "sli": sli.to_dict(),
            "relationship_created": True
        }

Error Handling

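class WorkflowError(Exception):
    """Raised when a workflow fails after rolling back its completed steps."""


async def workflow_with_rollback(operations: list) -> list:
    """Execute operations in order; undo completed ones if any step fails."""
    completed = []

    try:
        for op in operations:
            result = await op.execute()
            completed.append((op, result))
    except Exception as e:
        # Roll back completed operations in reverse order
        for op, result in reversed(completed):
            await op.rollback(result)
        # Chain the original exception so the root cause stays in the traceback
        raise WorkflowError(f"Workflow failed: {e}") from e

    return [r for _, r in completed]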
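
The rollback helper assumes each operation exposes `execute()` and `rollback(result)`. The sketch below shows one possible shape for such an operation and how a failure in the second step undoes the first. `CreateRecord` and the in-memory `store` are hypothetical stand-ins for the real services, and the helper is repeated in compact form only so the example runs on its own.

```python
import asyncio


class WorkflowError(Exception):
    """Raised after completed operations have been rolled back."""


class CreateRecord:
    """Hypothetical operation: inserts a record into an in-memory store."""

    def __init__(self, store: dict, key: str, fail: bool = False):
        self.store, self.key, self.fail = store, key, fail

    async def execute(self) -> str:
        if self.fail:
            raise ValueError(f"cannot create {self.key}")
        self.store[self.key] = "created"
        return self.key

    async def rollback(self, result: str) -> None:
        # Undo exactly what execute() did, using its return value.
        self.store.pop(result, None)


async def workflow_with_rollback(operations: list) -> list:
    completed = []
    try:
        for op in operations:
            completed.append((op, await op.execute()))
    except Exception as e:
        for op, result in reversed(completed):
            await op.rollback(result)
        raise WorkflowError(f"Workflow failed: {e}") from e
    return [r for _, r in completed]


async def demo() -> dict:
    store: dict = {}
    ops = [CreateRecord(store, "sli"), CreateRecord(store, "slo", fail=True)]
    try:
        await workflow_with_rollback(ops)
    except WorkflowError:
        pass  # the SLI created in step 1 has been removed again
    return store
```

After `demo()` the store is empty, because the SLI created by the first operation is removed when the second one fails. This only holds for effects the operation can undo itself; external side effects are outside the scope of this pattern.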

Consequences

Positive

Atomic Operations: Multi-step as single action
AI Friendly: Complex operations exposed simply
Consistency: Entities created in correct order
Audit Trail: All workflows logged
Reusability: Same workflow from UI or MCP

Negative

Complexity: Workflow logic is complex
Partial Failure: Rollback may be incomplete; external side effects such as sent notifications cannot be undone
Testing: Multi-entity tests harder
Maintenance: 11 tools to maintain

Neutral

Performance: Workflows slower than direct calls
Versioning: Tool changes affect clients

Implementation Status

Implementation Details

Workflow Service: backend/services/mcp_workflow_tools.py
MCP Registration: backend/api/mcp_config.py
Tests: backend/tests/integration/test_mcp_workflows.py
Audit: backend/services/mcp_audit_logger.py

LLM Council Review

Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: APPROVED

Quality Metrics

Consensus Strength Score (CSS): 0.95
Deliberation Depth Index (DDI): 0.92

Council Feedback Summary

The workflow tool design reflects exceptionally deep architectural analysis. The transactional approach and audit logging are well suited to AI orchestration.

Key Concerns Identified:

Idempotency: MCP clients may retry; tools must handle duplicate invocations
Partial Failure Recovery: Rollback may not restore external side effects (notifications sent)
Tool Complexity: 11 tools may overwhelm LLM context; consider consolidation

Required Modifications:

Idempotency Keys: Accept optional idempotency_key parameter; return cached result on retry

Partial Success Schema: Return structured result showing which steps succeeded/failed

{
  "status": "partial_success",
  "completed": ["create_sli"],
  "failed": ["create_slo"],
  "error": "Validation failed"
}

Compensating Actions: Document which side effects can't be rolled back
Timeout Handling: Long workflows need progress callbacks and timeout extension
Human-in-the-Loop: Flag destructive operations for confirmation (bulk_delete, migrate)
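
The idempotency key modification above could look roughly like the following. It is a sketch under stated assumptions: `run_idempotent` and the module-level in-process cache are hypothetical, and a real deployment would likely need a shared store with expiry so that retries reaching a different worker are also recognized.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical in-process cache keyed by idempotency key.
_results: Dict[str, Any] = {}
_locks: Dict[str, asyncio.Lock] = {}


async def run_idempotent(
    idempotency_key: Optional[str],
    workflow: Callable[[], Awaitable[Any]],
) -> Any:
    """Run the workflow once per key; repeat calls return the cached result."""
    if idempotency_key is None:
        return await workflow()  # caller opted out of deduplication

    lock = _locks.setdefault(idempotency_key, asyncio.Lock())
    async with lock:  # concurrent retries wait instead of running twice
        if idempotency_key not in _results:
            # Only successful results are cached; a failed attempt raises
            # here and can therefore be retried with the same key.
            _results[idempotency_key] = await workflow()
        return _results[idempotency_key]


async def demo() -> Tuple[Any, Any, int]:
    calls = 0

    async def create_slo() -> dict:
        nonlocal calls
        calls += 1
        return {"slo_id": calls}

    first = await run_idempotent("req-123", create_slo)
    retry = await run_idempotent("req-123", create_slo)
    return first, retry, calls
```

In `demo()` the second call with the same key returns the first call's result without running the workflow again, so `calls` stays at 1.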

Modifications Applied

Documented idempotency key pattern
Added partial success response schema
Documented compensating action limitations
Added HITL requirement for destructive operations
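
One way the HITL requirement for destructive operations might be enforced is a guard that runs before the workflow starts. The sketch below is illustrative only: the `DESTRUCTIVE_TOOLS` set, the `confirmed` flag, and the `confirmation_required` status value are assumptions, not part of the documented tool schema.

```python
from typing import Optional

# Hypothetical set of tools treated as destructive. The review names
# bulk_delete and migrate as examples; the exact membership is an assumption.
DESTRUCTIVE_TOOLS = {"bulk_delete", "migrate_requirements"}


def check_confirmation(tool_name: str, confirmed: bool = False) -> Optional[dict]:
    """Return a confirmation request for destructive tools, else None."""
    if tool_name in DESTRUCTIVE_TOOLS and not confirmed:
        return {
            "status": "confirmation_required",
            "tool": tool_name,
            "message": f"{tool_name} is destructive; re-run with confirmed=True.",
        }
    return None  # safe to proceed
```

A tool would call `check_confirmation` first and return its result to the client when it is not `None`, leaving the decision to a human before any data is changed.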

Council Ranking

claude-opus-4.5: Best Response (idempotency)
gpt-5.2: Strong (partial failures)
gemini-3-pro: Good (complexity)

References

/docs/mcp/workflow-tools.md
/MCP_REMEDIATION_FINAL_STATUS.md

ADR-039 | MCP Layer | Implemented
