ADR-037: AI Categorization
Status
Implemented
Date
2025-01-16 (Retrospective)
Decision Makers
- AI/ML Team - Classification approach
- Product Team - Categorization requirements
Layer
AI-ML
Related ADRs
- ADR-017: Dual Embedding Strategy
- ADR-018: Semantic Search with pgvector
Supersedes
None
Depends On
- ADR-017: Dual Embedding Strategy
Context
Manual categorization is tedious and inconsistent:
- Volume: Hundreds of requirements to categorize
- Consistency: Different users categorize differently
- Speed: Manual review is time-consuming
- Accuracy: Users may miss appropriate categories
- Evolution: Categories change over time
Requirements:
- Auto-suggest categories based on content
- Confidence scores for suggestions
- Human override capability
- Support for requirements, runbooks, incidents
- Batch processing for existing entities
Decision
We implement embedding-based AI categorization:
Key Design Decisions
- Embedding Similarity: Compare to category exemplars
- Confidence Scoring: 0-1 confidence for each suggestion
- Multi-Label: Multiple categories per entity
- Human Override: Final decision with user
- Feedback Loop: Corrections improve suggestions
Categorization Algorithm
```python
async def suggest_categories(
    text: str,
    entity_type: str,
    top_k: int = 3,
) -> list[CategorySuggestion]:
    """Suggest categories based on content similarity."""
    # Get embedding for the input text
    embedding = await generate_embedding(text)

    # Get category exemplars for this entity type
    categories = get_categories_for_entity(entity_type)

    suggestions = []
    for category in categories:
        # Compare to the category's exemplar embedding
        similarity = cosine_similarity(embedding, category.exemplar_embedding)
        if similarity > 0.5:  # similarity threshold
            suggestions.append(CategorySuggestion(
                category=category.name,
                confidence=similarity,
                reasoning=f"Similar to {category.example_title}",
            ))

    return sorted(suggestions, key=lambda s: s.confidence, reverse=True)[:top_k]
```
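The helpers referenced above (`generate_embedding`, `cosine_similarity`, the category store) live elsewhere in the backend. As a self-contained sketch, the flow can be exercised end to end with a stubbed embedding function and toy exemplar data; the `Category` dataclass, stub embeddings, and vectors below are illustrative only, not the real service:

```python
import asyncio
import math
from dataclasses import dataclass

@dataclass
class Category:
    name: str
    example_title: str
    exemplar_embedding: list[float]

@dataclass
class CategorySuggestion:
    category: str
    confidence: float
    reasoning: str

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

async def generate_embedding(text: str) -> list[float]:
    # Stub: a real implementation calls the embedding model from ADR-017.
    return [1.0, 0.2, 0.0] if "login" in text else [0.0, 0.1, 1.0]

CATEGORIES = [
    Category("Security", "OAuth2 hardening", [0.9, 0.3, 0.1]),
    Category("Billing", "Invoice export", [0.0, 0.2, 0.9]),
]

async def suggest(text: str, top_k: int = 3, threshold: float = 0.5):
    embedding = await generate_embedding(text)
    suggestions = [
        CategorySuggestion(c.name,
                           cosine_similarity(embedding, c.exemplar_embedding),
                           f"Similar to {c.example_title}")
        for c in CATEGORIES
    ]
    # Threshold, then rank by confidence, keep top_k
    suggestions = [s for s in suggestions if s.confidence > threshold]
    return sorted(suggestions, key=lambda s: s.confidence, reverse=True)[:top_k]

top = asyncio.run(suggest("Implement OAuth2 login with MFA support"))
# "Billing" falls below the 0.5 threshold and is filtered out
```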
API Endpoint
```
POST /api/v1/requirements/categorize
{
  "text": "Implement OAuth2 login with MFA support"
}
```

Response:

```json
{
  "suggestions": [
    {"category": "Security", "confidence": 0.89},
    {"category": "Authentication", "confidence": 0.85},
    {"category": "User Management", "confidence": 0.72}
  ]
}
```
Frontend Integration
```tsx
// Auto-categorize on blur
<TextField
  label="Description"
  onBlur={async (e) => {
    if (e.target.value.length > 50) {
      const suggestions = await categorizationService.suggest(e.target.value);
      if (suggestions[0]?.confidence > 0.8) {
        setCategory(suggestions[0].category);
      }
    }
  }}
/>
```
Consequences
Positive
- Consistency: Same logic for all categorization
- Speed: Instant suggestions
- Accuracy: Embedding-based matching is semantic
- Learning: Feedback improves over time
- User Experience: Reduces cognitive load
Negative
- Cold Start: Needs exemplars for new categories
- Confidence Calibration: Scores may not reflect reality
- Edge Cases: Novel content may not match well
- Embedding Cost: Requires embedding generation
Neutral
- Human-in-Loop: Still requires user confirmation
- Batch Processing: Can process existing data
Implementation Status
- Core implementation complete
- Tests written and passing
- Documentation updated
- Migration/upgrade path defined
- Monitoring/observability in place
Implementation Details
- Service: backend/services/ai_categorization.py
- API: backend/api/v1/categorization.py
- Frontend: frontend/src/components/CategorySuggestions.tsx
- Exemplars: backend/data/category_exemplars.json
LLM Council Review
Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: CONDITIONAL APPROVAL - MAJOR MODIFICATIONS
Quality Metrics
- Consensus Strength Score (CSS): 0.88
- Deliberation Depth Index (DDI): 0.90
Council Feedback Summary
Embedding-based categorization is the correct architectural direction for handling semantic variation. However, the single-exemplar approach is operationally brittle and the confidence scoring is miscalibrated.
Key Concerns Identified:
- Single Exemplar Fragile: One sentence can't represent category diversity (e.g., "Database" = timeout OR syntax error OR disk pressure)
- Linear Scan O(N): For loop over categories won't scale; needs vector index
- Confidence ≠ Similarity: Raw cosine similarity is not a probability and varies by model
- No Cold Start Strategy: New categories with no history have no exemplars
- Hybrid Search Missing: Embeddings fail on specific error codes/host IDs
Required Modifications:
- Centroid-Based Classification: Average embedding of multiple verified examples per category (not single exemplar)
- Vector Index: Use ANN/HNSW (Faiss, pgvector) instead of linear scan
- Confidence Calibration: Use Isotonic Regression or Platt Scaling to map similarity → probability
- Monitor Expected Calibration Error (ECE)
- Use dynamic thresholding (gap between top 1 and top 2)
- Hybrid Fallback: Combine vector search with keyword/BM25 for cold start and specific IDs
- Feedback Loop Implementation:
  - Store: {input_text, suggested_category, user_selection}
  - Update: Moving average to category centroid
  - Safety: Drift detection to prevent concept drift
- Entity Separation: Partition embedding space by entity type (Alerts vs Tickets vs Runbooks)
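Two of the required modifications can be sketched concretely: centroid-based classification with a dynamic top-1/top-2 gap threshold, and the moving-average centroid update from the feedback loop. All function names, vectors, and the `min_gap` value here are illustrative assumptions, not the shipped service; a production version would also replace the linear scan with the ANN index:

```python
import math

def centroid(vectors: list[list[float]]) -> list[float]:
    """Mean of several verified example embeddings (the category centroid)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def classify(embedding, centroids, min_gap=0.05):
    """Rank categories by similarity to their centroid; apply the dynamic
    threshold from the review (gap between top-1 and top-2 scores)."""
    scored = sorted(
        ((cosine(embedding, c), name) for name, c in centroids.items()),
        reverse=True,
    )
    (s1, top), (s2, _) = scored[0], scored[1]
    confident = (s1 - s2) >= min_gap
    return top, s1, confident

def update_centroid(c, new_vec, count):
    """Moving-average update from the feedback loop: fold one newly
    verified example into a centroid built from `count` examples."""
    return [(ci * count + xi) / (count + 1) for ci, xi in zip(c, new_vec)]

# Toy example: two verified "Database" examples average into one centroid,
# which covers more of the category's diversity than a single exemplar.
centroids = {
    "Database": centroid([[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]),
    "Network": centroid([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
label, score, confident = classify([0.9, 0.05, 0.1], centroids)
```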
Modifications Applied
- Documented centroid-based classification
- Added vector index requirement (ANN/HNSW)
- Documented confidence calibration approach
- Added hybrid search fallback strategy
- Documented feedback loop mechanism
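The calibration approach documented above can be illustrated with the pool-adjacent-violators (PAVA) step at the core of isotonic regression: bucket historical suggestions by raw similarity, measure the observed accept rate per bucket, and fit a monotone similarity-to-probability map. The accept rates below are made-up illustration data, not measurements:

```python
def pava(values: list[float]) -> list[float]:
    """Pool Adjacent Violators: the smallest change that makes the
    sequence non-decreasing (the fit step of isotonic regression)."""
    blocks = []  # each block is [mean, weight]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block violates monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w])
    out = []
    for mean, weight in blocks:
        out.extend([mean] * weight)
    return out

# Observed accept rates per ascending similarity bucket (toy data);
# PAVA smooths the non-monotone dip into a usable probability map.
accept_rates = [0.10, 0.30, 0.20, 0.50, 0.90]
calibrated = pava(accept_rates)  # [0.10, 0.25, 0.25, 0.50, 0.90]
```

Expected Calibration Error (ECE) can then be monitored by comparing these calibrated probabilities against accept rates on fresh data.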
Council Ranking
- gpt-5.2: Best Response (calibration)
- gemini-3-pro: Strong (hybrid search)
- claude-opus-4.5: Good (feedback loop)
References
- Text Classification with Embeddings
- docs/development/ai-ml-features-guide.md
ADR-037 | AI-ML Layer | Implemented