Skip to main content

ADR-002: Human-Readable Entity IDs

Status

Implemented

Date

2025-01-16 (Retrospective)

Decision Makers

  • Architecture Team - ID format design
  • UX Team - User experience requirements

Layer

Database

  • ADR-003: Three-Tier Entity Hierarchy (uses ID validation)
  • ADR-026: Entity Relationships (references by ID)

Supersedes

None

Depends On

  • ADR-001: PostgreSQL with pgvector

Context

Entity identification is fundamental to the platform's usability and data integrity:

  1. User Communication: Users reference entities in tickets, docs, and conversations
  2. Cross-System Integration: IDs appear in external systems (JIRA, ServiceNow)
  3. Debugging: Engineers need to quickly identify entity types in logs
  4. Data Import/Export: IDs must be stable across environments
  5. URL Readability: Entity URLs should be meaningful

Requirements:

  • Identify entity type from ID alone
  • Support large numbers of entities per type (>1M)
  • Prevent collisions across entity types
  • Work in URLs without encoding
  • Be memorable for common references

Decision

We use prefixed numeric IDs with format {PREFIX}-{NNNNNN}:

  • Prefix: 2-6 uppercase letters identifying entity type
  • Separator: Hyphen (-)
  • Number: Zero-padded 6-digit sequential number

Key Design Decisions

  1. Prefix Mapping: Each entity type has a unique prefix (REQ, CAP, PROC, etc.)
  2. Sequential Generation: Server-side generation ensures uniqueness
  3. Zero Padding: 6 digits supports 999,999 entities per type
  4. Validation: ID format validated at model level
  5. String Primary Key: VARCHAR rather than auto-increment INTEGER

Prefix Registry

EntityPrefixExample
RequirementREQREQ-000001
CapabilityCAPCAP-000001
ProcessPROCPROC-000001
SLISLISLI-000001
SLOSLOSLO-000001
Error BudgetEBEB-000001
ApplicationAPPAPP-000001
TechnologyTECHTECH-000001
ActorACTACT-000001
Use CaseUCUC-000001
MetricMETMET-000001
RunbookRBRB-000001
IncidentINCINC-000001
Value StreamVSVS-000001
Staff MemberSTAFFSTAFF-000001
OpportunityOPPOPP-000001
ProductPRODPROD-000001

Consequences

Positive

  • Instant Recognition: Users know entity type at a glance
  • URL Friendly: No encoding needed, readable paths
  • Log Searchable: grep for "REQ-000123" finds all references
  • Import/Export Safe: IDs transfer between environments
  • Conversation Ready: "Check REQ-000042" is unambiguous
  • Cross-Reference: "SLO-000005 references SLI-000003" is clear

Negative

  • Storage Overhead: VARCHAR uses more space than INTEGER
  • Index Size: Larger indexes than numeric PKs
  • Migration Complexity: Changing prefixes requires data migration
  • Collision Risk: Manual ID assignment could cause duplicates
  • Query Performance: String comparisons slower than integer

Neutral

  • Foreign Key Size: Relationships use same string format
  • API Compatibility: String IDs work in all contexts

Alternatives Considered

1. UUIDs

  • Approach: Standard 128-bit universal unique identifiers
  • Rejected: Not human-readable, no entity type indication, poor UX

2. Auto-Increment Integers

  • Approach: Simple numeric primary keys
  • Rejected: No entity type context, collision on import/export

3. Composite Keys (type + number)

  • Approach: Separate type column with numeric ID
  • Rejected: More complex queries, joins, and URL structures

Implementation Status

  • Core implementation complete
  • Tests written and passing
  • Documentation updated
  • Migration/upgrade path defined
  • Monitoring/observability in place

Implementation Details

  • ID Validator: backend/core/id_validator.py
  • Base Models: backend/core/base_models.py:BaseEntity.validate_id()
  • Prefix Registry: backend/core/id_validator.py:ID_PREFIXES
  • Generation: backend/core/id_validator.py:IDValidator.generate_next_id()

Compliance/Validation

  • Automated checks: Model-level validation on all ID assignments
  • Manual review: ID prefix changes reviewed by architecture team
  • Metrics: None (IDs are stable identifiers)

LLM Council Review

Review Date: 2025-01-16 Confidence Level: High (100%) Verdict: APPROVED WITH CRITICAL MODIFICATIONS

Quality Metrics

  • Consensus Strength Score (CSS): 0.95
  • Deliberation Depth Index (DDI): 0.90

Council Feedback Summary

The council approved the format for UX/human readability but identified it as sub-optimal for database primary keys, with critical concerns about generation strategy and ID exhaustion.

Key Concerns Identified:

  1. String PK Performance: VARCHAR keys 2x-5x slower than BIGINT for joins and indexes
  2. Server-Side Generation Risk: Application-level "Read-Increment-Write" causes race conditions
  3. ID Exhaustion: 6 digits (999,999) is dangerously low for high-volume entities (Metrics, Events)
  4. Prefix Governance: Risk of semantic collision (e.g., SERV for Service vs Server)

Required Modifications:

  1. Hybrid Approach (Recommended):
    • Use BIGINT auto-increment as actual table PK
    • Store PREFIX-NNNNNN in separate public_id column with UNIQUE INDEX
    • Best of both worlds: DB performance + human-readable display
  2. If Keeping String PK:
    • Switch to Database Sequences per entity type (not server-side logic)
    • Increase digits from 6 to 8-9 (supports ~100M-1B records)
    • Implement strict prefix registry with minimum Levenshtein distance
    • Add database CHECK constraints for format validation

Modifications Applied

  1. Documented hybrid approach as recommended pattern
  2. Added database sequence requirement for ID generation
  3. Documented capacity planning for 8-9 digit expansion
  4. Added prefix governance and collision prevention guidance

Council Ranking

  • claude-opus-4.5: 0.833 (Best Response - hybrid approach)
  • gpt-5.2: 0.778 (strong performance analysis)
  • gemini-3-pro: 0.5
  • grok-4.1: 0.0

Operational Guidelines (APPROVED_WITH_MODS)

Collision Detection Strategy

Database-Level Prevention:

-- Each entity type has a dedicated sequence
CREATE SEQUENCE IF NOT EXISTS seq_requirement_id START 1;
CREATE SEQUENCE IF NOT EXISTS seq_capability_id START 1;
CREATE SEQUENCE IF NOT EXISTS seq_process_id START 1;
-- ... (one per entity type)

-- ID generation function (atomic, collision-free)
CREATE OR REPLACE FUNCTION generate_entity_id(prefix TEXT)
RETURNS TEXT AS $$
DECLARE
seq_name TEXT := 'seq_' || LOWER(prefix) || '_id';
next_val BIGINT;
BEGIN
EXECUTE format('SELECT nextval(%L)', seq_name) INTO next_val;
RETURN prefix || '-' || LPAD(next_val::TEXT, 6, '0');
END;
$$ LANGUAGE plpgsql;

Application-Level Validation:

# backend/core/id_validator.py
def validate_id_format(id: str, expected_prefix: str | None = None) -> bool:
"""Validate ID format and optionally check prefix."""
pattern = r'^[A-Z]{2,6}-\d{6}$'
if not re.match(pattern, id):
return False
if expected_prefix and not id.startswith(expected_prefix + '-'):
return False
return True

def detect_collision(db: Session, entity_type: str, proposed_id: str) -> bool:
"""Check if ID already exists across all entity tables."""
# Query entity_registry or check specific table
return db.query(exists().where(Entity.id == proposed_id)).scalar()

Complete Prefix Registry

Entity TypePrefixMax LengthSequence NameNotes
RequirementREQ10seq_requirement_idCore entity
CapabilityCAP10seq_capability_idMaturity tracking
ProcessPROC11seq_process_idWorkflow entity
SLISLI10seq_sli_idMetrics
SLOSLO10seq_slo_idObjectives
Error BudgetEB9seq_error_budget_idBudget tracking
ApplicationAPP10seq_application_idSystem registry
TechnologyTECH11seq_technology_idTech stack
ActorACT10seq_actor_idStakeholders
Use CaseUC9seq_use_case_idUser stories
MetricMET10seq_metric_idMeasurements
RunbookRB9seq_runbook_idAutomation
IncidentINC10seq_incident_idEvents
Value StreamVS9seq_value_stream_idFlow mapping
Staff MemberSTAFF12seq_staff_idPersonnel
OpportunityOPP10seq_opportunity_idImprovements
ProductPROD11seq_product_idDeliverables

Prefix Governance Rules:

  1. Minimum 2 characters, maximum 6 characters
  2. Levenshtein distance ≥ 2 from existing prefixes
  3. Reserved: SYS, ADM, INT, API, DB (system use)
  4. New prefixes require Architecture Review Board approval

ID Validation at API Boundary

FastAPI Request Validation:

# backend/schemas/base.py
from pydantic import field_validator

class EntityIDMixin:
@field_validator('id', mode='before')
@classmethod
def validate_entity_id(cls, v: str) -> str:
if not v:
return v
if not re.match(r'^[A-Z]{2,6}-\d{6}$', v):
raise ValueError(
f"Invalid ID format: {v}. Expected PREFIX-NNNNNN "
f"(e.g., REQ-000001)"
)
return v.upper()

API Route Validation:

# backend/api/v1/requirements.py
@router.get("/{requirement_id}")
def get_requirement(
requirement_id: str = Path(
...,
regex=r'^REQ-\d{6}$',
description="Requirement ID in format REQ-NNNNNN"
),
db: Session = Depends(get_db)
):
"""Get requirement with validated ID format."""
...

Import/Export Validation:

def validate_import_ids(data: list[dict], entity_type: str) -> list[str]:
"""Validate IDs in import data, return list of errors."""
errors = []
expected_prefix = ID_PREFIXES.get(entity_type)
for i, row in enumerate(data):
if 'id' in row and not validate_id_format(row['id'], expected_prefix):
errors.append(f"Row {i}: Invalid ID format '{row['id']}'")
return errors

References

  • REQ-320: Generic Entity Patterns requirement
  • Industry patterns: JIRA issue keys, GitHub issue numbers
  • PostgreSQL Sequences

ADR-002 | Database Layer | Implemented | APPROVED_WITH_MODS Completed