Skip to main content

Entity Relationship Primary Key Fix: Preventing Silent Data Corruption

· 3 min read

Published: 2025-01-16


When the LLM Council reviewed our entity relationships implementation (ADR-026), they flagged a critical flaw: our 4-column primary key blocked multiple relationship types between the same entities, and duplicate relationships could silently corrupt traceability data.

This post details how Issue #458 fixed both data integrity gaps with a proper 5-column primary key.

The Problem

Our EntityRelationship table connects any two entities with typed relationships:

-- Original schema (simplified)
CREATE TABLE entity_relationships (
source_type VARCHAR(50),
source_id VARCHAR(255),
target_type VARCHAR(50),
target_id VARCHAR(255),
relationship_type VARCHAR(50),
PRIMARY KEY (source_type, source_id, target_type, target_id)
-- Notice: relationship_type NOT in primary key!
);

The 4-column primary key created two problems:

Problem 1: Blocked Multiple Relationship Types

-- Insert #1: REQ-001 depends on REQ-002
INSERT INTO entity_relationships
VALUES ('requirement', 'REQ-001', 'requirement', 'REQ-002', 'depends_on');
-- SUCCESS!

-- Insert #2: REQ-001 also references REQ-002
INSERT INTO entity_relationships
VALUES ('requirement', 'REQ-001', 'requirement', 'REQ-002', 'references');
-- FAILS! PK violation (same 4-column key)

This was fundamentally broken—entities couldn't have multiple relationship types!

Problem 2: Potential Duplicate Relationships

Without proper constraints, race conditions or UI bugs could create duplicate relationships.

Consequences

  1. Feature Blockage: Can't express "depends_on AND references" relationships
  2. Graph Pollution: (if duplicates existed) Wrong edge counts
  3. IEEE 29148-2018 Non-Compliance: Traceability requires rich relationships

The Solution

Change the primary key to 5 columns, including relationship_type:

# backend/models/generic_relationships.py
class EntityRelationship(Base):
__tablename__ = "entity_relationships"

# 5-column composite primary key
source_type = Column(String(50), primary_key=True)
source_id = Column(String(255), primary_key=True)
target_type = Column(String(50), primary_key=True)
target_id = Column(String(255), primary_key=True)
relationship_type = Column(String(50), primary_key=True) # Now in PK!

Key Insight: Multiple Relationship Types Are Now Valid

With the 5-column PK:

  • REQ-001 → REQ-002 with depends_on
  • REQ-001 → REQ-002 with references ✓ (different relationship type = different row)
  • REQ-001 → REQ-002 with depends_on again ✗ (exact duplicate blocked by PK)

Migration Strategy

The migration handles the schema change safely:

def upgrade():
# Step 1: Remove any duplicates (keep newest by updated_at)
conn.execute(text("""
DELETE FROM entity_relationships
WHERE ctid NOT IN (
SELECT DISTINCT ON (source_type, source_id, target_type, target_id, relationship_type)
ctid
FROM entity_relationships
ORDER BY source_type, source_id, target_type, target_id, relationship_type,
updated_at DESC NULLS LAST
)
"""))

# Step 2: Drop 4-column primary key
op.drop_constraint("entity_relationships_pkey", "entity_relationships", type_="primary")

# Step 3: Create 5-column primary key
op.create_primary_key(
"entity_relationships_pkey",
"entity_relationships",
["source_type", "source_id", "target_type", "target_id", "relationship_type"],
)

Testing

Our TDD tests verify the constraint at the database level:

def test_duplicate_relationship_rejected_at_database_level(self, postgres_db):
"""Exact duplicate 5-tuple MUST raise IntegrityError."""
rel1 = EntityRelationship(
source_type="requirement", source_id="REQ-DUP-001",
target_type="capability", target_id="CAP-DUP-001",
relationship_type="depends_on",
)
postgres_db.add(rel1)
postgres_db.flush()

# Attempt exact duplicate
rel2_duplicate = EntityRelationship(
source_type="requirement", source_id="REQ-DUP-001",
target_type="capability", target_id="CAP-DUP-001",
relationship_type="depends_on", # Same!
)
postgres_db.add(rel2_duplicate)

with pytest.raises(IntegrityError):
postgres_db.flush() # MUST fail

Impact

BeforeAfter
Duplicates silently acceptedIntegrityError on duplicate
Graph traversal counts edges incorrectlyClean graph structure
UI shows duplicatesNo duplicates possible
ADR-026: CONDITIONALADR-026: APPROVED

Lessons Learned

  1. Composite Primary Keys Need Review: Adding a column to a table doesn't mean it's part of the uniqueness guarantee
  2. Semantic Differences Matter: "depends_on" and "references" are different relationships—the fix preserves this distinction
  3. Database Constraints > Application Validation: Race conditions can bypass application checks; database constraints are authoritative
  4. Migration Must Handle Existing Data: Don't just add constraints—clean up legacy data first

Issue #458 | ADR-026 | LLM Council Blocking Issue Resolved