Grounded Ingestion: Preventing AI Hallucination Write-Back
When AI agents have write access to your knowledge base, hallucinations become persistent. Here's how we built a provenance system that blocks ungrounded claims.
AI agents are increasingly trusted to not just read but write. They synthesize information, make inferences, and store conclusions for future reference. The problem? AI confidently states things that aren't true. And if those hallucinations get written to your knowledge base, they become persistent misinformation.
"The API uses OAuth2 for authentication." Sounds authoritative. But did the AI actually verify this, or did it hallucinate a plausible-sounding claim? Once stored, future sessions will retrieve this "fact" and build on it.
Grounded ingestion solves this by classifying every memory claim into one of three tiers based on its provenance evidence.
The 3-Tier Provenance Model
┌─────────────────────────────────────────────────────────────────┐
│                      Incoming Memory Claim                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│         ┌────────────┐   ┌────────────┐   ┌────────────┐        │
│         │  Citation  │   │   Hedge    │   │   Dedup    │        │
│         │  Detector  │   │  Detector  │   │  Checker   │        │
│         └─────┬──────┘   └─────┬──────┘   └─────┬──────┘        │
│               │                │                │               │
│               └────────────────┼────────────────┘               │
│                                ▼                                │
│                     ┌──────────────────────┐                    │
│                     │  Tier Determination  │                    │
│                     └──────────────────────┘                    │
│                                │                                │
│               ┌────────────────┼────────────────┐               │
│               ▼                ▼                ▼               │
│          ┌──────────┐     ┌──────────┐     ┌──────────┐         │
│          │  TIER 1  │     │  TIER 2  │     │  TIER 3  │         │
│          │  AUTO-   │     │   FLAG   │     │  BLOCK   │         │
│          │  APPROVE │     │  REVIEW  │     │          │         │
│          └──────────┘     └──────────┘     └──────────┘         │
│               │                │                │               │
│               ▼                ▼                ▼               │
│             Stored           Queued          Rejected           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Tier 1: Auto-Approve (High Confidence)
Claims with explicit grounding evidence are stored immediately:
- Has citation: ADR reference, commit hash, URL, issue number
- Trusted source: User-stated, documentation, manual entry
- Decision in context: Explicit decisions from conversations
Examples:
"Per ADR-003, we use Pixeltable for memory storage" → AUTO (has ADR citation)
"Fixed in commit a1b2c3d4e5f6" → AUTO (has commit hash)
"See https://docs.example.com/api" → AUTO (has URL)
"I prefer tabs over spaces" (source=user) → AUTO (trusted source)
Tier 2: Flag for Review (Medium Confidence)
Claims that are neither clearly grounded nor clearly speculative need human verification:
- AI synthesis without citation: Factual assertions from AI without explicit sources
- Dedup check failed: Provider error means we can't verify uniqueness
- Ambiguous claims: Could be true but unverified
Examples:
"The API returns JSON for REST responses" → REVIEW (AI synthesis, no citation)
"OAuth2 is the authentication mechanism" → REVIEW (factual assertion, ungrounded)
"The service uses PostgreSQL 15" → REVIEW (could be true, needs verification)
Tier 3: Block (Low Confidence)
Claims that are explicitly speculative or duplicate are rejected:
- Strong speculation: "I think", "I guess", "maybe" without any grounding
- Duplicate content: >92% similarity to existing memory
- Personal uncertainty: Language indicating that the speaker doesn't know
Examples:
"I think we should use Redis" → BLOCK (personal speculation)
"I guess the API supports this" → BLOCK (admitted uncertainty)
"Maybe we could try GraphQL" → BLOCK (ungrounded suggestion)
"Per ADR-003, we use PostgreSQL" (already stored) → BLOCK (duplicate)
Note: Technical hedges like "may", "typically", or "often" are treated differently—see Hedge Detection below.
Detection Mechanisms
Citation Detection & Verification
Grounding requires two steps: detection (finding citation patterns) and verification (checking they're real).
Step 1: Detection
The CitationDetector uses regex patterns to identify potential grounding evidence:
PATTERNS = {
    "adr": r"\[?ADR[-\s]?(\d+)\]?",   # [ADR-003], ADR-003, ADR 003
    "commit": r"\b[a-f0-9]{7,40}\b",  # 7-40 hex chars (not colors)
    "url": r"https?://[^\s<>\"]+",    # http:// or https://
    "issue": r"(?:#(\d+)|GH-(\d+))",  # #123 or GH-456
}
Smart filtering:
- Excludes 6-character hex codes (colors like #abc123)
- Strips URLs before commit detection (no false positives in URL paths)
- Returns citation type, position, and extracted ID
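The detection step above can be sketched roughly as follows. This is a minimal illustration, not the actual CitationDetector: the `Citation` shape and the `detect_citations` function are assumptions made for the example, and only the regex patterns come from the article. URLs are matched first and blanked out with same-length padding, so hex strings inside URL paths are not reported as commits and reported positions stay valid.

```python
import re
from dataclasses import dataclass

# Patterns copied from the article; everything else is illustrative.
PATTERNS = {
    "adr": r"\[?ADR[-\s]?(\d+)\]?",
    "commit": r"\b[a-f0-9]{7,40}\b",
    "url": r"https?://[^\s<>\"]+",
    "issue": r"(?:#(\d+)|GH-(\d+))",
}


@dataclass
class Citation:  # hypothetical shape: citation type, extracted ID, position
    type: str
    value: str
    start: int


def detect_citations(text: str) -> list[Citation]:
    found = []
    masked = text
    # Pass 1: record URLs, then overwrite them with spaces of equal length.
    for m in re.finditer(PATTERNS["url"], text):
        found.append(Citation("url", m.group(0), m.start()))
        masked = masked[:m.start()] + " " * len(m.group(0)) + masked[m.end():]
    # Pass 2: run the remaining patterns on the masked text.
    for kind in ("adr", "commit", "issue"):
        for m in re.finditer(PATTERNS[kind], masked):
            # Use the first non-empty capture group, else the whole match.
            value = next((g for g in m.groups() if g), m.group(0))
            found.append(Citation(kind, value, m.start()))
    return sorted(found, key=lambda c: c.start)


text = ("Per ADR-003, fixed in commit a1b2c3d4e5f6, "
        "see https://docs.example.com/api/deadbeef1234 and #42")
for c in detect_citations(text):
    print(c.type, c.value)
```

Note that the commit pattern accepts any run of 7 to 40 lowercase hex characters, so an ordinary token such as a seven-digit number is also reported as a candidate. Detection is permissive by design, which is one more reason the verification step matters.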
Step 2: Verification (Critical)
Detection alone is insufficient—AI can hallucinate plausible-looking citations. Verification checks that cited artifacts actually exist:
import glob
import subprocess
async def verify_citation(citation: Citation) -> VerificationResult:
    """Verify that a detected citation actually exists."""
    match citation.type:
        case "url":
            # Check URL resolves (HTTP HEAD request)
            response = await http_client.head(citation.value, timeout=5)
            return VerificationResult(
                valid=response.status_code == 200,
                reason=f"HTTP {response.status_code}"
            )
        case "commit":
            # Check the hash names a commit in this repo. "-e" only sets the exit
            # status; "^{commit}" rejects hashes that resolve to blobs or trees.
            result = subprocess.run(
                ["git", "cat-file", "-e", f"{citation.value}^{{commit}}"],
                capture_output=True
            )
            return VerificationResult(
                valid=result.returncode == 0,
                reason="Commit exists" if result.returncode == 0 else "Unknown commit"
            )
        case "adr":
            # Check ADR file exists
            adr_path = f"docs/adrs/ADR-{citation.value}-*.md"
            exists = len(glob.glob(adr_path)) > 0
            return VerificationResult(valid=exists, reason="ADR file exists" if exists else "ADR not found")
        case "issue":
            # Check issue exists (requires API call to GitHub/GitLab)
            # Implementation depends on your issue tracker
            ...
Why verification matters: LLMs confidently fabricate citations. We've seen:
- URLs that return 404
- Commit hashes that don't exist
- ADR numbers that were never written
Detection + verification together provide actual grounding.
Hedge Detection
The HedgeDetector identifies uncertainty language, but not all hedges are equal. We distinguish between personal speculation (block) and technical hedges (review).
Hedge categories and actions:
| Category | Examples | Action |
|---|---|---|
| Personal speculation | I think, I guess, I believe, I assume | Block |
| Admitted uncertainty | I don't know, not sure, I could be wrong | Block |
| Suggestions | maybe we should, perhaps we could | Block |
| Technical hedges | may, might, typically, often, usually | Review |
| Approximations | approximately, around, roughly | Review |
Why the distinction? Legitimate technical documentation uses hedges appropriately:
- "The server may timeout under load" — valid engineering guidance
- "Connections typically complete in <100ms" — accurate qualification
- "I think we should use Redis" — ungrounded personal opinion
detector = HedgeDetector()
# Personal speculation → Block
result = detector.analyze("I think we should use Redis")
# HedgeResult(is_speculative=True, action=BLOCK, hedge_words=["I think"])
# Technical hedge → Review
result = detector.analyze("The API may return errors under load")
# HedgeResult(is_speculative=True, action=REVIEW, hedge_words=["may"])
# No hedges → Continue to other checks
result = detector.analyze("Per ADR-003, we use PostgreSQL")
# HedgeResult(is_speculative=False, action=CONTINUE)
Bypass prevention: The trivial bypass "I think X, definitely" doesn't work—personal speculation markers always trigger regardless of assertion markers.
False positive filtering:
- "May 2024" → Month name, not modal verb
- "couldn't" → Negation often indicates certainty
- "should be 5" → Factual numeric description
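To make the category split concrete, here is a minimal sketch of how such a detector could be put together. It is not the real HedgeDetector: the phrase lists are taken from the table above, while the matching strategy (whole-word regex search on lowercased text), the string actions, and the `analyze` function are assumptions for illustration. Of the false-positive filters listed, only the "May 2024" month case is handled.

```python
import re
from dataclasses import dataclass, field

# Phrase lists from the hedge table; the matching approach is an assumption.
BLOCK_PHRASES = ["i think", "i guess", "i believe", "i assume",
                 "i don't know", "not sure", "i could be wrong",
                 "maybe", "perhaps"]
REVIEW_WORDS = ["may", "might", "typically", "often", "usually",
                "approximately", "around", "roughly"]

# Capitalised "May" followed by a four-digit year is a month, not a modal verb.
MONTH_MAY = re.compile(r"\bMay\s+\d{4}\b")


@dataclass
class HedgeResult:
    action: str  # "block", "review", or "none"
    hedge_words: list[str] = field(default_factory=list)


def _find(phrases: list[str], text: str) -> list[str]:
    """Return the phrases that occur in text as whole words."""
    return [p for p in phrases if re.search(rf"\b{re.escape(p)}\b", text)]


def analyze(text: str) -> HedgeResult:
    # Remove month mentions before lowercasing, so "May 2024" is not a hedge.
    cleaned = MONTH_MAY.sub(" ", text).lower()
    blocked = _find(BLOCK_PHRASES, cleaned)
    if blocked:
        # Personal speculation is checked first and wins outright, so a
        # trailing "definitely" cannot rescue the claim.
        return HedgeResult("block", blocked)
    review = _find(REVIEW_WORDS, cleaned)
    if review:
        return HedgeResult("review", review)
    return HedgeResult("none")


print(analyze("I think we should use Redis, definitely"))
print(analyze("The API may return errors under load"))
print(analyze("Released in May 2024"))
```

Checking the blocking phrases before anything else is what makes the bypass described above fail: there is no code path in which an assertion marker is consulted once a speculation marker has matched.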
Deduplication
The DedupChecker prevents duplicate memories using word-level Jaccard similarity:
Jaccard(A, B) = |A ∩ B| / |A ∪ B|
Where A and B are sets of lowercase tokens from each text.
Threshold: 92% similarity = duplicate (per ADR-003)
def jaccard_similarity(text1: str, text2: str) -> float:
    """Calculate word-level Jaccard similarity."""
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    intersection = len(words1 & words2)
    union = len(words1 | words2)
    return intersection / union if union > 0 else 0.0
# Usage
checker = DedupChecker(provider=memory_provider, threshold=0.92)
result = await checker.check_duplicate(
    content="Per ADR-003, we use Pixeltable for storage",
    user_id="user-123",
    memory_type="decision"
)
# DuplicateResult(is_duplicate=True, similarity=0.95, existing_memory_id="mem-456")
Why Jaccard over semantic similarity?
| Approach | Speed | Catches | Misses / cost |
|---|---|---|---|
| Jaccard (word overlap) | <1ms | Copy-paste, minor edits | Paraphrasing, synonyms |
| Vector similarity | 50-200ms | Semantic duplicates | Requires embedding model |
| MinHash/SimHash | <5ms | Near-duplicates at scale | Needs preprocessing |
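We chose Jaccard for speed: the comparison itself takes under a millisecond, so it can run on every ingestion, and the provider query dominates dedup latency (see Performance Characteristics). The 92% threshold catches most copy-paste duplicates while allowing legitimate variations.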
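A small worked example helps calibrate what a 92% threshold means in practice. The sketch below reuses the word-level Jaccard function from above on three made-up variants of one claim. Treating a score of at least 0.92 as a duplicate is an assumption here; the article does not say whether the comparison is strict.

```python
def jaccard_similarity(text1: str, text2: str) -> float:
    """Word-level Jaccard similarity, as defined earlier in the article."""
    words1, words2 = set(text1.lower().split()), set(text2.lower().split())
    union = words1 | words2
    return len(words1 & words2) / len(union) if union else 0.0


THRESHOLD = 0.92  # assumed inclusive for this illustration

original = "Per ADR-003, we use Pixeltable for memory storage"
variants = {
    "case change only": "per adr-003, we use pixeltable for memory storage",
    "one word edited": "Per ADR-003, we use Pixeltable for vector storage",
    "paraphrase": "Memory storage is handled by Pixeltable according to ADR-003",
}

for label, other in variants.items():
    score = jaccard_similarity(original, other)
    print(f"{label}: {score:.2f} duplicate={score >= THRESHOLD}")
# case change only: 1.00 duplicate=True
# one word edited: 0.78 duplicate=False
# paraphrase: 0.21 duplicate=False
```

Swapping one word in an n-word text with no repeated words gives a similarity of (n-1)/(n+1), which only reaches 0.92 once n is at least 24. For short claims like these, the threshold therefore behaves as a near-exact match check; minor edits are caught mainly in longer passages.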
Known limitation: Jaccard misses semantic duplicates where AI rephrases the same claim. For higher-value knowledge bases, consider upgrading to vector-based deduplication or MinHash for scale:
# Future: Semantic deduplication (slower but catches paraphrasing)
async def semantic_dedup(content: str, user_id: str) -> DuplicateResult:
    embedding = await embed(content)
    similar = await vector_index.search(embedding, threshold=0.95)
    return DuplicateResult(is_duplicate=len(similar) > 0, ...)
The Decision Tree
from enum import Enum
from dataclasses import dataclass
class Tier(Enum):
    AUTO_APPROVE = "tier_1"
    FLAG_REVIEW = "tier_2"
    BLOCK = "tier_3"
class HedgeAction(Enum):
    BLOCK = "block"      # Personal speculation
    REVIEW = "review"    # Technical hedges
    CONTINUE = "none"    # No hedges
TRUSTED_SOURCES = {"user", "documentation", "adr", "commit", "manual"}
def determine_tier(
    hedge_result: HedgeAction,
    is_duplicate: bool,
    dedup_failed: bool,
    has_verified_citation: bool,
    source: str,
    memory_type: str
) -> tuple[Tier, str]:
    """
    Determine ingestion tier based on grounding evidence.
    Returns (tier, reason) tuple.
    """
    # Tier 3: Block personal speculation
    if hedge_result == HedgeAction.BLOCK:
        return Tier.BLOCK, "Contains personal speculation"
    # Tier 3: Block duplicates
    if is_duplicate:
        return Tier.BLOCK, "Duplicate of existing memory"
    # Tier 2: Technical hedges need review
    if hedge_result == HedgeAction.REVIEW:
        return Tier.FLAG_REVIEW, "Contains technical hedges - needs verification"
    # Tier 2: Dedup check failed (fail-closed)
    if dedup_failed:
        return Tier.FLAG_REVIEW, "Dedup check failed - cannot verify uniqueness"
    # Tier 1: Verified citation
    if has_verified_citation:
        return Tier.AUTO_APPROVE, "Has verified citation"
    # Tier 1: Trusted source
    if source in TRUSTED_SOURCES:
        return Tier.AUTO_APPROVE, f"From trusted source: {source}"
    # Tier 1: User-stated decisions/preferences
    if memory_type == "decision" and source == "conversation":
        return Tier.AUTO_APPROVE, "Decision stated in conversation"
    if memory_type == "preference" and source in ("conversation", "chat"):
        return Tier.AUTO_APPROVE, "Preference stated by user"
    # Tier 2: Everything else needs review
    return Tier.FLAG_REVIEW, "Ungrounded assertion needs verification"
Key design choices:
- Fail-closed: Provider errors (like dedup failures) route to review, never auto-approve
- Hedge nuance: Personal speculation blocks; technical hedges go to review
- Verification required: Citations must be verified, not just detected
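The fail-closed choice depends on provider errors being converted into the `dedup_failed` flag rather than propagating or being silently ignored. One way this could look is sketched below. `dedup_flags`, `ProviderError`, and the two stand-in provider functions are hypothetical names invented for this example, not part of the actual implementation.

```python
import asyncio


class ProviderError(Exception):
    """Hypothetical error raised when the memory provider is unavailable."""


async def dedup_flags(check_duplicate, content: str) -> tuple[bool, bool]:
    """Return (is_duplicate, dedup_failed) without ever raising."""
    try:
        return bool(await check_duplicate(content)), False
    except Exception:
        # Fail closed: "could not check" must never be reported as "unique".
        return False, True


async def healthy(content: str) -> bool:
    """Stand-in provider call that finds nothing similar."""
    return False


async def broken(content: str) -> bool:
    """Stand-in provider call whose backend is down."""
    raise ProviderError("timeout")


print(asyncio.run(dedup_flags(healthy, "The API uses OAuth2")))  # (False, False)
print(asyncio.run(dedup_flags(broken, "The API uses OAuth2")))   # (False, True)
```

Because `determine_tier` tests `dedup_failed` before it looks at citations or source trust, a claim with a verified citation still lands in review while the provider is down.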
The Review Queue
Tier 2 memories enter a review queue for human verification:
@dataclass
class PendingMemory:
    queue_id: str             # Unique identifier
    user_id: str              # Owner
    content: str              # The claim
    memory_type: str          # preference/fact/decision
    source: str               # Origin
    evidence: EvidenceObject  # Provenance metadata
    submitted_at: datetime    # UTC timestamp
Queue operations:
queue = ReviewQueue(store_callback=store_memory)
# Enqueue a flagged memory
queue_id = await queue.enqueue(
    user_id="user-123",
    content="The API uses OAuth2",
    memory_type="fact",
    source="ai_synthesis",
    evidence=evidence,
    validation_result=result,
)
# User reviews their pending memories
pending = await queue.get_pending("user-123", limit=10)
# User approves (stores the memory)
memory_id = await queue.approve(queue_id, reviewer="user-123")
# Or rejects (discards with reason)
await queue.reject(queue_id, reviewer="user-123", reason="Incorrect, we use JWT")
Security properties:
| Property | Implementation |
|---|---|
| Authorization | Only owner can approve/reject (reviewer == user_id) |
| Isolation | get_by_id requires user_id match (prevents IDOR) |
| DoS prevention | Per-user limit (100), total limit (10,000) |
| Race-free | Atomic removal before store callback |
| Audit trail | All actions recorded with timestamp |
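Three of these properties (owner-only approval, queue limits, and removal before the store callback) can be illustrated with a stripped-down in-memory queue. This is a sketch under stated assumptions, not the real ReviewQueue: the simplified `enqueue`/`approve` signatures, the exception types, and the use of an `asyncio.Lock` are choices made for the example, and the audit trail is omitted.

```python
import asyncio
import uuid


class ReviewQueue:
    """In-memory sketch; the default limits mirror the table above."""

    def __init__(self, store_callback, per_user_limit=100, total_limit=10_000):
        self._store = store_callback
        self._pending = {}  # queue_id -> (user_id, content)
        self._per_user_limit = per_user_limit
        self._total_limit = total_limit
        self._lock = asyncio.Lock()

    async def enqueue(self, user_id: str, content: str) -> str:
        async with self._lock:
            mine = sum(1 for owner, _ in self._pending.values() if owner == user_id)
            if mine >= self._per_user_limit or len(self._pending) >= self._total_limit:
                raise OverflowError("review queue limit reached")
            queue_id = uuid.uuid4().hex
            self._pending[queue_id] = (user_id, content)
            return queue_id

    async def approve(self, queue_id: str, reviewer: str):
        async with self._lock:
            entry = self._pending.get(queue_id)
            # Same error for "missing" and "not yours", so IDs cannot be probed.
            if entry is None or entry[0] != reviewer:
                raise PermissionError("no such pending memory for this reviewer")
            # Remove before storing, so a concurrent approve cannot store twice.
            del self._pending[queue_id]
        return await self._store(reviewer, entry[1])
```

The trade-off in removing the entry first is that a failing store callback loses the pending item; a production queue would need to decide whether to re-enqueue in that case.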
Evidence Objects
Every memory carries provenance metadata:
from dataclasses import dataclass, field
@dataclass
class EvidenceObject:
    claim: str                                    # The content
    capture_time: datetime                        # When captured
    confidence: str                               # high/medium/low
    source_id: Optional[str] = None               # ADR-003, commit:a1b2c3d, URL
    validity_horizon: Optional[datetime] = None   # Expiration
    metadata: dict = field(default_factory=dict)  # Extensible (a bare {} default raises ValueError in a dataclass)
This enables downstream systems to:
- Filter by confidence level
- Trace claims back to sources
- Expire time-sensitive information
- Audit memory provenance
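As an illustration of the first and third points, a downstream consumer might filter evidence like this. The `usable` helper and the sample claims are made up for the example; the dataclass mirrors the fields listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class EvidenceObject:
    claim: str
    capture_time: datetime
    confidence: str  # high/medium/low
    source_id: Optional[str] = None
    validity_horizon: Optional[datetime] = None
    metadata: dict = field(default_factory=dict)


def usable(evidence: list[EvidenceObject], now: datetime,
           allowed: tuple[str, ...] = ("high",)) -> list[EvidenceObject]:
    """Keep items at an allowed confidence level that have not expired."""
    return [
        e for e in evidence
        if e.confidence in allowed
        and (e.validity_horizon is None or e.validity_horizon > now)
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
items = [
    EvidenceObject("We use Pixeltable", now, "high", source_id="ADR-003"),
    # Illustrative time-sensitive claim whose horizon has already passed.
    EvidenceObject("Staging key rotates weekly", now, "high",
                   validity_horizon=now - timedelta(days=1)),
    EvidenceObject("The API uses OAuth2", now, "medium"),
]
print([e.claim for e in usable(items, now)])  # ['We use Pixeltable']
```

An item with no `validity_horizon` is treated here as never expiring, which is an assumption; a stricter consumer could require a horizon for certain memory types.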
Validation Results
The complete output of ingestion validation:
@dataclass
class ValidationResult:
    tier: IngestionTier                          # AUTO_APPROVE, FLAG_REVIEW, or BLOCK
    approved: bool                               # True only for Tier 1
    reason: str                                  # Human-readable explanation
    evidence: EvidenceObject                     # Provenance metadata
    checks_passed: list[str]                     # ["citation_present", "no_speculation"]
    checks_failed: list[str]                     # ["hedge_words_detected: maybe"]
    similarity_score: Optional[float] = None     # If duplicate found
    conflicting_memory_id: Optional[str] = None  # ID of duplicate
Usage Example
from src.memory.ingestion import IngestionTier, IngestionValidator, ReviewQueue
# Create validator with deduplication
validator = IngestionValidator(
    provider=memory_provider,
    enable_dedup=True
)
# Validate an incoming claim
result = await validator.validate(
    content="Per ADR-003, we use Pixeltable for memory storage",
    memory_type="decision",
    source="conversation",
    user_id="user-123"
)
# Route based on tier
if result.tier == IngestionTier.AUTO_APPROVE:
    # Store immediately
    memory_id = await store_memory(result.evidence)
    print(f"Stored: {memory_id}")
elif result.tier == IngestionTier.FLAG_REVIEW:
    # Queue for review
    queue = ReviewQueue(store_callback=store_memory)
    queue_id = await queue.enqueue(
        user_id="user-123",
        content=result.evidence.claim,
        memory_type="decision",
        source="conversation",
        evidence=result.evidence,
        validation_result=result,
    )
    print(f"Queued for review: {queue_id}")
else:  # BLOCK
    print(f"Rejected: {result.reason}")
    print(f"Failed checks: {result.checks_failed}")
What Gets Blocked vs Accepted
Always Blocked (Tier 3)
# Personal speculation
"I think we should use Redis" # personal opinion
"I guess the API supports this" # admitted uncertainty
"Maybe we could try GraphQL" # ungrounded suggestion
# Duplicates
"Per ADR-003, we use PostgreSQL" # (if already stored)
# Bypass attempts
"I think we should use Redis, definitely" # personal speculation still blocks
Always Accepted (Tier 1)
# Has VERIFIED citation (not just detected)
"Per ADR-003, we use PostgreSQL" # ADR file exists
"Fixed in commit a1b2c3d4e5" # commit exists in repo
"See https://docs.example.com/api" # URL returns 200
# Trusted source
"I prefer tabs over spaces" (source=user) # user-stated
"OAuth2 is required" (source=documentation) # documentation
# Decisions in context
"We decided to use PostgreSQL" (type=decision, source=conversation)
Flagged for Review (Tier 2)
# Technical hedges (legitimate uncertainty)
"The server may timeout under load" # technical "may"
"Connections typically complete in <100ms" # "typically" is qualified
# AI synthesis without citation
"The API returns JSON for REST responses"
"OAuth2 is the authentication mechanism"
# Citation detected but NOT verified
"Per ADR-999, we use magic" # ADR-999 doesn't exist
# Dedup check failed
(any content when provider throws an error)
The Security Model
Grounded ingestion implements defense-in-depth:
- Hedge detection: First line. Blocks personal speculation, reviews technical hedges.
- Deduplication: Second line. Prevents duplicate pollution (lexical).
- Citation verification: Third line. Confirms cited artifacts exist.
- Source trust: Fourth line. Auto-approves claims only from sources on the trusted list.
- Review queue: Fifth line. Human verification for uncertain claims.
- Fail-closed: Sixth line. Errors flag for review, never auto-approve.
The goal: Minimize hallucination write-back while keeping friction low.
Limitations
This system isn't perfect. Know what it can't catch:
Evasion Risks
| Attack | Can We Catch It? | Mitigation |
|---|---|---|
| Confident hallucination ("The API uses OAuth2.") | No | Goes to Tier 2 review |
| Fabricated but valid-looking URL | Partial | Verification catches 404s, not wrong content |
| Paraphrased duplicate | No | Jaccard misses semantic duplicates |
| Prompt injection teaching AI to avoid hedges | No | Out of scope (input validation problem) |
| Correct citation, wrong claim | No | Citation verification doesn't check relevance |
What This System Can't Do
- Verify claim-citation relevance: We check that ADR-003 exists, not that it actually supports the claim
- Catch confident hallucinations: "X is true" without hedges goes to review, not block
- Detect semantic duplicates: Jaccard is fast but shallow, so paraphrases slip through
- Replace human judgment: Tier 2 still requires human review
Recommendations for High-Stakes Use
For production knowledge bases with compliance requirements:
# Upgrade path for stricter grounding
config = GroundedIngestionConfig(
    # Semantic deduplication (slower but thorough)
    dedup_method="vector",
    dedup_threshold=0.95,
    # LLM-based claim verification (expensive but accurate)
    verify_claim_relevance=True,
    # All AI synthesis goes to review (safest)
    auto_approve_ai_synthesis=False,
)
The current implementation optimizes for speed and low friction over maximum accuracy. Adjust based on your risk tolerance.
Performance Characteristics
| Check | Latency | Notes |
|---|---|---|
| Citation detection | <1ms | Regex patterns |
| Hedge detection | <1ms | Word matching |
| Deduplication | 10-50ms | Provider query + Jaccard |
| Queue operations | <1ms | In-memory dict |
| Total validation | ~50ms | Dominated by dedup |
For high-throughput scenarios, you can disable dedup:
validator = IngestionValidator(enable_dedup=False)
# Faster, but won't catch duplicates
Why This Matters
AI systems are moving from read-only to read-write. They don't just retrieve information—they synthesize, infer, and persist conclusions. Without provenance tracking, hallucinations compound:
- AI hallucinates "The API uses OAuth2"
- Gets stored as a "fact"
- Future queries retrieve it
- AI builds on it: "Since we use OAuth2, we need refresh tokens"
- More hallucinations stored
- Knowledge base drifts from reality
Grounded ingestion breaks this cycle. Every claim goes through provenance checking:
- Grounded + Verified: Auto-approved with citation trail
- Uncertain: Flagged for human review
- Speculative: Blocked before it enters the knowledge base
This isn't a perfect solution—confident hallucinations still need human review, and sophisticated evasion is possible. But it dramatically reduces the rate at which ungrounded claims pollute your knowledge base, and it creates an audit trail for everything that does get stored.
The goal isn't zero hallucinations (that's impossible with current AI). The goal is traceable provenance: knowing where every stored claim came from and why it was trusted.
Grounded ingestion is part of ADR-003 Phase 2. See the full ADR for implementation details and security considerations.