ADR-005: PostgreSQL with Future Vector Database Migration
Status: APPROVED with Timeline Acceleration
Date: 2025-08-25
Author: Architecture Review Team
Context
Current plan: start with PostgreSQL plus the pgvector extension for semantic similarity, then migrate to a dedicated vector database (Pinecone/Weaviate) later.
Semantic similarity search is core to the value proposition: users should find "truly similar projects, not just same-language projects."
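The ranking signal behind "truly similar projects" is vector similarity between embeddings. As a point of reference, a minimal cosine-similarity sketch (the function name is illustrative, not from the codebase):

```typescript
// Cosine similarity between two embedding vectors: 1.0 means identical
// direction. A similarity search ranks candidate repos by this score
// against a query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch")
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Both pgvector and Pinecone compute this metric natively; the question the ADR addresses is which engine indexes and serves it at scale.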
Decision
ACCELERATE vector database adoption - implement in Month 1, not Month 3
Reasoning for acceleration:
- Semantic similarity is core differentiator, not nice-to-have feature
- PostgreSQL + pgvector limitations will surface quickly at scale
- Migration complexity increases with data volume
- Vector databases provide better semantic search performance
Consequences
Benefits of early adoption:
- Purpose-built vector search performance from day one
- Avoid painful migration during growth period
- Better similarity search quality (crucial for user retention)
- Cleaner architecture without hybrid approaches
Costs of early adoption:
- Additional infrastructure complexity in MVP
- Higher initial costs (~$200/month vs. no added cost on the existing PostgreSQL instance)
- Learning curve for vector database operations
- Potential over-engineering for initial user base
Alternatives Considered
- PostgreSQL + pgvector (Original Plan)
  - Pros: Simple, included with existing database
  - Cons: Performance limitations, migration complexity later
- Pinecone from Day One (RECOMMENDED)
  - Pros: Purpose-built, excellent performance, managed service
  - Cons: Additional cost, vendor dependency
- Weaviate Self-Hosted
  - Pros: Open source, full control, cost optimization
  - Cons: Infrastructure overhead, maintenance burden
- Hybrid Approach
  - Pros: Gradual migration, cost optimization
  - Cons: Complexity of maintaining two systems
Risk Assessment
Risks with PostgreSQL approach:
- Performance Degradation: Similarity search slows as data grows
- Migration Complexity: Moving 100K+ embeddings is non-trivial
- Feature Limitations: pgvector has gained HNSW indexing, but still trails dedicated vector databases on operational features (namespaces, managed scaling)
Risks with Vector Database:
- Cost Scaling: Could reach $1K+/month at scale
- Vendor Lock-in: Pinecone proprietary format
- Learning Curve: Team needs to learn vector database concepts
Mitigation Strategy:
```typescript
// Abstraction layer for vector operations; Vector and SearchResult are the
// shared shapes both backends speak.
interface Vector {
  id: string
  values: number[]
  metadata?: Record<string, unknown>
}

interface SearchResult {
  id: string
  score: number
}

interface VectorStore {
  upsert(vectors: Vector[]): Promise<void>
  query(vector: Vector, filters?: Record<string, unknown>): Promise<SearchResult[]>
  delete(ids: string[]): Promise<void>
}

class PineconeStore implements VectorStore {
  // Pinecone implementation
}

class PostgresVectorStore implements VectorStore {
  // PostgreSQL fallback implementation
}

// Easy switching between implementations
const vectorStore: VectorStore = process.env.NODE_ENV === 'production'
  ? new PineconeStore()
  : new PostgresVectorStore()
```
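To make the contract concrete, here is a hypothetical in-memory implementation, a sketch suitable for tests and local development, not production. It assumes brute-force cosine scoring and simplifies the `filters` parameter to a `topK` limit:

```typescript
// Minimal in-memory vector store for tests/local dev (illustrative sketch).
interface Vector {
  id: string
  values: number[]
}

interface SearchResult {
  id: string
  score: number
}

class InMemoryVectorStore {
  private vectors = new Map<string, number[]>()

  async upsert(vectors: Vector[]): Promise<void> {
    for (const v of vectors) this.vectors.set(v.id, v.values)
  }

  // Brute-force scan: score every stored vector by cosine similarity,
  // return the topK best matches. Fine for tests; a real backend uses
  // an ANN index (HNSW, IVF) instead.
  async query(vector: Vector, topK = 10): Promise<SearchResult[]> {
    const cosine = (a: number[], b: number[]): number => {
      let dot = 0
      let na = 0
      let nb = 0
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb))
    }
    return Array.from(this.vectors.entries())
      .map(([id, values]) => ({ id, score: cosine(vector.values, values) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
  }

  async delete(ids: string[]): Promise<void> {
    for (const id of ids) this.vectors.delete(id)
  }
}
```

The same test suite can then exercise all three implementations through the shared interface, which is the main payoff of the abstraction layer.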
Migration Strategy
Month 1 Implementation:
- Week 1: Set up Pinecone index, implement abstraction layer
- Week 2: Migrate embedding generation to use Pinecone
- Week 3: Implement semantic similarity search API
- Week 4: Performance testing and optimization
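The Week 2 backfill is largely a batching exercise: existing embeddings are read from PostgreSQL and upserted in chunks, since managed vector stores cap upsert batch sizes. A generic chunking helper (the name and batch size are assumptions, not from the plan) might look like:

```typescript
// Split a backlog of embeddings into fixed-size upsert batches.
// Migration scripts would call the vector store once per batch.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1")
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}
```

Batching also gives the migration natural checkpoints, so a failed run can resume from the last completed batch rather than restarting.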
Cost Optimization:
```typescript
class VectorCostOptimizer {
  async optimizeStorage(repo: Repository): Promise<void> {
    // Only store vectors for repositories with sufficient engagement
    if (repo.stars < 10 && repo.weeklyViews < 100) {
      // Use rule-based similarity for low-engagement repos
      await this.removeFromVectorDB(repo.id)
      return
    }

    // Compress vectors for less critical repositories
    if (repo.stars < 100) {
      const compressedVector = this.compressVector(repo.embedding)
      await this.updateVector(repo.id, compressedVector)
    }
  }
}
```
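The compression step above is left unspecified; one plausible scheme is scalar quantization to 8-bit integers, roughly a 4x storage reduction over float32 at some cost in recall. This is an assumption, not the decided approach, and all names below are illustrative:

```typescript
// Scalar quantization sketch: map float components to int8 using a
// per-vector scale. Roughly 4x smaller than float32 storage.
interface QuantizedVector {
  scale: number
  values: Int8Array
}

function quantize(vector: number[]): QuantizedVector {
  // Largest magnitude maps to 127; epsilon guards the all-zero vector.
  const maxAbs = Math.max(...vector.map(Math.abs), 1e-12)
  const scale = maxAbs / 127
  const values = Int8Array.from(vector, (v) => Math.round(v / scale))
  return { scale, values }
}

function dequantize(q: QuantizedVector): number[] {
  return Array.from(q.values, (v) => v * q.scale)
}
```

Since cosine similarity is tolerant of small per-component errors, moderate quantization typically degrades ranking quality far less than it saves in storage.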
Conclusion
Although it adds complexity to the MVP, early adoption of a purpose-built vector database is justified by how central semantic similarity is to the value proposition. The abstraction layer preserves the option to switch backends, and the cost optimization strategy keeps expenses manageable during growth.