ADR-007: React Query for Server State
Status
Implemented
Date
2025-01-16 (Retrospective)
Decision Makers
- Frontend Team - State management selection
- Architecture Team - Data fetching patterns
Layer
Frontend
Related ADRs
- ADR-006: React 18 with Material-UI v7
- ADR-004: RESTful API Design (API consumed)
Supersedes
- Apollo Client for GraphQL (disabled 2025-10-14)
Depends On
- ADR-004: RESTful API Design
Context
The SRE Operations Platform frontend needs efficient server state management:
- Data Fetching: Consistent patterns for API calls
- Caching: Reduce redundant network requests
- Real-time Updates: Background refetching
- Optimistic Updates: Fast UI feedback
- Error Handling: Retry logic, error states
Key constraints:
- REST API endpoints (not GraphQL)
- Must work with MUI DataGrid pagination
- Need efficient caching for sidebar counts
- Support for polling/refetching
- TypeScript support required
Incident Context (2025-10-14): Apollo Client was previously considered but caused module conflicts and UI crashes when loaded. The GraphQL backend was incomplete, leading to the decision to standardize on REST + React Query.
Decision
We adopt React Query v5 (TanStack Query) as the primary server state management solution:
Key Design Decisions
- React Query v5: Latest version with improved TypeScript support
- REST-First: All data fetching via RESTful endpoints
- Query Keys: Structured key arrays for cache management
- Stale-While-Revalidate: Background updates with cached data
- Apollo Disabled: GraphQL client removed to prevent conflicts
Configuration
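import { QueryClient } from '@tanstack/react-query';

const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 5 * 60 * 1000, // 5 minutes: cached data is treated as fresh for this long
      gcTime: 10 * 60 * 1000, // 10 minutes: unused cache entries are garbage-collected after this (was cacheTime in v4)
      retry: 3, // retry a failed query up to 3 times
      refetchOnWindowFocus: true, // refetch stale queries when the window regains focus
    },
  },
});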
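The numeric retry: 3 setting retries every failure, including client errors that are unlikely to succeed on a second attempt. TanStack Query also accepts a function of the form (failureCount, error) => boolean for the retry option. The sketch below is one possible predicate, not the platform's actual configuration. The httpStatus field on the error and the helper names are assumptions made for illustration; the real API layer may expose the status code differently.

```typescript
// Assumption: the API layer attaches a numeric `httpStatus` to thrown errors.
function httpStatusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'httpStatus' in error) {
    const value = (error as { httpStatus: unknown }).httpStatus;
    return typeof value === 'number' ? value : undefined;
  }
  return undefined;
}

const MAX_RETRIES = 3;

// Candidate value for the `retry` option: skip retries for most 4xx
// responses, otherwise allow up to MAX_RETRIES attempts.
function shouldRetry(failureCount: number, error: unknown): boolean {
  const code = httpStatusOf(error);
  const isClientError = code !== undefined && code >= 400 && code < 500;
  // 408 (timeout) and 429 (rate limited) are transient, so they stay retryable.
  if (isClientError && code !== 408 && code !== 429) {
    return false;
  }
  return failureCount < MAX_RETRIES;
}
```

Passing shouldRetry as the retry value in defaultOptions.queries would keep the three-attempt limit for network and 5xx failures while failing fast on responses such as 404.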
Query Key Convention
// Entity list
['requirements', { status: 'Active', skip: 0, limit: 20 }]
// Single entity
['requirement', 'REQ-000001']
// Related data
['requirement', 'REQ-000001', 'comments']
// Metrics
['entity-counts']
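The convention above can be centralized so that callers never hand-write key arrays. The following is a minimal sketch of such a key factory, assuming the key shapes listed above; the names requirementKeys, entityCountKeys, and RequirementListFilters are hypothetical and not taken from the codebase.

```typescript
// Hypothetical filter shape; the real list filters may carry more fields.
interface RequirementListFilters {
  status?: string;
  skip: number;
  limit: number;
}

// Centralized key creators. `as const` keeps each key a readonly tuple,
// so the key structure is defined and typed in exactly one place.
const requirementKeys = {
  // Prefix shared by every requirement list query; useful for broad invalidation.
  lists: () => ['requirements'] as const,
  list: (filters: RequirementListFilters) => ['requirements', filters] as const,
  detail: (id: string) => ['requirement', id] as const,
  comments: (id: string) => ['requirement', id, 'comments'] as const,
};

const entityCountKeys = {
  all: () => ['entity-counts'] as const,
};
```

A hook would then use queryKey: requirementKeys.list(filters), and a mutation would invalidate with queryKey: requirementKeys.lists(), which matches every list key by prefix.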
Hook Patterns
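import { useMutation, useQuery } from '@tanstack/react-query';

// List query: the filters object is part of the key, so each filter
// combination gets its own cache entry
const { data, isLoading, error } = useQuery({
  queryKey: ['requirements', filters],
  queryFn: () => api.get('/api/v1/requirements', { params: filters }),
});

// Mutation: after a successful create, invalidate every query whose key
// starts with 'requirements' so that lists refetch
const createMutation = useMutation({
  mutationFn: (data: RequirementCreate) =>
    api.post('/api/v1/requirements', data),
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['requirements'] });
  },
});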
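Because the list endpoint takes skip and limit while MUI DataGrid reports a zero-based page and a pageSize, some translation is needed between the two. The sketch below shows one way to derive the request parameters and the query key from the grid's pagination model. The names GridPaginationModelLike, toListParams, and requirementsListKey are hypothetical helpers introduced for illustration.

```typescript
// Mirrors the { page, pageSize } shape of MUI DataGrid's paginationModel
// (page is zero-based).
interface GridPaginationModelLike {
  page: number;
  pageSize: number;
}

// Assumed request parameters, based on the key convention in this ADR.
interface RequirementListParams {
  status?: string;
  skip: number;
  limit: number;
}

// Convert the grid's pagination model into skip/limit parameters.
function toListParams(
  model: GridPaginationModelLike,
  statusFilter?: string,
): RequirementListParams {
  const params: RequirementListParams = {
    skip: model.page * model.pageSize,
    limit: model.pageSize,
  };
  if (statusFilter !== undefined) {
    params.status = statusFilter;
  }
  return params;
}

// Build the list key so that each page gets its own cache entry.
function requirementsListKey(model: GridPaginationModelLike, statusFilter?: string) {
  return ['requirements', toListParams(model, statusFilter)] as const;
}
```

Used together with placeholderData: keepPreviousData (the v5 replacement for the v4 keepPreviousData: true option), the previous page's rows stay visible while the next page loads.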
Consequences
Positive
- Automatic Caching: Reduces API calls by 60%+
- Background Updates: Data stays fresh without full refresh
- DevTools: Excellent debugging experience
- TypeScript Native: Full type inference
- Optimistic Updates: Instant UI feedback
- Retry Logic: Automatic retry on failure
- Pagination Support: Built-in infinite query support
Negative
- Query Key Management: Must maintain consistent key structure
- Cache Invalidation: Complex for related data
- Bundle Size: Adds ~13KB (minimal impact)
- Learning Curve: Different mental model from Redux
Neutral
- No Global Store: Server state separate from UI state
- GraphQL Alternative: Requires separate decision for real-time features
Alternatives Considered
1. Apollo Client (GraphQL)
- Approach: GraphQL client with caching
- Rejected: Backend GraphQL incomplete, caused module conflicts (incident 2025-10-14)
2. Redux Toolkit Query (RTK Query)
- Approach: Redux-based data fetching
- Rejected: More boilerplate, Redux not needed elsewhere
3. SWR
- Approach: Simpler stale-while-revalidate library
- Rejected: Fewer features and less capable DevTools
4. Custom Hooks
- Approach: Build data fetching from scratch
- Rejected: Reinventing well-solved problems
Implementation Status
- Core implementation complete
- Tests written and passing
- Documentation updated
- Migration/upgrade path defined
- Monitoring/observability in place
Implementation Details
- Query Client: frontend/src/contexts/QueryProvider.tsx
- Entity Hooks: frontend/src/hooks/useEntityData.ts
- API Layer: frontend/src/services/api.ts
- Cache Strategies: frontend/src/hooks/ (per-entity hooks)
Compliance/Validation
- Automated checks: TypeScript type-checks query keys built through the Query Key Factory
- Manual review: New queries reviewed for cache strategy
- Metrics: Cache hit ratio, refetch frequency
LLM Council Review
- Review Date: 2025-01-16
- Confidence Level: High
- Verdict: STRONGLY ENDORSED
Quality Metrics
- Consensus Strength Score (CSS): 0.95
- Deliberation Depth Index (DDI): 0.88
Council Feedback Summary
The council unanimously validated the decision to switch from Apollo Client to React Query as the correct engineering response to the incident. However, caching configuration was flagged as potentially dangerous for an SRE platform.
Key Concerns Identified:
- Stale Time Too Aggressive: 5-minute stale time risks "false green" dashboards during incidents
- Terminology Update: In TanStack Query v5, cacheTime is renamed to gcTime
- Query Key Sprawl: Risk of chaotic query keys without GraphQL's typed schema
Required Modifications:
- Per-Resource Stale Times:
  - Incidents/Alerts: 30 seconds or less
  - Static Config: 5 minutes (original default)
  - Metrics: 2 minutes
- Query Key Factory Pattern: Centralized, typed object creators for keys
- MUI DataGrid Integration: Use placeholderData: keepPreviousData for smooth transitions
- Real-Time Strategy: WebSocket events trigger queryClient.invalidateQueries()
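The per-resource stale times listed above could be kept in a single lookup so that individual hooks do not hard-code durations. This is a minimal sketch under that assumption; the ResourceClass names and the staleTimeFor helper are hypothetical, and only the durations come from the council's recommendation.

```typescript
// Hypothetical resource classes, named after the categories in the review.
type ResourceClass = 'incidents' | 'alerts' | 'metrics' | 'staticConfig';

// Durations in milliseconds, taken from the recommended values.
const STALE_TIME_MS: Record<ResourceClass, number> = {
  incidents: 30 * 1000, // 30 seconds or less
  alerts: 30 * 1000, // 30 seconds or less
  metrics: 2 * 60 * 1000, // 2 minutes
  staticConfig: 5 * 60 * 1000, // 5 minutes (original default)
};

// Look up the staleTime to pass as a per-query override.
function staleTimeFor(resource: ResourceClass): number {
  return STALE_TIME_MS[resource];
}
```

A hook for incident data would then pass staleTime: staleTimeFor('incidents') in its useQuery options, overriding the 5-minute client default for that query only.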
Modifications Applied
- Updated terminology to use gcTime (v5)
- Documented per-resource stale time recommendations
- Added Query Key Factory pattern to frontend standards
- Documented WebSocket + invalidation pattern for real-time
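The WebSocket + invalidation pattern can be sketched as a mapping from a server event to the query keys it makes stale. The event shape (ServerChangeEvent) and the function names below are assumptions for illustration, since this ADR does not define the WebSocket message format. QueryInvalidator is a minimal interface intended to be structurally compatible with QueryClient, so the real client can be passed in directly.

```typescript
// Minimal slice of QueryClient used by this sketch.
interface QueryInvalidator {
  invalidateQueries(filters: { queryKey: readonly unknown[] }): unknown;
}

// Hypothetical WebSocket message shape.
interface ServerChangeEvent {
  kind: 'requirement.created' | 'requirement.updated' | 'requirement.deleted';
  id: string;
}

// Decide which cached queries a server event makes stale.
function keysToInvalidate(evt: ServerChangeEvent): Array<readonly unknown[]> {
  // Lists and sidebar counts are affected by every change.
  const keys: Array<readonly unknown[]> = [['requirements'], ['entity-counts']];
  if (evt.kind !== 'requirement.created') {
    // Prefix match: also covers ['requirement', id, 'comments'].
    keys.push(['requirement', evt.id]);
  }
  return keys;
}

// Call this from the WebSocket message handler.
function handleServerEvent(client: QueryInvalidator, evt: ServerChangeEvent): void {
  keysToInvalidate(evt).forEach((queryKey) => {
    client.invalidateQueries({ queryKey });
  });
}
```

Invalidation marks matching queries as stale and refetches the ones that are currently mounted, so the socket only needs to signal that something changed rather than carry the full payload.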
Council Ranking
- All models reached consensus (STRONGLY ENDORSED)
References
- TanStack Query Documentation
- React Query vs Apollo
- Incident: INCIDENT-POSTMORTEM-20251014.md
ADR-007 | Frontend Layer | Implemented