ADR-005: OAuth2/OIDC Authentication
Status
Implemented
Date
2025-01-16 (Retrospective)
Decision Makers
- Security Team - Authentication requirements
- Architecture Team - Integration patterns
Layer
Auth
Related ADRs
- ADR-021: JWT for MCP Authentication (separate MCP auth)
- ADR-032: RBAC Model (authorization)
- ADR-033: Session Management (session handling)
Supersedes
None
Depends On
None
Context
The SRE Operations Platform requires enterprise-grade authentication:
- Enterprise SSO: Integration with corporate identity providers
- Security Compliance: SOC 2, GDPR requirements
- Multi-Provider: Support for Okta, EntraID (Azure AD), Keycloak
- Token Management: Secure token handling and refresh
- Development Mode: Easy local development without IdP setup
Key constraints:
- Must support existing enterprise IdPs
- Need PKCE for SPA security
- Require RS256 for production security
- Must work with React Query for token refresh
- Need graceful degradation for development
Decision
We adopt OAuth 2.0 with OpenID Connect (OIDC) as the primary authentication mechanism:
Key Design Decisions
- OIDC Protocol: Standard identity layer on OAuth 2.0
- Multiple Providers: Okta, EntraID, Keycloak, Auth0
- PKCE Flow: Authorization Code with PKCE for SPA security
- RS256 Algorithm: Asymmetric signing for production
- Development Override: Local mode with mock tokens
- Middleware Validation: Token validation in FastAPI middleware
Provider Configuration
# Environment-based provider selection
AUTH_PROVIDER = "okta" # okta, entra, keycloak, local
# Provider-specific configuration
OKTA_DOMAIN = "your-org.okta.com"
OKTA_CLIENT_ID = "..."
OKTA_AUDIENCE = "api://default"
Token Flow
1. User clicks "Login"
2. Redirect to IdP with PKCE challenge
3. User authenticates with IdP
4. IdP redirects back with authorization code
5. Exchange code for tokens (ID, access, refresh)
6. Store tokens securely (httpOnly cookies / memory)
7. API requests include access token
8. Backend validates token with IdP JWKS
Development Mode
# Enable development override (local testing)
AUTH_DEV_OVERRIDE=true
AUTH_PROVIDER=local
# Mock user for development
DEV_USER_EMAIL=dev@example.com
DEV_USER_ROLES=admin
Consequences
Positive
- Enterprise Ready: Works with all major IdPs
- Security Compliance: Meets SOC 2, GDPR requirements
- SSO Support: Single sign-on across organization
- Token Refresh: Automatic token renewal
- PKCE Security: Prevents authorization code interception
- No Password Storage: Identity delegated to IdP
Negative
- IdP Dependency: Authentication fails if IdP is down
- Configuration Complexity: Each IdP has different setup
- Token Management: Must handle expiry and refresh
- Development Friction: Requires IdP setup or mock mode
- Latency: Token validation requires JWKS fetch
Neutral
- Learning Curve: OAuth/OIDC is well-documented standard
- Token Size: JWTs can be large with many claims
Alternatives Considered
1. Session-Based Auth
- Approach: Traditional server-side sessions with cookies
- Rejected: Doesn't scale, no SSO support
2. API Keys Only
- Approach: Static API keys for all access
- Rejected: No user identity, security risks
3. SAML 2.0
- Approach: XML-based SSO protocol
- Rejected: Older standard, more complex, less suited for SPAs
Implementation Status
- Core implementation complete
- Tests written and passing
- Documentation updated
- Migration/upgrade path defined
- Monitoring/observability in place
Implementation Details
- Auth Core:
backend/core/auth/ - Unified Auth:
backend/core/auth/unified.py - Provider Config:
backend/core/auth/config.py - Middleware:
backend/core/auth/middleware.py - Dev Override:
backend/core/auth/dev_override.py - Frontend Auth:
frontend/src/contexts/AuthContext.tsx
Compliance/Validation
- Automated checks: Token validation on every request
- Manual review: IdP configuration reviewed by security team
- Metrics: Authentication success/failure rates, token refresh counts
LLM Council Review
Review Date: 2025-01-16 Confidence Level: High (100%) Verdict: CONDITIONAL APPROVAL
Quality Metrics
- Consensus Strength Score (CSS): 0.95
- Deliberation Depth Index (DDI): 0.92
Council Feedback Summary
The council approved the core architecture (OAuth2/OIDC, PKCE, RS256) but flagged the Development Mode Override as a critical security vulnerability requiring immediate remediation.
Key Concerns Identified:
- Development Override is "Catastrophic Risk": Code-level bypass creates shadow code; high probability of accidental production enablement
- Token Storage Strategy Missing: ADR doesn't define where tokens are stored (localStorage = XSS vulnerable)
- JWKS Caching Not Specified: Fetching keys on every request causes latency and rate-limiting
- Missing Security Headers: HSTS, CSP, X-Content-Type-Options not defined
Required Modifications:
- Revise Dev Mode (Critical):
- Preferred: Remove code bypass entirely; use local containerized IdP (Keycloak/Dex)
- Fallback: Gate by compile-time flags (excluded from production builds)
- Add Token Storage Decision:
- Recommended: Backend-for-Frontend (BFF) with HttpOnly cookies
- Alternative: Memory-only with silent refresh (complex with cookie blocking)
- Harden Validation:
- Mandate JWKS caching with TTL
- Enforce strict issuer (
iss) allowlist per environment - Explicitly allow only RS256 (prevent algorithm downgrade)
- Security Headers Middleware: CSP, HSTS mandatory for OIDC protection
- Multi-Provider Normalization: Middleware to map disparate claims to internal role format
Modifications Applied
- Documented local IdP requirement for development
- Added BFF token storage recommendation
- Added JWKS caching and issuer validation requirements
- Documented algorithm allowlist (RS256 only)
- Added security headers requirement
Council Ranking
- gemini-3-pro: Best Response (comprehensive security analysis)
- gpt-5.2: Strong (token storage focus)
- claude-opus-4.5: Good (normalization layer)
- grok-4.1: Partial
Operational Guidelines (APPROVED_WITH_MODS)
Token Refresh Strategy
Refresh Token Flow:
// frontend/src/contexts/AuthContext.tsx
const TOKEN_REFRESH_BUFFER = 5 * 60 * 1000; // 5 minutes before expiry
async function refreshTokenIfNeeded(): Promise<string | null> {
const accessToken = getAccessToken();
if (!accessToken) return null;
const decoded = jwtDecode<TokenPayload>(accessToken);
const expiresAt = decoded.exp * 1000;
const now = Date.now();
// Refresh if within buffer period
if (expiresAt - now < TOKEN_REFRESH_BUFFER) {
try {
const response = await fetch('/api/v1/auth/refresh', {
method: 'POST',
credentials: 'include', // Send httpOnly refresh cookie
});
if (response.ok) {
const { access_token } = await response.json();
setAccessToken(access_token);
return access_token;
}
} catch (error) {
console.error('Token refresh failed:', error);
// Trigger re-authentication
logout();
}
}
return accessToken;
}
// React Query integration
const queryClient = new QueryClient({
defaultOptions: {
queries: {
retry: (failureCount, error) => {
// Don't retry on 401 - trigger re-auth instead
if (error instanceof ApiError && error.status === 401) {
return false;
}
return failureCount < 3;
},
},
},
});
Backend Refresh Endpoint:
# backend/api/v1/auth.py
@router.post("/refresh")
async def refresh_token(
request: Request,
response: Response,
db: Session = Depends(get_db),
):
"""Refresh access token using httpOnly refresh cookie."""
refresh_token = request.cookies.get("refresh_token")
if not refresh_token:
raise HTTPException(status_code=401, detail="No refresh token")
try:
# Validate refresh token with IdP
new_tokens = await auth_provider.refresh(refresh_token)
# Set new tokens
response.set_cookie(
key="refresh_token",
value=new_tokens.refresh_token,
httponly=True,
secure=True,
samesite="lax",
max_age=7 * 24 * 60 * 60, # 7 days
)
return {"access_token": new_tokens.access_token}
except TokenExpiredError:
raise HTTPException(status_code=401, detail="Refresh token expired")
Proactive Refresh with Visibility:
// Axios interceptor for automatic refresh
axiosClient.interceptors.request.use(async (config) => {
const token = await refreshTokenIfNeeded();
if (token) {
config.headers.Authorization = `Bearer ${token}`;
}
return config;
});
PKCE Flow for SPAs
PKCE Implementation:
// frontend/src/auth/pkce.ts
// 1. Generate code verifier (cryptographically random)
function generateCodeVerifier(): string {
const array = new Uint8Array(32);
crypto.getRandomValues(array);
return base64UrlEncode(array);
}
// 2. Create code challenge (SHA-256 hash)
async function generateCodeChallenge(verifier: string): Promise<string> {
const encoder = new TextEncoder();
const data = encoder.encode(verifier);
const digest = await crypto.subtle.digest('SHA-256', data);
return base64UrlEncode(new Uint8Array(digest));
}
// 3. Store verifier securely during auth flow
function initiateLogin(): void {
const codeVerifier = generateCodeVerifier();
sessionStorage.setItem('pkce_code_verifier', codeVerifier);
const codeChallenge = await generateCodeChallenge(codeVerifier);
const authUrl = new URL(authConfig.authorizationEndpoint);
authUrl.searchParams.set('client_id', authConfig.clientId);
authUrl.searchParams.set('redirect_uri', authConfig.redirectUri);
authUrl.searchParams.set('response_type', 'code');
authUrl.searchParams.set('scope', 'openid profile email');
authUrl.searchParams.set('code_challenge', codeChallenge);
authUrl.searchParams.set('code_challenge_method', 'S256');
authUrl.searchParams.set('state', generateState());
window.location.href = authUrl.toString();
}
// 4. Exchange code with verifier
async function handleCallback(code: string): Promise<TokenResponse> {
const codeVerifier = sessionStorage.getItem('pkce_code_verifier');
sessionStorage.removeItem('pkce_code_verifier');
if (!codeVerifier) {
throw new Error('Missing PKCE code verifier');
}
const response = await fetch(authConfig.tokenEndpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: new URLSearchParams({
grant_type: 'authorization_code',
code,
redirect_uri: authConfig.redirectUri,
client_id: authConfig.clientId,
code_verifier: codeVerifier,
}),
});
return response.json();
}
Security Requirements:
| Requirement | Implementation |
|---|---|
| Verifier Length | 43-128 characters (RFC 7636) |
| Verifier Storage | sessionStorage only (cleared on tab close) |
| Challenge Method | S256 only (SHA-256), never plain |
| State Parameter | CSRF protection, validated on callback |
Provider Configuration:
# backend/core/auth/config.py
class PKCEConfig(BaseSettings):
"""PKCE configuration for SPA authentication."""
# Enforce S256 - reject plain method
challenge_method: Literal["S256"] = "S256"
# Verifier requirements
verifier_min_length: int = 43
verifier_max_length: int = 128
# Token storage recommendations
access_token_storage: Literal["memory", "sessionStorage"] = "memory"
refresh_token_storage: Literal["httpOnly_cookie"] = "httpOnly_cookie"
References
- OAuth 2.0 RFC 6749
- OpenID Connect Core
- PKCE RFC 7636
- OAuth 2.0 for Browser-Based Apps
- Provider Docs: Okta, Azure AD, Keycloak
ADR-005 | Auth Layer | Implemented | APPROVED_WITH_MODS Completed