ADR-042: Playwright E2E Testing

Status

Implemented

Date

2025-01-16 (Retrospective)

Decision Makers

QA Team - E2E testing strategy
Frontend Team - Browser automation

Layer

Testing

Related ADRs

ADR-041: Vitest for Frontend Testing

Supersedes

Cypress (if previously used)

Depends On

None

Context

End-to-end testing validates complete user flows:

Full Stack Testing: Frontend + backend together
Cross-Browser: Chrome, Firefox, Safari, Edge
Visual Testing: Screenshot comparisons
Accessibility: a11y validation
CI Integration: Automated in pipeline

Requirements:

Multi-browser support
Parallel test execution
Video/screenshot on failure
Accessibility testing
Mobile viewport testing

Decision

We adopt Playwright for end-to-end testing:

Key Design Decisions

Playwright Test: Microsoft's E2E framework
5 Browser Profiles: Chromium, Firefox, and WebKit on desktop, plus mobile Chrome (Pixel 5) and mobile Safari (iPhone 12)
axe-core Integration: Accessibility testing
Visual Regression: Screenshot comparison
Parallel Execution: Fast CI runs

Configuration

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  // Retry twice in CI to absorb transient network failures.
  retries: process.env.CI ? 2 : 0,
  // Scale to the runner: half of the logical CPU cores in CI, Playwright's default locally.
  workers: process.env.CI ? '50%' : undefined,
  reporter: [
    ['html'],
    ['junit', { outputFile: 'test-results/junit.xml' }],
  ],
  use: {
    baseURL: 'http://localhost:3333',
    // Keep debugging artifacts only when they help explain a failure.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    { name: 'mobile-chrome', use: { ...devices['Pixel 5'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 12'] } },
  ],
  webServer: {
    command: 'npm run preview',
    port: 3333,
    reuseExistingServer: !process.env.CI,
  },
});
```
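The council review below asks for tiered execution: Chromium only on PR checks and the full five-project matrix nightly or on merge. A minimal sketch of how the config could pick projects follows, assuming a hypothetical `E2E_TIER` environment variable; `projectsForTier` and `tierFromEnv` are illustrative names, and using `mobile-chrome` as the optional PR mobile profile is an assumption.

```typescript
// Hypothetical tiers: 'pr' for pull-request checks, 'full' for nightly/merge runs.
type Tier = 'pr' | 'full';

// Mirrors the project names defined in playwright.config.ts.
const ALL_PROJECTS = ['chromium', 'firefox', 'webkit', 'mobile-chrome', 'mobile-safari'] as const;
type ProjectName = (typeof ALL_PROJECTS)[number];

// PR checks run Chromium only, optionally plus one mobile profile;
// the full tier runs every configured project.
function projectsForTier(tier: Tier, includeMobile = false): ProjectName[] {
  if (tier === 'full') return [...ALL_PROJECTS];
  return includeMobile ? ['chromium', 'mobile-chrome'] : ['chromium'];
}

// Reads the (assumed) E2E_TIER variable; anything other than 'full' means PR tier.
function tierFromEnv(env: Record<string, string | undefined>): Tier {
  return env.E2E_TIER === 'full' ? 'full' : 'pr';
}

// Sketch of use inside defineConfig (allProjects = the existing projects array):
//   const enabled = projectsForTier(tierFromEnv(process.env));
//   projects: allProjects.filter((p) => enabled.includes(p.name as ProjectName)),
```

The same PR-only selection is also available without code through the CLI's `--project=chromium` flag, and `--shard=1/4` (and so on) splits a full-matrix run across CI nodes.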

Test Pattern

```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test.describe('Requirements Page', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/requirements');
  });

  test('lists requirements', async ({ page }) => {
    await expect(page.getByRole('heading', { name: 'Requirements' })).toBeVisible();
    await expect(page.getByRole('grid')).toBeVisible();
  });

  test('creates new requirement', async ({ page }) => {
    await page.getByRole('button', { name: 'New' }).click();
    await page.getByLabel('Title').fill('E2E Test Requirement');
    await page.getByRole('button', { name: 'Save' }).click();

    await expect(page.getByText('Requirement created')).toBeVisible();
  });

  test('passes accessibility checks', async ({ page }) => {
    const accessibilityScanResults = await new AxeBuilder({ page }).analyze();
    expect(accessibilityScanResults.violations).toEqual([]);
  });
});
```
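The axe scan above checks the page as rendered on load; it does not exercise keyboard focus. A focus-order check can complement it. The sketch below presses Tab a fixed number of times and records what receives focus. `recordTabOrder` and `TabbablePage` are hypothetical names, and `TabbablePage` is a structural stand-in for Playwright's `Page` so the helper can be exercised without a browser.

```typescript
// Structural subset of Playwright's Page: only the keyboard is needed here.
interface TabbablePage {
  keyboard: { press(key: string): Promise<void> };
}

// Presses Tab `steps` times and records a description of each newly focused element.
// `describeFocused` is injected so the helper does not depend on a particular DOM query.
async function recordTabOrder(
  page: TabbablePage,
  steps: number,
  describeFocused: () => Promise<string>,
): Promise<string[]> {
  const order: string[] = [];
  for (let i = 0; i < steps; i++) {
    await page.keyboard.press('Tab');
    order.push(await describeFocused());
  }
  return order;
}

// Sketch of use inside a spec (the expected labels are hypothetical):
//   const order = await recordTabOrder(page, 2, async () =>
//     (await page.locator(':focus').getAttribute('aria-label')) ?? '');
//   expect(order).toEqual(['New', 'Save']);
```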

Visual Regression

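```typescript
import { test, expect } from '@playwright/test';

test('visual regression', async ({ page }) => {
  await page.goto('/dashboard');
  // Compare against the stored baseline, tolerating up to 100 differing pixels.
  await expect(page).toHaveScreenshot('dashboard.png', {
    maxDiffPixels: 100,
  });
});
```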
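Dashboards that render timestamps, charts, or live metrics will drift between runs. `toHaveScreenshot()` accepts a `mask` option, an array of locators that are overlaid with a solid box before comparison. The sketch below builds those options from a selector list. The `data-testid` selectors and the `maskedScreenshotOptions` helper are hypothetical, and `LocatorFactory` is a structural stand-in for Playwright's `Page`.

```typescript
// Hypothetical selectors for volatile dashboard regions; replace with the real test ids.
const DYNAMIC_REGIONS = [
  '[data-testid="last-updated"]',
  '[data-testid="metrics-chart"]',
  '[data-testid="live-feed"]',
];

// Structural subset of Playwright's Page: anything that can build locators.
interface LocatorFactory<L> {
  locator(selector: string): L;
}

// Builds toHaveScreenshot() options that mask volatile regions
// while keeping the existing 100-pixel tolerance.
function maskedScreenshotOptions<L>(
  page: LocatorFactory<L>,
  selectors: readonly string[] = DYNAMIC_REGIONS,
) {
  return {
    mask: selectors.map((selector) => page.locator(selector)),
    maxDiffPixels: 100,
  };
}
```

A test would then call `await expect(page).toHaveScreenshot('dashboard.png', maskedScreenshotOptions(page));`. Keeping the selector list in one place means new live widgets are masked everywhere once they are added to it.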

Consequences

Positive

Multi-Browser: Single API for all browsers
Fast: Parallel execution reduces CI time
Reliable: Auto-waiting reduces flakiness
Debugging: Trace viewer, screenshots, video
Accessibility: axe-core integration via the @axe-core/playwright package

Negative

Learning Curve: Different from Cypress
Browser Binaries: Large downloads
CI Resources: Browsers need memory
Flakiness: Network-dependent tests can fail

Neutral

Selectors: Multiple selector strategies
API Testing: Can test APIs too
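Playwright Test provides a `request` fixture (an `APIRequestContext`) for calling HTTP endpoints without a browser. The sketch below shows a minimal API check, assuming a hypothetical `/api/requirements` endpoint that returns a JSON array of objects with a `title` field. `fetchRequirementTitles` is an illustrative name, and the structural types stand in for Playwright's so the helper can be unit-tested with a fake.

```typescript
// Structural subsets of Playwright's APIResponse and APIRequestContext.
interface ApiResponseLike {
  status(): number;
  json(): Promise<unknown>;
}
interface ApiRequestLike {
  get(url: string): Promise<ApiResponseLike>;
}

// Hypothetical endpoint; the request fixture resolves it against the configured baseURL.
const REQUIREMENTS_ENDPOINT = '/api/requirements';

// Fetches requirements over HTTP and returns their titles,
// failing fast on a non-200 status or a non-array body.
async function fetchRequirementTitles(request: ApiRequestLike): Promise<string[]> {
  const response = await request.get(REQUIREMENTS_ENDPOINT);
  if (response.status() !== 200) {
    throw new Error(`Expected HTTP 200, got ${response.status()}`);
  }
  const body = await response.json();
  if (!Array.isArray(body)) {
    throw new Error('Expected a JSON array of requirements');
  }
  return body.map((item: { title?: unknown }) => String(item?.title ?? ''));
}
```

Inside a spec this would run as `test('lists requirements via API', async ({ request }) => { expect(await fetchRequirementTitles(request)).toContain('E2E Test Requirement'); });`, assuming the backend is reachable at the configured `baseURL`.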

Implementation Status

Implementation Details

Config: frontend/playwright.config.ts
Tests: frontend/e2e/
CI: .github/workflows/e2e.yml
Scripts: npm run test:e2e

LLM Council Review

Review Date: 2025-01-16
Confidence Level: High (100%)
Verdict: APPROVED WITH MODIFICATIONS

Quality Metrics

Consensus Strength Score (CSS): 0.88
Deliberation Depth Index (DDI): 0.85

Council Feedback Summary

Strong modern foundation for E2E strategy. Playwright is the correct tool choice. However, the 5-browser configuration on every PR is operationally immature and risks CI costs and flakiness.

Key Concerns Identified:

Browser Matrix Overkill: 5 browsers on every PR check significantly increases build times
Visual Regression Risk: SRE dashboards have dynamic data (timestamps, metrics) → constant failures
Static Worker Count: workers: 4 causes contention on runners with fewer vCPUs
Accessibility Depth: axe-core only catches ~30% of issues; misses keyboard navigation

Required Modifications:

Tiered Execution:
- PR Checks: Chromium only (plus one mobile profile if responsive layout is critical)
- Nightly/Merge: Full 5-browser matrix
Visual Regression Masking: Define explicit masking strategies for timestamps, charts, live data
API Mocking: Use page.route() to mock API responses for stability
Dynamic Workers: workers: process.env.CI ? '50%' : undefined
Trace on Retry: Change to trace: 'on-first-retry' to reduce storage
Sharding: Use Playwright sharding across CI nodes for scalability
Keyboard Navigation Tests: Add specific tests for focus management, not just page-load scans
Retry Strategy: Add retries: 2 for CI to handle network blips
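The API-mocking modification relies on `page.route()`, which intercepts matching requests and lets a handler answer them with `route.fulfill()`. The sketch below is a minimal reusable JSON handler. The fixture shape, the `fulfillJson` name, and the `**/api/requirements` URL pattern are assumptions, and `RouteLike` is a structural stand-in for Playwright's `Route`.

```typescript
// Structural subset of the options Playwright's Route.fulfill() accepts.
interface FulfillOptions {
  status?: number;
  contentType?: string;
  body?: string;
}
interface RouteLike {
  fulfill(options: FulfillOptions): Promise<void>;
}

// Hypothetical fixture shape; the real API schema may differ.
interface Requirement {
  id: string;
  title: string;
}

const REQUIREMENTS_FIXTURE: Requirement[] = [
  { id: 'REQ-1', title: 'E2E Test Requirement' },
];

// Returns a page.route() handler that answers with canned JSON
// instead of letting the request reach the backend.
function fulfillJson<T>(payload: T, status = 200) {
  return async (route: RouteLike): Promise<void> => {
    await route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(payload),
    });
  };
}
```

In a spec, `await page.route('**/api/requirements', fulfillJson(REQUIREMENTS_FIXTURE));` should be registered before `page.goto()` so that the first page load is already served from the fixture.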

Modifications Applied

Documented tiered execution strategy
Added visual regression masking requirement
Documented API mocking for stability
Added dynamic worker configuration
Added keyboard navigation testing requirement

Council Ranking

gpt-5.2: Best Response (tiered execution)
gemini-3-pro: Strong (visual regression)
grok-4.1: Good (performance)

References

ADR-042 | Testing Layer | Implemented
