Solving the 84% Empty Heatmap: Historical Data Aggregation in Stentorosaur
Our 90-day status heatmap looked great in mockups. In production, it was 84% empty. Here's how we fixed it with daily summary aggregation.
The Problem: 84% Empty
Stentorosaur's status page includes a 90-day heatmap showing system health over time. Each cell represents a day, colored by uptime percentage. The component rendered beautifully—except it only showed 14 days of actual data.
Day 1     Day 14                                   Day 90
[██████████████] [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]
        ↑                            ↑
   Actual data                Empty (no data)
       16%                          84%
Root cause: The heatmap read from current.json, which contains a rolling 14-day window. Individual health check readings aren't aggregated—they're just raw timestamps and latencies.
Loading 90 days of raw JSONL archives on every page load wasn't an option:
- 90 days × 144 checks/day × 5 systems = 64,800 entries
- ~2.5MB uncompressed JSON
- 3-5 second load times on mobile
We needed aggregated daily summaries.
Solution Landscape
We evaluated four approaches:
| Option | Description | Trade-offs |
|---|---|---|
| A. Expand current.json | Keep 90 days in rolling window | File grows to 2MB+, slow loads |
| B. Build-time aggregation | Aggregate in Docusaurus plugin | Works, but requires rebuild for updates |
| C. Client-side aggregation | Fetch archives, aggregate in browser | 2MB+ download, 3-5s compute |
| D. Daily summary file | Pre-aggregate to daily-summary.json | ~15KB file, fast loads |
We chose Option D: generate a daily-summary.json file during each monitoring run.
The Daily Summary Schema
{
  "version": 1,
  "lastUpdated": "2026-01-01T12:00:00Z",
  "windowDays": 90,
  "services": {
    "api": [
      {
        "date": "2025-12-31",
        "uptimePct": 0.993,
        "avgLatencyMs": 145,
        "p95LatencyMs": 320,
        "checksTotal": 144,
        "checksPassed": 143,
        "incidentCount": 1
      }
    ],
    "website": [...]
  }
}
Each entry is ~100 bytes. 90 days × 5 services = 450 entries, or roughly 45KB uncompressed. Compression brings it under 15KB.
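The size estimate above can be checked by measuring a serialized entry instead of assuming a byte count. The sketch below reuses the sample entry from the schema; `WINDOW_DAYS`, `SERVICE_COUNT`, and `estimateKb` are illustrative names, and the five-service count is taken from the problem statement. Depending on formatting, the measured figure may come out somewhat above the ~100-byte rule of thumb.

```typescript
// Sample entry copied from the schema above.
const sampleEntry = {
  date: "2025-12-31",
  uptimePct: 0.993,
  avgLatencyMs: 145,
  p95LatencyMs: 320,
  checksTotal: 144,
  checksPassed: 143,
  incidentCount: 1,
};

const WINDOW_DAYS = 90;
const SERVICE_COUNT = 5; // assumption: five monitored systems, as in the problem statement

const entryCount = WINDOW_DAYS * SERVICE_COUNT;

// The entry is pure ASCII, so string length equals byte length here.
const minifiedEntryBytes = JSON.stringify(sampleEntry).length;
const prettyEntryBytes = JSON.stringify(sampleEntry, null, 2).length;

// Rough file size, ignoring the small top-level wrapper object.
const estimateKb = (bytesPerEntry: number): number =>
  Math.round((entryCount * bytesPerEntry) / 1024);

console.log(`entries: ${entryCount}`);
console.log(`minified: ~${estimateKb(minifiedEntryBytes)} KB`);
console.log(`pretty-printed: ~${estimateKb(prettyEntryBytes)} KB`);
```

Pretty-printing with two-space indentation adds whitespace to every field, so the uncompressed file is larger than the minified estimate. Transfer compression removes most of that difference.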
Key design decisions:
- P95 latency reveals consistent slowness: Averages mask tail latency. If 10% of requests are slow (14/144 checks), P95 surfaces that degradation while averages smooth it over. For catching rare single-check failures, we also track Max latency in the raw data.
- Incident count per day: Counting up→down transitions surfaces flapping issues that uptime percentage masks. A service with 99% uptime but 10 incidents is worse than one with 99% uptime and 1 long outage.
- Schema versioning: The version field enables future migrations without breaking old clients.
- UTC for all dates: All date keys use UTC to avoid timezone boundary confusion. A user in Tokyo and a server in Oregon both reference the same "2025-12-31" day.
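To make the P95 point concrete, here is a small worked example using the 14-out-of-144 case from the first bullet. The latency values (150 ms fast, 2000 ms slow) are invented for illustration, and `percentile`, `mean`, and `repeat` are hypothetical helpers using the nearest-rank method.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number | null {
  if (sortedAsc.length === 0) return null;
  const rank = Math.max(1, Math.ceil(sortedAsc.length * p)); // 1-based rank
  return sortedAsc[rank - 1];
}

function mean(values: number[]): number | null {
  if (values.length === 0) return null;
  const total = values.reduce((sum, v) => sum + v, 0);
  return Math.round(total / values.length);
}

function repeat(value: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) out.push(value);
  return out;
}

// Hypothetical day: 130 fast checks and 14 slow checks (already in ascending order).
const dayLatencies = repeat(150, 130).concat(repeat(2000, 14));

const avgMs = mean(dayLatencies); // 330
const p95Ms = percentile(dayLatencies, 0.95); // 2000

console.log({ avgMs, p95Ms });
```

With 144 checks the nearest-rank index is ceil(144 × 0.95) = 137, which falls inside the 14 slow checks. The average more than doubles but still looks tolerable, while P95 lands on the slow value.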
The Stale Today Problem
Here's a subtle bug we almost shipped:
Daily Summary (generated at midnight UTC):
2025-12-31: uptimePct=0.993
Current time: 2025-12-31 at 6pm UTC
Today's reality: 3 more outages since summary was generated
If we only read daily-summary.json, today's data is stale. The solution: hybrid read pattern.
Hybrid Read Pattern
The useDailySummary hook fetches both files in parallel:
export function useDailySummary(options: UseDailySummaryOptions): UseDailySummaryResult {
  const { baseUrl, serviceName, days = 90 } = options;
  // State declarations (summaryData, currentData, loading, error, lastUpdated) omitted for brevity

  useEffect(() => {
    let cancelled = false;
    // useEffect callbacks cannot be async, so the awaits live in an inner function
    async function load() {
      // Fetch both files in parallel; a failed fetch resolves to null
      const [summaryResponse, currentResponse] = await Promise.all([
        fetch(`${baseUrl}/daily-summary.json`).catch(() => null),
        fetch(`${baseUrl}/current.json`).catch(() => null),
      ]);
      if (cancelled) return; // component unmounted or baseUrl changed
      // Handle responses...
    }
    void load();
    return () => {
      cancelled = true;
    };
  }, [baseUrl]);

  // Merge: today from current.json, history from summary
  const mergedData = useMemo(() => {
    const today = new Date().toISOString().split('T')[0]; // UTC date key
    const entries: DailySummaryEntry[] = [];
    // Assumed derivation: this service's entries from the summary file
    const historicalEntries = summaryData?.services[serviceName] ?? [];
    // Aggregate today's readings from current.json
    const todayReadings = groupReadingsByDate(currentData, serviceName).get(today);
    if (todayReadings && todayReadings.length > 0) {
      entries.push(aggregateDayReadings(today, todayReadings));
    }
    // Add historical entries (excluding today if present)
    for (const entry of historicalEntries) {
      if (entry.date !== today) {
        entries.push(entry);
      }
    }
    return entries.slice(0, days);
  }, [summaryData, currentData, serviceName, days]);

  return { data: mergedData, loading, error, lastUpdated };
}
Key behaviors:
- Today always comes from current.json: Real-time readings, not stale aggregates
- History comes from daily-summary.json: Pre-aggregated, fast to load
- Graceful fallback: If summary fails, show 14 days from current.json only
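The three behaviors above can be expressed as one pure function, which is easier to unit test than the hook. This is a minimal sketch, not the plugin's implementation: `mergeDailyEntries` and the trimmed `DayEntry` type are hypothetical, and `liveEntries` is assumed to hold per-day aggregates already computed from current.json. One deliberate difference from the hook excerpt: if current.json has no readings for today, the summary's entry for today is kept instead of dropped.

```typescript
// Trimmed entry type for illustration; the real entries carry more fields.
interface DayEntry {
  date: string; // "YYYY-MM-DD" in UTC
  uptimePct: number;
}

function mergeDailyEntries(
  today: string,
  summaryEntries: DayEntry[] | null, // null when daily-summary.json failed to load
  liveEntries: DayEntry[], // per-day aggregates from current.json
  days = 90,
): DayEntry[] {
  const newestFirst = (a: DayEntry, b: DayEntry) => b.date.localeCompare(a.date);

  // Graceful fallback: no summary, so show whatever current.json covers.
  if (summaryEntries === null) {
    return liveEntries.slice().sort(newestFirst).slice(0, days);
  }

  const liveToday = liveEntries.filter((e) => e.date === today);
  // Drop the summary's copy of today only when a live entry replaces it.
  const history = summaryEntries.filter(
    (e) => e.date !== today || liveToday.length === 0,
  );
  return liveToday.concat(history).sort(newestFirst).slice(0, days);
}

// Usage: the live value for today wins; older days come from the summary.
const summary: DayEntry[] = [
  { date: "2025-12-31", uptimePct: 0.993 },
  { date: "2025-12-30", uptimePct: 1 },
];
const live: DayEntry[] = [
  { date: "2025-12-31", uptimePct: 0.95 },
  { date: "2025-12-30", uptimePct: 0.99 },
];
const merged = mergeDailyEntries("2025-12-31", summary, live);
const fallback = mergeDailyEntries("2025-12-31", null, live);
console.log(merged, fallback);
```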
Monitor Script Integration
The stentorosaur-monitor CLI now generates daily-summary.json after each health check.
Aggregation Logic
The core aggregation converts raw readings into a daily summary:
function aggregateDayReadings(date, readings) {
  const checksTotal = readings.length;
  const checksPassed = readings.filter(
    r => r.state === 'up' || r.state === 'maintenance'
  ).length;
  const uptimePct = checksTotal > 0 ? checksPassed / checksTotal : 0;

  // Only include latency from successful checks
  const latencies = readings
    .filter(r => r.state === 'up')
    .map(r => r.lat)
    .sort((a, b) => a - b);
  const avgLatencyMs = latencies.length > 0
    ? Math.round(latencies.reduce((sum, lat) => sum + lat, 0) / latencies.length)
    : null;

  // P95: 95th percentile (excludes top 5% outliers)
  const p95LatencyMs = latencies.length > 0
    ? latencies[Math.ceil(latencies.length * 0.95) - 1]
    : null;

  // Count incidents: up→down transitions
  let incidentCount = 0;
  for (let i = 1; i < readings.length; i++) {
    if (readings[i - 1].state === 'up' && readings[i].state === 'down') {
      incidentCount++;
    }
  }

  return { date, uptimePct, avgLatencyMs, p95LatencyMs, checksTotal, checksPassed, incidentCount };
}
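The transition loop is what separates flapping from a single long outage. The sketch below isolates that loop and runs it on two made-up days with identical uptime; `countIncidents`, `uptime`, and both sample days are illustrative, not taken from real data.

```typescript
type State = "up" | "down" | "maintenance";

// Same rule as the aggregation code: count up→down transitions.
function countIncidents(states: State[]): number {
  let incidents = 0;
  for (let i = 1; i < states.length; i++) {
    if (states[i - 1] === "up" && states[i] === "down") incidents++;
  }
  return incidents;
}

// Maintenance counts as passed, matching checksPassed above.
function uptime(states: State[]): number {
  const passed = states.filter((s) => s !== "down").length;
  return states.length > 0 ? passed / states.length : 0;
}

// Hypothetical day A: ten isolated failed checks spread across the day.
const flappingDay: State[] = [];
for (let i = 0; i < 144; i++) flappingDay.push(i % 14 === 7 ? "down" : "up");

// Hypothetical day B: one outage lasting ten consecutive checks.
const outageDay: State[] = [];
for (let i = 0; i < 144; i++) outageDay.push(i >= 60 && i < 70 ? "down" : "up");

console.log(uptime(flappingDay), countIncidents(flappingDay)); // same uptime, 10 incidents
console.log(uptime(outageDay), countIncidents(outageDay)); // same uptime, 1 incident
```

Two properties of this rule are worth knowing. A day whose first reading is already down records no incident for that outage, because the loop only compares readings within the day. A maintenance→down transition is also not counted.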
Summary Generation
The main generation function reads archives and aggregates by service/date:
import * as fs from 'node:fs';
import * as path from 'node:path';

function generateDailySummary(archivesDir, outputDir, windowDays = 90) {
  // Compute the cutoff in UTC so it matches the UTC date keys below
  const cutoffDate = new Date();
  cutoffDate.setUTCDate(cutoffDate.getUTCDate() - windowDays);

  // Collect readings from archives
  const serviceReadings = new Map();
  for (const archiveFile of getArchiveFiles(archivesDir, cutoffDate)) {
    const readings = parseJsonlFile(archiveFile);
    for (const reading of readings) {
      // Group by service and date (UTC)
      const dateKey = new Date(reading.t).toISOString().split('T')[0];
      const key = `${reading.svc}:${dateKey}`;
      if (!serviceReadings.has(key)) {
        serviceReadings.set(key, []);
      }
      serviceReadings.get(key).push(reading);
    }
  }

  // Aggregate to daily summaries
  const services = {};
  for (const [key, readings] of serviceReadings) {
    // Split on the last colon so service names containing ':' survive
    const sep = key.lastIndexOf(':');
    const svc = key.slice(0, sep);
    const date = key.slice(sep + 1);
    if (!services[svc]) services[svc] = [];
    // Sort readings by timestamp before aggregating (for incident counting)
    readings.sort((a, b) => a.t - b.t);
    services[svc].push(aggregateDayReadings(date, readings));
  }

  // Sort each service's entries by date descending
  for (const svc of Object.keys(services)) {
    services[svc].sort((a, b) => b.date.localeCompare(a.date));
  }

  const summary = {
    version: 1,
    lastUpdated: new Date().toISOString(),
    windowDays,
    services,
  };

  // Atomic write: temp file then rename (same directory, so the rename stays on one filesystem)
  const tmpPath = path.join(outputDir, 'daily-summary.tmp');
  const finalPath = path.join(outputDir, 'daily-summary.json');
  fs.writeFileSync(tmpPath, JSON.stringify(summary, null, 2));
  fs.renameSync(tmpPath, finalPath);
}
The atomic write pattern (write to .tmp, then rename) prevents serving partial files if the process crashes mid-write.
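The grouping step in the generation function keys readings by a combined service-and-date string. An alternative, sketched below, nests one Map per service so the key never needs to be split apart again. This is an illustration rather than the monitor's code: the `Reading` shape reuses the field names from the snippets above (`t`, `svc`, `state`, `lat`), but treating `t` as epoch milliseconds is an assumption, and `utcDateKey` and `groupByServiceAndDay` are hypothetical names.

```typescript
interface Reading {
  t: number; // assumption: epoch milliseconds
  svc: string;
  state: "up" | "down" | "maintenance";
  lat: number;
}

// toISOString() always renders in UTC, so the first 10 characters are the UTC date.
function utcDateKey(epochMs: number): string {
  return new Date(epochMs).toISOString().slice(0, 10);
}

function groupByServiceAndDay(
  readings: Reading[],
): Map<string, Map<string, Reading[]>> {
  const grouped = new Map<string, Map<string, Reading[]>>();
  for (const reading of readings) {
    let days = grouped.get(reading.svc);
    if (!days) {
      days = new Map<string, Reading[]>();
      grouped.set(reading.svc, days);
    }
    const key = utcDateKey(reading.t);
    const bucket = days.get(key);
    if (bucket) {
      bucket.push(reading);
    } else {
      days.set(key, [reading]);
    }
  }
  return grouped;
}

// Two readings two minutes apart that straddle midnight UTC.
const sampleReadings: Reading[] = [
  { t: Date.UTC(2025, 11, 31, 23, 59), svc: "api:v2", state: "up", lat: 140 },
  { t: Date.UTC(2026, 0, 1, 0, 1), svc: "api:v2", state: "up", lat: 150 },
];
const groupedReadings = groupByServiceAndDay(sampleReadings);
console.log(Array.from(groupedReadings.get("api:v2")?.keys() ?? []));
```

The two readings land in different day buckets regardless of the machine's local timezone, and a service name containing a colon needs no special handling.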
Bootstrap Script
Existing Stentorosaur users have months of archive data but no daily-summary.json. The bootstrap script backfills:
npx stentorosaur-bootstrap-summary \
--archives-dir status-data/archives \
--output-dir status-data \
--window 90
Run once during upgrade to v0.17.0. After that, the monitor workflow maintains it automatically.
Performance Results
Comparing the naive approach (loading raw archives) vs. the summary approach:
| Metric | Naive (raw archives) | Optimized (summary) |
|---|---|---|
| Data loaded | 2.5MB (90 days raw) | 15KB (summary) |
| Parse time | 3-5s (mobile) | under 50ms |
| Heatmap coverage | 90 days | 90 days |
| First contentful paint | ~4s | ~1s |
The "naive" column is what we would have shipped if we loaded raw JSONL archives client-side. Instead, we pre-aggregate server-side and ship a tiny summary file.
TypeScript Types
For consumers building on this data:
export interface DailySummaryEntry {
  date: string; // "2025-12-31"
  uptimePct: number; // 0.0 to 1.0
  avgLatencyMs: number | null;
  p95LatencyMs: number | null;
  checksTotal: number;
  checksPassed: number;
  incidentCount: number;
}

export interface DailySummaryFile {
  version: number;
  lastUpdated: string; // ISO timestamp
  windowDays: number;
  services: Record<string, DailySummaryEntry[]>;
}
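Types only describe what the file should contain, so a consumer may also want a runtime check before trusting fetched JSON, including the `version` field. The guard below is a minimal sketch: `parseDailySummary` and `SUPPORTED_VERSION` are hypothetical names, it checks only the top-level shape, and returning `null` for unknown versions is one possible policy among several.

```typescript
interface DailySummaryEntry {
  date: string;
  uptimePct: number;
  avgLatencyMs: number | null;
  p95LatencyMs: number | null;
  checksTotal: number;
  checksPassed: number;
  incidentCount: number;
}

interface DailySummaryFile {
  version: number;
  lastUpdated: string;
  windowDays: number;
  services: Record<string, DailySummaryEntry[]>;
}

const SUPPORTED_VERSION = 1; // hypothetical: the only version this client understands

// Returns the file when its top-level shape and version look right, otherwise null.
function parseDailySummary(raw: unknown): DailySummaryFile | null {
  if (typeof raw !== "object" || raw === null) return null;
  const file = raw as Partial<DailySummaryFile>;
  if (file.version !== SUPPORTED_VERSION) return null;
  if (typeof file.lastUpdated !== "string") return null;
  if (typeof file.windowDays !== "number") return null;
  if (typeof file.services !== "object" || file.services === null) return null;
  return file as DailySummaryFile;
}

const validFile = parseDailySummary(
  JSON.parse(
    '{"version":1,"lastUpdated":"2026-01-01T12:00:00Z","windowDays":90,"services":{"api":[]}}',
  ),
);
const futureFile = parseDailySummary({
  version: 2,
  lastUpdated: "2026-01-01T12:00:00Z",
  windowDays: 90,
  services: {},
});
console.log(validFile !== null, futureFile === null);
```

A caller that gets `null` can fall back to current.json alone, which matches the graceful-fallback behavior described for the hook.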
Configuration
Enable 90-day heatmaps by providing a data source URL:
// docusaurus.config.js
{
  plugins: [
    ['@amiable-dev/docusaurus-plugin-stentorosaur', {
      dataSource: {
        strategy: 'github',
        owner: 'your-org',
        repo: 'your-repo',
        branch: 'status-data',
      },
      // StatusItem will automatically use daily-summary.json
    }],
  ],
}
The StatusItem component now accepts dataBaseUrl and heatmapDays props:
<StatusItem
item={systemStatus}
dataBaseUrl="/status-data" // Enables 90-day heatmap
heatmapDays={90} // Override default (14)
/>
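How StatusItem turns summary entries into heatmap cells is not shown in this post. As a purely hypothetical sketch of that step, the function below builds one cell per UTC day in the window and marks days with no entry as empty, so gaps in the data stay visible instead of shifting later days into the wrong column. `buildHeatmapCells`, `HeatmapCell`, and the trimmed `DayEntry` type are invented for the example.

```typescript
interface DayEntry {
  date: string; // "YYYY-MM-DD" in UTC
  uptimePct: number;
}

interface HeatmapCell {
  date: string;
  uptimePct: number | null; // null means no data for that day
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns `days` cells ending at `todayUtc`, oldest first.
function buildHeatmapCells(
  entries: DayEntry[],
  todayUtc: string,
  days: number,
): HeatmapCell[] {
  const byDate = new Map<string, DayEntry>();
  for (const entry of entries) byDate.set(entry.date, entry);

  const endMs = Date.parse(`${todayUtc}T00:00:00Z`);
  const cells: HeatmapCell[] = [];
  for (let offset = days - 1; offset >= 0; offset--) {
    // UTC days are a fixed length in JavaScript, so plain subtraction is safe here.
    const date = new Date(endMs - offset * MS_PER_DAY).toISOString().slice(0, 10);
    const entry = byDate.get(date);
    cells.push({ date, uptimePct: entry ? entry.uptimePct : null });
  }
  return cells;
}

const cells = buildHeatmapCells(
  [{ date: "2025-12-31", uptimePct: 0.993 }],
  "2026-01-01",
  3,
);
console.log(cells);
```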
Caching Considerations
A 15KB static JSON file is small, but caching affects freshness:
- CDN caching: If your CDN caches daily-summary.json for 24 hours, users see stale historical data. Set Cache-Control: max-age=300 (5 minutes) or use cache invalidation on update.
- Browser caching: The hybrid pattern helps here: even if the summary is cached, current.json provides fresh "today" data.
- ETag support: Consider adding ETag headers so clients can do conditional fetches without downloading unchanged content.
For GitHub Pages and raw.githubusercontent.com, caching is automatic with a short TTL (on the order of 5 to 10 minutes), which works well for status pages.
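Browsers normally handle ETag revalidation through their HTTP cache, so the following only matters for clients that keep their own copy of the summary. It is a minimal sketch of a conditional fetch using the standard `If-None-Match` request header and a `304 Not Modified` response. `fetchWithEtag`, `CachedJson`, and the injected `Fetcher` type are hypothetical; the fetcher is passed in so the logic can be exercised without a network.

```typescript
// Minimal subset of a fetch response that this sketch relies on.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<HttpResponse>;

interface CachedJson {
  etag: string | null;
  body: unknown;
}

async function fetchWithEtag(
  url: string,
  cached: CachedJson | null,
  fetcher: Fetcher,
): Promise<CachedJson> {
  const headers: Record<string, string> = {};
  if (cached && cached.etag) headers["If-None-Match"] = cached.etag;

  const res = await fetcher(url, { headers });

  // 304: the server confirmed our copy is current, so reuse it.
  if (res.status === 304 && cached) return cached;
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`Unexpected status ${res.status} for ${url}`);
  }
  return { etag: res.headers.get("ETag"), body: await res.json() };
}
```

In a browser or a recent Node.js runtime, the global `fetch` can be wrapped to fit the `Fetcher` shape, for example `(url, init) => fetch(url, init)`.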
Lessons Learned
- Aggregate at write time, not read time: Moving computation from the browser to the monitoring workflow is a classic producer-side aggregation pattern. The monitoring job does the work once; every client benefits.
- Hybrid reads solve staleness: Never trust a single data source for time-sensitive data. Merge live and historical sources to get both freshness and efficiency.
- Percentiles reveal what averages hide: If 10% of a day's checks take 2000ms and the rest take 150ms, the average is about 335ms, which looks tolerable, while P95=2000ms exposes the problem. Track both, and display the one that matters for your users.
- Schema versioning from day one: Adding version: 1 costs nothing and saves future migration pain.
- Atomic writes prevent corruption: Write to a temp file, then rename. This prevents clients from fetching half-written JSON during the monitoring run.
Upgrade Path
Stentorosaur v0.17.0 includes daily summary aggregation. To upgrade:
- Update the plugin: npm install @amiable-dev/docusaurus-plugin-stentorosaur@latest
- Run the bootstrap script once (if you have existing data)
- Heatmaps automatically expand to 90 days
The monitoring workflow handles summary generation automatically. No workflow changes required.
Summary
The 84% empty heatmap was a data architecture problem, not a UI bug. By pre-aggregating daily summaries and implementing a hybrid read pattern, we achieved:
- Full 90-day heatmap coverage
- Sub-50ms data loading
- Real-time accuracy for today's data
- Graceful fallbacks when data is missing
The pattern applies anywhere you need historical aggregates with live updates: analytics dashboards, SLA reports, or any time-series visualization.

