
Roadmap: Automating Incident Communication with LLMs

Chris (Amiable Dev) · Claude (AI Assistant) · 5 min read

This post outlines planned automation features for Stentorosaur—specifically, using LLMs to draft incident summaries and statistical methods to detect degradation. These are design proposals, not shipped features.

Scope and Constraints

Before diving in, let's be clear about what Stentorosaur can and cannot access:

What we have access to:

  • GitHub Issues (incident text, comments, labels, timestamps)
  • Committed JSON data (response times, uptime history)
  • GitHub Actions environment (CI context, environment variables)

What we do NOT have access to:

  • Your server logs (unless you explicitly pipe them somewhere)
  • APM tools (Datadog, New Relic, etc.)
  • Internal metrics (CPU, memory, database queries)

All automation features work within these boundaries. We're not building a log aggregation platform—we're automating the communication layer on top of your existing status workflow.

Feature 1: Incident Summary Drafts

The problem: During incidents, engineers update GitHub Issues with terse, inconsistent notes:

Title: API down
Body: 503s. investigating.

Hours later, someone needs to write a proper post-mortem. They dig through issue comments, Slack threads, and deploy logs to reconstruct the timeline.

The solution: A GitHub Action that parses the issue timeline and drafts a structured summary.

How it works

GitHub Issue (opened/labeled)
→ GitHub Action triggered
→ Parse issue body + comments
→ Send to LLM (OpenAI/Anthropic/Ollama)
→ LLM returns structured summary
→ Action posts summary as PR or issue comment
→ Human reviews, edits, merges

Input: GitHub Issue text and comments only. No server logs, no metrics integration.

Output: A draft summary in this format:

## Incident Summary (Draft)

**Impact:** Authentication endpoints returning 503 errors
**Duration:** 14:32 - 15:10 UTC (38 minutes)
**Affected Systems:** API Gateway, OAuth service

**Timeline:**
- 14:32: First report in issue
- 14:45: "Scaled replicas" mentioned in comments
- 15:10: Issue closed

**Resolution:** Horizontal scaling applied

---
*Generated from GitHub Issue #123. Review before publishing.*

Implementation

# .github/workflows/incident-summary.yml
name: Draft Incident Summary
on:
  issues:
    types: [closed]

jobs:
  summarize:
    if: contains(github.event.issue.labels.*.name, 'status')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate summary
        run: |
          npx stentorosaur-summarize \
            --issue ${{ github.event.issue.number }} \
            --model ollama/llama3 # Or: openai/gpt-4
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} # Optional
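
Under the hood, the summarize step only needs the GitHub API and an LLM endpoint. The sketch below shows one way it could work, assuming a local Ollama server; the function names, prompt wording, and endpoint choice are illustrative, not the shipped stentorosaur-summarize internals.

// Illustrative TypeScript sketch; not the actual stentorosaur-summarize implementation.
import { Octokit } from '@octokit/rest';

async function callModel(prompt: string): Promise<string> {
  // Local inference via Ollama's /api/generate endpoint (assumes a server on localhost:11434).
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3', prompt, stream: false }),
  });
  const { response } = await res.json();
  return response;
}

async function draftSummary(owner: string, repo: string, issueNumber: number): Promise<string> {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

  // Gather the only inputs the Action has: the issue body and its comments.
  const { data: issue } = await octokit.rest.issues.get({ owner, repo, issue_number: issueNumber });
  const { data: comments } = await octokit.rest.issues.listComments({ owner, repo, issue_number: issueNumber });

  const timeline = [
    `Title: ${issue.title}`,
    `Opened: ${issue.created_at}`,
    issue.body ?? '',
    ...comments.map((c) => `${c.created_at}: ${c.body ?? ''}`),
  ].join('\n');

  // Ask the model for the structured draft shown above; a human still reviews it.
  return callModel(`Draft an incident summary (impact, duration, timeline, resolution) from:\n${timeline}`);
}

Keeping the prompt to issue text only is what enforces the scope constraints above: if a detail never made it into the issue, the draft cannot contain it.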

Configuration

{
  "summarize": {
    "enabled": true,
    "provider": "ollama",
    "model": "llama3",
    "output": "comment",
    "template": "root-cause-impact-resolution"
  }
}

Provider options:

  • ollama — Local inference, no data leaves your CI runner
  • openai — OpenAI API (requires OPENAI_API_KEY)
  • anthropic — Claude API (requires ANTHROPIC_API_KEY)
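
For reference, the shape of that config block could be expressed as a TypeScript type along these lines. The type itself is illustrative; field names mirror the JSON above, and the output values are assumed from the workflow description (comment or PR).

// Illustrative type for the "summarize" config block; not a published API.
type SummarizeProvider = 'ollama' | 'openai' | 'anthropic';

interface SummarizeConfig {
  enabled: boolean;
  provider: SummarizeProvider; // where inference runs
  model: string;               // e.g. "llama3" or "gpt-4"
  output: 'comment' | 'pr';    // assumed: post the draft as an issue comment or as a PR
  template: string;            // e.g. "root-cause-impact-resolution"
}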

Limitations

  • LLMs hallucinate. The output is a draft, not a source of truth.
  • Context window limits. Very long issue threads may be truncated.
  • No external data. If root cause details aren't in the issue, the LLM can't infer them.

Feature 2: Statistical Degradation Detection

Note: This is not "AI" in the neural network sense. It's basic statistics applied to your uptime data.

The problem: Traditional monitoring alerts on binary up/down. Gradual degradation (response times creeping from 100ms to 500ms) goes unnoticed until users complain.

The solution: Apply statistical thresholds to your existing response time data.

How it works

// 3-sigma check over the committed response-time history (hourly samples assumed)
type ResponseTime = number; // milliseconds
type Alert = { severity: 'minor'; message: string };

const mean = (xs: number[]) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
const standardDeviation = (xs: number[]) => {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
};

function detectDegradation(history: ResponseTime[]): Alert | null {
  const window = history.slice(-7 * 24); // last 7 days of hourly samples
  const baseline = mean(window);         // 7-day average
  const stdDev = standardDeviation(window);
  const current = history[history.length - 1];

  if (current > baseline + 3 * stdDev) {
    return {
      severity: 'minor',
      message: `Response time ${current}ms exceeds 3σ threshold (baseline: ${Math.round(baseline)}ms)`,
    };
  }
  return null;
}

This runs during your regular monitoring workflow. If degradation is detected, it creates a minor-severity issue.
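
Wiring the check into the monitoring step could look roughly like this; the data file path and the logging at the end are assumptions for illustration, not the final design:

// Hypothetical glue code around detectDegradation() from the sketch above.
import { readFileSync } from 'node:fs';

// Committed JSON history of response times (file path is an assumption).
const history: number[] = JSON.parse(
  readFileSync('data/response-times.json', 'utf8'),
);

const alert = detectDegradation(history);
if (alert) {
  // In the proposed workflow this is where the minor-severity issue would be created
  // (the autoCreateIssue option below); here we just log it.
  console.log(`[${alert.severity}] ${alert.message}`);
}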

Configuration

{
  "degradationDetection": {
    "enabled": true,
    "threshold": "3-sigma",
    "lookbackDays": 7,
    "autoCreateIssue": true,
    "minDataPoints": 100
  }
}

What this is NOT

  • Not machine learning
  • Not anomaly detection in the "AI" sense
  • Not predictive

It's a simple statistical baseline: if the current response time exceeds 3 standard deviations from the 7-day mean, we flag it. This is intentionally boring and predictable.

What We're NOT Building

To be explicit, these features are out of scope:

  • Log ingestion — why not: We're a status page plugin, not Splunk
  • Predictive maintenance — why not: Requires internal telemetry we can't access
  • Auto-publishing updates — why not: Too risky; humans must review
  • APM integration — why not: Scope creep; use dedicated tools

Human-in-the-Loop

All automation features follow this principle:

  1. AI generates drafts, not final content
  2. Output is a PR or comment, not a published update
  3. Human reviews and merges
  4. AI-generated content is labeled in the UI

We will never auto-publish AI-generated content to your status page.

Privacy

Local-first: Ollama support means incident data stays in your CI runner. Nothing leaves your infrastructure.

API option: If you prefer GPT-4/Claude quality, you provide the API key. We don't proxy through our servers.

No telemetry: Stentorosaur doesn't phone home with your incident data.

Timeline

  • Phase 1 (target: Q1 2025): Incident summary drafts (GitHub Issues → LLM → comment/PR)
  • Phase 2 (target: Q2 2025): Statistical degradation detection (3-sigma thresholds)
  • Future (target: TBD): Natural language search over incident history

These timelines assume the features are scoped as described. If scope expands, timelines slip.

Feedback

This is a design proposal. Before building, we want to validate:

  • Is incident summarization useful to your workflow?
  • Would you use local inference (Ollama) or API-based?
  • Are there privacy concerns we haven't addressed?

Discussion: GitHub Discussions