Production Incident Response
AI → Human
Structured incident handling with AI triage and human oversight.
5 nodes · 5 edges · devops
event · agent · human · cli
Visual
Alert Triggered (event)
PagerDuty/Grafana alert fires.
↓sequential→ AI Triage
AI Triage (agent)
AI analyzes logs, metrics, and recent deploys.
↓sequential→ Engineer Investigation
↓timeout (escalate if triage runs longer than 300 s)→ Engineer Investigation
Engineer Investigation (human)
On-call engineer validates AI triage.
↓conditional (when investigation.confirmed == true)→ Apply Mitigation
Apply Mitigation (cli)
Rollback, scale up, or apply hotfix.
↓sequential→ Generate Postmortem
Generate Postmortem (agent)
AI drafts postmortem from .osoplog data.
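The three edge modes in the flow above can be read as routing rules. The sketch below is a minimal Python illustration under assumed semantics, not the OSOP runtime's actual behavior: a sequential edge fires when its source node completes, a conditional edge fires on completion only if its `when` expression holds, and a timeout edge fires if its source is still running after `timeout_sec` (300 s in this workflow's YAML). The `Edge` class and the `eval_when` and `next_nodes` functions are hypothetical names, and the condition parser only handles the `<dotted.path> == <literal>` form this workflow uses.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical in-memory form of this workflow's five edges.
@dataclass
class Edge:
    src: str
    dst: str
    mode: str                          # "sequential", "conditional", or "timeout"
    when: Optional[str] = None         # only used by conditional edges
    timeout_sec: Optional[int] = None  # only used by timeout edges

EDGES = [
    Edge("alert", "triage", "sequential"),
    Edge("triage", "investigate", "sequential"),
    Edge("investigate", "mitigate", "conditional",
         when="investigation.confirmed == true"),
    Edge("mitigate", "postmortem", "sequential"),
    Edge("triage", "investigate", "timeout", timeout_sec=300),
]

def eval_when(expr: str, ctx: dict) -> bool:
    """Evaluate an assumed '<dotted.path> == <literal>' condition against ctx."""
    left, op, right = expr.partition("==")
    if not op:
        raise ValueError(f"unsupported condition: {expr!r}")
    value: Any = ctx
    for key in left.strip().split("."):
        # A missing key resolves to None, so the condition evaluates to False.
        value = value.get(key) if isinstance(value, dict) else None
    literal = right.strip()
    expected = {"true": True, "false": False}.get(literal, literal.strip('"'))
    return value == expected

def next_nodes(current: str, completed: bool, elapsed_sec: float, ctx: dict) -> list:
    """Return the node ids to activate from `current` under the assumed rules."""
    targets = []
    for e in EDGES:
        if e.src != current:
            continue
        if completed and e.mode == "sequential":
            targets.append(e.dst)
        elif completed and e.mode == "conditional" and eval_when(e.when, ctx):
            targets.append(e.dst)
        elif not completed and e.mode == "timeout" and elapsed_sec > e.timeout_sec:
            targets.append(e.dst)
    return targets

# Triage still running after 301 s: the timeout edge escalates to the engineer.
print(next_nodes("triage", completed=False, elapsed_sec=301, ctx={}))  # ['investigate']
# The engineer did not confirm the AI triage: no mitigation is triggered.
print(next_nodes("investigate", completed=True, elapsed_sec=0,
                 ctx={"investigation": {"confirmed": False}}))       # []
```

Under this reading the sequential and timeout edges from AI Triage never fire together: the timeout edge only matters while triage is still running, which matches the "Escalate if >5min" label.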
uc-incident-response.osop.yaml
```yaml
osop_version: "1.0"
id: "incident-response"
name: "Production Incident Response"
description: "Structured incident handling with AI triage and human oversight."

nodes:
  - id: "alert"
    type: "event"
    name: "Alert Triggered"
    description: "PagerDuty/Grafana alert fires."

  - id: "triage"
    type: "agent"
    subtype: "llm"
    name: "AI Triage"
    description: "AI analyzes logs, metrics, and recent deploys."
    security:
      risk_level: "medium"

  - id: "investigate"
    type: "human"
    subtype: "input"
    name: "Engineer Investigation"
    description: "On-call engineer validates AI triage."

  - id: "mitigate"
    type: "cli"
    subtype: "script"
    name: "Apply Mitigation"
    description: "Rollback, scale up, or apply hotfix."
    security:
      risk_level: "high"
      approval_gate: true

  - id: "postmortem"
    type: "agent"
    subtype: "llm"
    name: "Generate Postmortem"
    description: "AI drafts postmortem from .osoplog data."

edges:
  - from: "alert"
    to: "triage"
    mode: "sequential"

  - from: "triage"
    to: "investigate"
    mode: "sequential"

  - from: "investigate"
    to: "mitigate"
    mode: "conditional"
    when: "investigation.confirmed == true"

  - from: "mitigate"
    to: "postmortem"
    mode: "sequential"

  - from: "triage"
    to: "investigate"
    mode: "timeout"
    timeout_sec: 300
    label: "Escalate if >5min"
```
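Before running a definition like this, it can help to check that the graph is internally consistent. The following is a minimal sketch, assuming the file has already been parsed into plain Python dicts (for example with PyYAML's `yaml.safe_load`); the `SPEC` dict mirrors the YAML above with names and descriptions omitted. `validate` and `approval_gated` are hypothetical helpers, and their checks are illustrative, not the official OSOP schema rules.

```python
# Parsed form of uc-incident-response.osop.yaml (names/descriptions omitted).
SPEC = {
    "osop_version": "1.0",
    "id": "incident-response",
    "nodes": [
        {"id": "alert", "type": "event"},
        {"id": "triage", "type": "agent", "subtype": "llm",
         "security": {"risk_level": "medium"}},
        {"id": "investigate", "type": "human", "subtype": "input"},
        {"id": "mitigate", "type": "cli", "subtype": "script",
         "security": {"risk_level": "high", "approval_gate": True}},
        {"id": "postmortem", "type": "agent", "subtype": "llm"},
    ],
    "edges": [
        {"from": "alert", "to": "triage", "mode": "sequential"},
        {"from": "triage", "to": "investigate", "mode": "sequential"},
        {"from": "investigate", "to": "mitigate", "mode": "conditional",
         "when": "investigation.confirmed == true"},
        {"from": "mitigate", "to": "postmortem", "mode": "sequential"},
        {"from": "triage", "to": "investigate", "mode": "timeout",
         "timeout_sec": 300, "label": "Escalate if >5min"},
    ],
}

def validate(spec: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    ids = [n["id"] for n in spec["nodes"]]
    if len(ids) != len(set(ids)):
        problems.append("duplicate node ids")
    known = set(ids)
    for i, e in enumerate(spec["edges"]):
        # Every edge endpoint must name a declared node.
        for end in ("from", "to"):
            if e.get(end) not in known:
                problems.append(f"edge {i}: unknown {end!r} node {e.get(end)!r}")
        mode = e.get("mode")
        if mode == "conditional" and not e.get("when"):
            problems.append(f"edge {i}: conditional edge needs 'when'")
        timeout = e.get("timeout_sec")
        if mode == "timeout" and not (isinstance(timeout, int) and timeout > 0):
            problems.append(f"edge {i}: timeout edge needs a positive 'timeout_sec'")
    return problems

def approval_gated(spec: dict) -> list:
    """List the nodes whose security block sets approval_gate: true."""
    return [n["id"] for n in spec["nodes"]
            if n.get("security", {}).get("approval_gate")]

print(validate(SPEC))                          # []
print(approval_gated(SPEC))                    # ['mitigate']
print(len(SPEC["nodes"]), len(SPEC["edges"]))  # 5 5
```

Only the high-risk Apply Mitigation node carries `approval_gate: true`, so a runner following this definition would pause for explicit sign-off before a rollback, scale-up, or hotfix script executes.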