Production Incident Response

AI → Human

A structured incident-handling process that combines AI triage with human oversight.

5 nodes · 5 edges · devops
Node types: event, agent, human, cli
Visualization
Alert Triggered (event)

A PagerDuty or Grafana alert fires.

sequential → AI Triage
AI Triage (agent)

The AI analyzes logs, metrics, and recent deployment records.

sequential → Engineer Investigation
timeout → Engineer Investigation
Engineer Investigation (human)

The on-call engineer verifies the AI's triage result.

conditional → Apply Mitigation
Apply Mitigation (cli)

Roll back, scale up, or apply a hotfix.

sequential → Generate Postmortem Report
Generate Postmortem Report (agent)

The AI writes the postmortem report from the .osoplog data.
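
The five nodes and five edges listed above can be modeled as plain data to check the main path through the workflow. The following is a minimal sketch, not part of any OSOP tooling: the node ids are taken from the workflow file, while `NODES`, `EDGES`, and `happy_path` are hypothetical names introduced for illustration. It assumes the first outgoing edge of each node is the one normally taken.

```python
from collections import defaultdict

# Node id -> node type, as listed in the visualization.
NODES = {
    "alert": "event",
    "triage": "agent",
    "investigate": "human",
    "mitigate": "cli",
    "postmortem": "agent",
}

# (source, target, mode) for each of the five edges.
EDGES = [
    ("alert", "triage", "sequential"),
    ("triage", "investigate", "sequential"),
    ("triage", "investigate", "timeout"),
    ("investigate", "mitigate", "conditional"),
    ("mitigate", "postmortem", "sequential"),
]


def happy_path(start: str) -> list[str]:
    """Follow the first outgoing edge of each node until none remain."""
    outgoing = defaultdict(list)
    for src, dst, _mode in EDGES:
        outgoing[src].append(dst)

    path = [start]
    seen = {start}
    while outgoing[path[-1]]:
        nxt = outgoing[path[-1]][0]
        if nxt in seen:  # guard against cycles in a malformed graph
            break
        path.append(nxt)
        seen.add(nxt)
    return path


if __name__ == "__main__":
    for node_id in happy_path("alert"):
        print(f"{node_id} ({NODES[node_id]})")
```

Running the script prints the nodes in the order an incident moves through them, which makes the handoff from the AI triage step to the human investigation step easy to see.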

uc-incident-response.osop.yaml
osop_version: "1.0"
id: "incident-response"
name: "Production Incident Response"
description: "A structured incident-handling process that combines AI triage with human oversight."

nodes:
  - id: "alert"
    type: "event"
    name: "Alert Triggered"
    description: "A PagerDuty or Grafana alert fires."

  - id: "triage"
    type: "agent"
    subtype: "llm"
    name: "AI Triage"
    description: "The AI analyzes logs, metrics, and recent deployment records."
    security:
      risk_level: "medium"

  - id: "investigate"
    type: "human"
    subtype: "input"
    name: "Engineer Investigation"
    description: "The on-call engineer verifies the AI's triage result."

  - id: "mitigate"
    type: "cli"
    subtype: "script"
    name: "Apply Mitigation"
    description: "Roll back, scale up, or apply a hotfix."
    security:
      risk_level: "high"
      approval_gate: true

  - id: "postmortem"
    type: "agent"
    subtype: "llm"
    name: "Generate Postmortem Report"
    description: "The AI writes the postmortem report from the .osoplog data."

edges:
  - from: "alert"
    to: "triage"
    mode: "sequential"
  - from: "triage"
    to: "investigate"
    mode: "sequential"
  - from: "investigate"
    to: "mitigate"
    mode: "conditional"
    when: "investigation.confirmed == true"
  - from: "mitigate"
    to: "postmortem"
    mode: "sequential"
  - from: "triage"
    to: "investigate"
    mode: "timeout"
    timeout_sec: 300
    label: "Escalate if >5min"
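
The edge list above mixes three modes: sequential, conditional, and timeout. The sketch below shows one plausible way a runner could decide which edge fires. The semantics are assumptions, not taken from an OSOP specification: a sequential edge fires once its source node finishes, a conditional edge fires only when its `when` expression holds, and a timeout edge fires when the source node has not finished after more than `timeout_sec` seconds. The `Edge` class and `resolve` function are hypothetical, and the `when` expression is reduced to a boolean `confirmed` argument rather than parsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    source: str
    target: str
    mode: str                          # "sequential", "conditional", or "timeout"
    when: Optional[str] = None         # only set on conditional edges
    timeout_sec: Optional[int] = None  # only set on timeout edges


# Mirrors the edges section of the workflow file.
EDGES = [
    Edge("alert", "triage", "sequential"),
    Edge("triage", "investigate", "sequential"),
    Edge("investigate", "mitigate", "conditional",
         when="investigation.confirmed == true"),
    Edge("mitigate", "postmortem", "sequential"),
    Edge("triage", "investigate", "timeout", timeout_sec=300),
]


def resolve(node: str, *, done: bool, elapsed_sec: float = 0.0,
            confirmed: bool = False) -> Optional[tuple[str, str]]:
    """Return (target, mode) of the first edge that fires, or None.

    Assumed semantics (hypothetical, for illustration only):
    - timeout edges fire when the node is still running past timeout_sec
    - sequential edges fire as soon as the node is done
    - conditional edges fire when the node is done and `confirmed` is True
    """
    for edge in EDGES:
        if edge.source != node:
            continue
        if edge.mode == "timeout":
            if not done and elapsed_sec > edge.timeout_sec:
                return edge.target, edge.mode
        elif done:
            if edge.mode == "sequential":
                return edge.target, edge.mode
            if edge.mode == "conditional" and confirmed:
                return edge.target, edge.mode
    return None


if __name__ == "__main__":
    # Triage still running after 6 minutes: escalate to the engineer.
    print(resolve("triage", done=False, elapsed_sec=360))
    # Engineer confirmed the triage result: proceed to mitigation.
    print(resolve("investigate", done=True, confirmed=True))
```

Under these assumptions, an unconfirmed investigation returns `None`, so the high-risk `mitigate` step is never reached without the engineer's confirmation.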