SLA Breach Prediction and Remediation

AI → Human

AI monitors service metrics, predicts SLA breaches, and alerts operations staff so they can act before a breach occurs.

5 nodes · 5 connections · enterprise
event · agent · human · api
Visualization
Service Metrics Stream (event)

Uptime, response time, error rate, and throughput from APM tools.

sequential → AI Trend Analysis
AI Trend Analysis (agent)

Projects current trends against SLA thresholds and predicts the time of breach.

sequential → SLA Breach Prediction
timeout → Ops Manager Alert
SLA Breach Prediction (system)

Estimates the time remaining until an SLA breach and the affected service tiers.

conditional → Ops Manager Alert
Ops Manager Alert (api)

PagerDuty alert with the breach timeline and recommended mitigations.

sequential → Remediation Plan
Remediation Plan (human)

The ops manager reviews the prediction, allocates resources, and implements the fix.
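
The trend projection performed by the `trend_analysis` node can be illustrated with a short Python sketch. The workflow does not say how the projection is computed, so this example assumes a plain least-squares line fitted to recent samples of a "higher is worse" metric such as latency. The function name `hours_to_breach` is borrowed from the variable used in the workflow's conditional edge; the signature and the sample data are hypothetical.

```python
from typing import Optional, Sequence


def hours_to_breach(
    hours: Sequence[float],
    values: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Project a linear trend onto an SLA threshold.

    Assumption: the metric breaches the SLA when it rises to `threshold`
    (for example latency or error rate). Returns the number of hours after
    the last sample at which the fitted line reaches the threshold, 0.0 if
    the latest sample has already crossed it, or None if the trend is flat
    or improving.
    """
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    if values[-1] >= threshold:
        return 0.0

    # Ordinary least-squares fit: value = intercept + slope * hour.
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values)) / var_t
    if slope <= 0:
        return None  # metric is not trending toward the threshold

    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    return max(crossing_hour - hours[-1], 0.0)
```

With hourly p95 latency samples of 200, 220, 240, and 260 ms and a 300 ms SLA threshold, the fitted slope is 20 ms per hour, so the sketch projects a breach 2 hours after the last sample. A production predictor would likely need seasonality handling and confidence intervals, which this sketch omits.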

uc-sla-breach-alert.osop.yaml
osop_version: "1.0"
id: "sla-breach-alert"
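name: "SLA Breach Prediction and Remediation"
description: "AI monitors service metrics, predicts SLA breaches, and alerts operations staff so they can act before a breach occurs."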

nodes:
  - id: "service_metrics"
    type: "event"
    name: "Service Metrics Stream"
    description: "Uptime, response time, error rate, and throughput from APM tools."

  - id: "trend_analysis"
    type: "agent"
    subtype: "llm"
    name: "AI Trend Analysis"
    description: "Projects current trends against SLA thresholds and predicts the time of breach."
    security:
      risk_level: "medium"

  - id: "breach_prediction"
    type: "system"
    name: "SLA Breach Prediction"
    description: "Estimates the time remaining until an SLA breach and the affected service tiers."

  - id: "ops_alert"
    type: "api"
    name: "Ops Manager Alert"
    description: "PagerDuty alert with the breach timeline and recommended mitigations."

  - id: "remediation_plan"
    type: "human"
    subtype: "review"
    name: "Remediation Plan"
    description: "The ops manager reviews the prediction, allocates resources, and implements the fix."
    security:
      approval_gate: true

edges:
  - from: "service_metrics"
    to: "trend_analysis"
    mode: "sequential"
  - from: "trend_analysis"
    to: "breach_prediction"
    mode: "sequential"
  - from: "breach_prediction"
    to: "ops_alert"
    mode: "conditional"
    when: "hours_to_breach < 4"
  - from: "ops_alert"
    to: "remediation_plan"
    mode: "sequential"
  - from: "trend_analysis"
    to: "ops_alert"
    mode: "timeout"
    timeout_sec: 180
    label: "Escalate if analysis exceeds 3min"
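
The edge list above mixes three modes: `sequential`, `conditional` (with a `when` guard), and `timeout` (with `timeout_sec`). The file does not define how a runner interprets them, so the following Python sketch is only one plausible reading: a timeout edge fires when the source node has run longer than `timeout_sec` and then replaces the normal edges, and a conditional edge fires only when its guard holds. The helpers `eval_when` and `next_nodes`, and the tiny `<name> <op> <number>` guard grammar, are hypothetical and not part of OSOP.

```python
import operator
from typing import Dict, List

# Three of the five edges from the YAML above, as plain dictionaries.
EDGES = [
    {"from": "trend_analysis", "to": "breach_prediction", "mode": "sequential"},
    {"from": "breach_prediction", "to": "ops_alert", "mode": "conditional",
     "when": "hours_to_breach < 4"},
    {"from": "trend_analysis", "to": "ops_alert", "mode": "timeout",
     "timeout_sec": 180},
]

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def eval_when(expr: str, context: Dict[str, float]) -> bool:
    """Evaluate a guard of the assumed form '<name> <op> <number>'."""
    name, op, literal = expr.split()
    return _OPS[op](context[name], float(literal))


def next_nodes(node_id: str, context: Dict[str, float], elapsed_sec: float) -> List[str]:
    """Return the ids of the nodes to run after `node_id`.

    Assumed semantics: an expired timeout edge takes priority over the
    node's other outgoing edges; otherwise sequential edges always fire
    and conditional edges fire only when their guard is true.
    """
    outgoing = [e for e in EDGES if e["from"] == node_id]
    expired = [e["to"] for e in outgoing
               if e["mode"] == "timeout" and elapsed_sec > e["timeout_sec"]]
    if expired:
        return expired

    targets = []
    for edge in outgoing:
        if edge["mode"] == "timeout":
            continue  # timeout not reached, so this edge stays dormant
        if edge["mode"] == "conditional" and not eval_when(edge["when"], context):
            continue
        targets.append(edge["to"])
    return targets
```

Under these assumptions, a prediction of 2.5 hours to breach routes `breach_prediction` to `ops_alert`, a prediction of 6 hours routes nowhere, and a `trend_analysis` run that takes 200 seconds escalates straight to `ops_alert` instead of continuing to `breach_prediction`.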