LLM Model Evaluation
Human → AI
Benchmark multiple models, compare their metrics, and generate a recommendation report.
6 nodes · 7 edges · ml
agent · cli · system
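Visualization
Load evaluation dataset (system)
↓ parallel → Evaluate Claude
↓ parallel → Evaluate GPT-4
↓ parallel → Evaluate Gemini
Evaluate Claude (agent)
↓ parallel → Compare results
Evaluate GPT-4 (agent)
↓ parallel → Compare results
Evaluate Gemini (agent)
↓ parallel → Compare results
Compare results (system)
Accuracy, latency, and cost per 1K tokens.
↓ sequential → Generate recommendation report
Generate recommendation report (agent)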
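The comparison node names three metrics: accuracy, latency, and cost per 1K tokens. The workflow does not say how they are computed or weighed, so the following is only a minimal Python sketch under stated assumptions: the `EvalResult` fields, the `rank` helper, the sort order, and the sample numbers are all hypothetical and are not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Aggregated metrics for one model run (all field names are hypothetical)."""
    model: str
    correct: int      # items answered correctly
    total: int        # items in the evaluation dataset
    latency_s: float  # mean latency per request, in seconds
    cost_usd: float   # total spend for the run
    tokens: int       # total tokens processed in the run

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def cost_per_1k_tokens(self) -> float:
        return self.cost_usd / self.tokens * 1000


def rank(results):
    """Order by accuracy (highest first), then cost per 1K tokens, then latency."""
    return sorted(
        results,
        key=lambda r: (-r.accuracy, r.cost_per_1k_tokens, r.latency_s),
    )


# Made-up placeholder numbers, only to exercise the code.
results = [
    EvalResult("model_a", correct=80, total=100, latency_s=1.2, cost_usd=0.50, tokens=100_000),
    EvalResult("model_b", correct=90, total=100, latency_s=2.0, cost_usd=1.50, tokens=100_000),
    EvalResult("model_c", correct=90, total=100, latency_s=1.5, cost_usd=0.75, tokens=150_000),
]

for r in rank(results):
    print(f"{r.model}: acc={r.accuracy:.2f} "
          f"cost/1K={r.cost_per_1k_tokens:.4f} latency={r.latency_s:.1f}s")
```

Sorting on a tuple keeps the tie-breaking explicit. A weighted score would work as well, but the weights would be one more assumption the workflow does not specify.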
uc-model-evaluation.osop.yaml
osop_version: "1.0"
id: "model-eval"
name: "LLM Model Evaluation"
description: "Benchmark multiple models, compare their metrics, and generate a recommendation report."
nodes:
  - id: "prepare"
    type: "system"
    name: "Load evaluation dataset"
  - id: "eval_claude"
    type: "agent"
    subtype: "llm"
    name: "Evaluate Claude"
  - id: "eval_gpt"
    type: "agent"
    subtype: "llm"
    name: "Evaluate GPT-4"
  - id: "eval_gemini"
    type: "agent"
    subtype: "llm"
    name: "Evaluate Gemini"
  - id: "compare"
    type: "system"
    name: "Compare results"
    description: "Accuracy, latency, and cost per 1K tokens."
  - id: "recommend"
    type: "agent"
    subtype: "llm"
    name: "Generate recommendation report"
edges:
  - from: "prepare"
    to: "eval_claude"
    mode: "parallel"
  - from: "prepare"
    to: "eval_gpt"
    mode: "parallel"
  - from: "prepare"
    to: "eval_gemini"
    mode: "parallel"
  - from: "eval_claude"
    to: "compare"
    mode: "parallel"
  - from: "eval_gpt"
    to: "compare"
    mode: "parallel"
  - from: "eval_gemini"
    to: "compare"
    mode: "parallel"
  - from: "compare"
    to: "recommend"
    mode: "sequential"
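The edge list describes a fan-out from `prepare` to the three evaluation nodes and a fan-in to `compare`. One way to see which nodes could run together is to group them into dependency stages. The sketch below uses Python's standard `graphlib` module. It is not the OSOP runtime, and the `execution_stages` helper is hypothetical. It also assumes that `parallel` edges leaving the same node may run concurrently, which the file itself does not define.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges copied from the workflow definition as (from, to) pairs.
EDGES = [
    ("prepare", "eval_claude"),
    ("prepare", "eval_gpt"),
    ("prepare", "eval_gemini"),
    ("eval_claude", "compare"),
    ("eval_gpt", "compare"),
    ("eval_gemini", "compare"),
    ("compare", "recommend"),
]


def execution_stages(edges):
    """Group node ids into stages; every node in a stage has all its inputs done."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(EDGES)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

With these edges the three `eval_*` nodes land in the same stage, between `prepare` and `compare`, which matches the fan-out and fan-in shown in the visualization.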