LLM Model Evaluation

Human → AI

Benchmark multiple models, compare their metrics, and generate a recommendation report.

6 nodes · 7 edges
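Load evaluation dataset (system)
  → parallel: Evaluate Claude (agent)
  → parallel: Evaluate GPT-4 (agent)
  → parallel: Evaluate Gemini (agent)
Evaluate Claude, Evaluate GPT-4, Evaluate Gemini
  → parallel: Compare results (system)
Compare results (system): accuracy, latency, and cost per 1K tokens.
  → sequential: Generate recommendation report (agent)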
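The comparison step scores each model on accuracy, latency, and cost per 1K tokens. The Python sketch below shows one way to aggregate and rank those metrics once the three evaluation branches have finished. The `EvalRecord` schema, the `summarize` and `rank` helpers, and the tie-breaking order (accuracy first, then latency, then price) are hypothetical choices for illustration; the workflow file does not specify how the comparison is computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    """One scored response from a model on one dataset item (hypothetical schema)."""
    correct: bool
    latency_s: float
    total_tokens: int


def summarize(records: list[EvalRecord], usd_per_1k_tokens: float) -> dict[str, float]:
    """Aggregate accuracy, mean latency, and total cost for one model's run."""
    if not records:
        raise ValueError("no records to summarize")
    tokens = sum(r.total_tokens for r in records)
    return {
        # Fraction of items answered correctly (True counts as 1).
        "accuracy": sum(r.correct for r in records) / len(records),
        "mean_latency_s": mean(r.latency_s for r in records),
        # Assumed flat price per 1K tokens; real pricing often differs for input and output.
        "usd_per_1k_tokens": usd_per_1k_tokens,
        "total_cost_usd": tokens / 1000 * usd_per_1k_tokens,
    }


def rank(summaries: dict[str, dict[str, float]]) -> list[str]:
    """Order model names by accuracy (desc), then mean latency (asc), then price (asc)."""
    return sorted(
        summaries,
        key=lambda m: (
            -summaries[m]["accuracy"],
            summaries[m]["mean_latency_s"],
            summaries[m]["usd_per_1k_tokens"],
        ),
    )
```

A lexicographic sort keeps the ranking easy to explain in the recommendation report. A weighted score would be an alternative if cost should be allowed to outweigh small accuracy differences.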
uc-model-evaluation.osop.yaml
osop_version: "1.0"
id: "model-eval"
name: "LLM Model Evaluation"
description: "Benchmark multiple models, compare their metrics, and generate a recommendation report."

nodes:
  - id: "prepare"
    type: "system"
    name: "Load evaluation dataset"
  - id: "eval_claude"
    type: "agent"
    subtype: "llm"
    name: "Evaluate Claude"
  - id: "eval_gpt"
    type: "agent"
    subtype: "llm"
    name: "Evaluate GPT-4"
  - id: "eval_gemini"
    type: "agent"
    subtype: "llm"
    name: "Evaluate Gemini"
  - id: "compare"
    type: "system"
    name: "Compare results"
    description: "Accuracy, latency, and cost per 1K tokens."
  - id: "recommend"
    type: "agent"
    subtype: "llm"
    name: "Generate recommendation report"

edges:
  - from: "prepare"
    to: "eval_claude"
    mode: "parallel"
  - from: "prepare"
    to: "eval_gpt"
    mode: "parallel"
  - from: "prepare"
    to: "eval_gemini"
    mode: "parallel"
  - from: "eval_claude"
    to: "compare"
    mode: "parallel"
  - from: "eval_gpt"
    to: "compare"
    mode: "parallel"
  - from: "eval_gemini"
    to: "compare"
    mode: "parallel"
  - from: "compare"
    to: "recommend"
    mode: "sequential"
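The edges imply an execution order: the three evaluation nodes depend only on `prepare`, so they can run side by side, and `compare` waits for all three. The minimal Python sketch below groups nodes into stages with Kahn's topological sort, assuming a node may start once all of its predecessors have finished. That reading of the `parallel` and `sequential` modes is an interpretation, not a documented OSOP rule, and the sketch ignores `mode` entirely. The node and edge lists are copied by hand from the YAML so the example needs no YAML parser; `execution_stages` is a hypothetical helper.

```python
from collections import defaultdict

# Node ids and (from, to, mode) edges copied from uc-model-evaluation.osop.yaml.
NODES = ["prepare", "eval_claude", "eval_gpt", "eval_gemini", "compare", "recommend"]
EDGES = [
    ("prepare", "eval_claude", "parallel"),
    ("prepare", "eval_gpt", "parallel"),
    ("prepare", "eval_gemini", "parallel"),
    ("eval_claude", "compare", "parallel"),
    ("eval_gpt", "compare", "parallel"),
    ("eval_gemini", "compare", "parallel"),
    ("compare", "recommend", "sequential"),
]


def execution_stages(nodes: list[str], edges: list[tuple[str, str, str]]) -> list[list[str]]:
    """Group nodes into stages; each node's predecessors all sit in earlier stages."""
    indegree = {n: 0 for n in nodes}
    successors: dict[str, list[str]] = defaultdict(list)
    for src, dst, _mode in edges:  # mode is not used by this sketch
        if src not in indegree or dst not in indegree:
            raise ValueError(f"edge references unknown node: {src} -> {dst}")
        successors[src].append(dst)
        indegree[dst] += 1

    stage = [n for n in nodes if indegree[n] == 0]  # nodes with no inputs start first
    stages: list[list[str]] = []
    scheduled = 0
    while stage:
        stages.append(stage)
        scheduled += len(stage)
        next_stage = []
        for n in stage:
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:  # all inputs of m are now done
                    next_stage.append(m)
        stage = next_stage

    if scheduled != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return stages
```

For this workflow it yields four stages: `prepare`, then the three evaluations together, then `compare`, then `recommend`. Python's standard `graphlib.TopologicalSorter` (3.9+) offers the same readiness-based ordering through `get_ready()` and `done()` and could replace the hand-written loop.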