Model Cascade / Fallback Chain

AI ↔ AI

Try fast model first; escalate to larger model if confidence is low.

5 nodes · 5 edges
Tags: ml-infra, agent, system

Visual
Receive Query (event)
  Incoming user request enters the cascade pipeline.
  → sequential → Fast Model (Haiku)

Fast Model (Haiku) (agent)
  Low-cost, low-latency first attempt.
  → sequential → Confidence Check

Confidence Check (system)
  Route based on model confidence score threshold.
  → conditional → Return Response
  → conditional → Large Model (Opus)

Large Model (Opus) (agent)
  High-capability fallback for complex queries.
  → sequential → Return Response

Return Response (api)
  Deliver final answer to the caller.
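The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `call_model` and its confidence heuristic are hypothetical stand-ins for real model API calls, and the 0.8 threshold matches the `when` conditions in the spec below.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    answer: str
    confidence: float  # self-reported score in [0.0, 1.0]


def call_model(name: str, query: str) -> ModelResult:
    """Hypothetical stand-in for a real model API call.

    For illustration, the fast model is "unsure" about long queries.
    """
    if name == "haiku" and len(query) > 40:
        return ModelResult(answer=f"[haiku draft] {query}", confidence=0.5)
    return ModelResult(answer=f"[{name}] {query}", confidence=0.95)


def cascade(query: str, threshold: float = 0.8) -> str:
    first = call_model("haiku", query)    # low-cost, low-latency first attempt
    if first.confidence >= threshold:     # confidence check node
        return first.answer               # confident: return fast response
    return call_model("opus", query).answer  # escalate to the large model


print(cascade("hi"))  # short query: the fast model answers directly
```

Note that the large model only runs when the fast model's confidence falls below the threshold, so typical traffic pays only the fast model's cost and latency.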

uc-model-cascade.osop.yaml
osop_version: "1.0"
id: "model-cascade"
name: "Model Cascade / Fallback Chain"
description: "Try fast model first; escalate to larger model if confidence is low."

nodes:
  - id: "receive"
    type: "event"
    name: "Receive Query"
    description: "Incoming user request enters the cascade pipeline."

  - id: "fast_model"
    type: "agent"
    subtype: "llm"
    name: "Fast Model (Haiku)"
    description: "Low-cost, low-latency first attempt."
    timeout_sec: 5

  - id: "check_confidence"
    type: "system"
    name: "Confidence Check"
    description: "Route based on model confidence score threshold."

  - id: "large_model"
    type: "agent"
    subtype: "llm"
    name: "Large Model (Opus)"
    description: "High-capability fallback for complex queries."
    timeout_sec: 30

  - id: "respond"
    type: "api"
    name: "Return Response"
    description: "Deliver final answer to the caller."

edges:
  - from: "receive"
    to: "fast_model"
    mode: "sequential"
  - from: "fast_model"
    to: "check_confidence"
    mode: "sequential"
  - from: "check_confidence"
    to: "respond"
    mode: "conditional"
    when: "confidence >= 0.8"
  - from: "check_confidence"
    to: "large_model"
    mode: "conditional"
    when: "confidence < 0.8"
  - from: "large_model"
    to: "respond"
    mode: "sequential"