Multi-Model Essay Grading
AI ↔ AI
Three LLMs grade in parallel, scores are aggregated, bias is checked, final grade is issued.
7 nodes · 9 edges · education
Node types: agent, system
Visual
Essay Intake (system)
Receive submitted essay with rubric, anonymize student identity.
↓parallel→ Grader A (GPT-4o)
↓parallel→ Grader B (Claude)
↓parallel→ Grader C (Gemini)
Grader A (GPT-4o) (agent)
Score essay on structure, argument quality, evidence use, and writing clarity.
↓parallel→ Score Aggregation
Grader B (Claude) (agent)
Score essay independently using identical rubric and blind to other graders.
↓parallel→ Score Aggregation
Grader C (Gemini) (agent)
Score essay independently as third grader for robust consensus.
↓parallel→ Score Aggregation
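The fan-out from Essay Intake to the three graders can be sketched with asyncio. This is a minimal illustration, not the workflow's actual runner: the `Grader` signature, `make_stub_grader`, `grade_in_parallel`, and the fixed scores are all hypothetical, and a real grader would call the corresponding model API instead of returning a constant.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical grader signature: (essay, rubric) -> score on a 0-100 scale.
Grader = Callable[[str, str], Awaitable[float]]


def make_stub_grader(fixed_score: float) -> Grader:
    """Return a placeholder grader; a real one would call a model API."""
    async def grade(essay: str, rubric: str) -> float:
        await asyncio.sleep(0)  # stands in for network latency
        return fixed_score
    return grade


async def grade_in_parallel(
    essay: str, rubric: str, graders: Dict[str, Grader]
) -> Dict[str, float]:
    """Run all graders concurrently; no grader sees another's score."""
    names = list(graders)
    results = await asyncio.gather(*(graders[n](essay, rubric) for n in names))
    return dict(zip(names, results))


# Node ids match the workflow definition; the scores are made-up stub values.
graders = {
    "grader_1": make_stub_grader(82.0),
    "grader_2": make_stub_grader(78.0),
    "grader_3": make_stub_grader(85.0),
}
scores = asyncio.run(grade_in_parallel("essay text", "rubric text", graders))
```

Because each grader receives only the essay and the rubric, the blind-grading requirement holds by construction in this sketch.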
Score Aggregation (system)
Compute weighted average, flag if any grader deviates more than 15% from mean.
↓sequential→ Bias Detection Agent
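One possible reading of the aggregation rule, as a rough sketch: the source does not say which mean is meant or how the 15% is measured, so this example assumes relative deviation from the weighted mean and equal weights by default. The function name `aggregate_scores` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_scores(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.15,
) -> Tuple[float, List[str]]:
    """Return (weighted mean, ids of graders deviating more than threshold)."""
    if not scores:
        raise ValueError("need at least one score")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # assumption: equal weights
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(scores[name] * weights[name] for name in scores) / total_weight
    if mean == 0:
        # Relative deviation is undefined at zero; flag any non-zero score.
        flagged = [name for name, s in scores.items() if s != 0]
    else:
        # Assumption: "15%" means relative distance from the weighted mean.
        flagged = [
            name for name, s in scores.items()
            if abs(s - mean) / abs(mean) > threshold
        ]
    return mean, flagged


mean, flagged = aggregate_scores(
    {"grader_1": 80.0, "grader_2": 82.0, "grader_3": 60.0}
)
```

With these made-up scores the mean is 74.0 and only `grader_3` is flagged, since 14 / 74 is roughly 0.19 while the other two graders stay under 0.15.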
Bias Detection Agent (agent)
Analyze score patterns for demographic bias, topic bias, or length bias.
↓conditional→ Final Grade & Feedback
↓fallback→ Grader A (GPT-4o)
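The bias check in this workflow is an LLM agent, and its method is not specified. As a purely illustrative stand-in for the length-bias part, the sketch below computes the Pearson correlation between word count and score across a batch of essays. The batch itself, the function names, and the 0.8 cut-off are assumptions, not part of the workflow.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no measurable linear relationship
    return cov / math.sqrt(var_x * var_y)


def length_bias_suspected(
    word_counts: Sequence[float], scores: Sequence[float], limit: float = 0.8
) -> bool:
    """Flag a batch when score tracks essay length too closely (hypothetical rule)."""
    return abs(pearson(word_counts, scores)) > limit


# Made-up batch in which the score rises linearly with length.
word_counts = [300, 450, 600, 750, 900]
batch_scores = [62.0, 68.0, 74.0, 80.0, 86.0]
suspected = length_bias_suspected(word_counts, batch_scores)
```

A strong correlation alone does not prove bias, since longer essays may simply be better developed; a flag from a check like this would only justify a closer look.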
Final Grade & Feedback (agent)
Issue final grade with synthesized feedback from all graders and improvement suggestions.
uc-multi-model-grading.osop.yaml
osop_version: "1.0"
id: "multi-model-grading"
name: "Multi-Model Essay Grading"
description: "Three LLMs grade in parallel, scores are aggregated, bias is checked, final grade is issued."
nodes:
  - id: "essay_intake"
    type: "system"
    name: "Essay Intake"
    description: "Receive submitted essay with rubric, anonymize student identity."
  - id: "grader_1"
    type: "agent"
    subtype: "llm"
    name: "Grader A (GPT-4o)"
    description: "Score essay on structure, argument quality, evidence use, and writing clarity."
  - id: "grader_2"
    type: "agent"
    subtype: "llm"
    name: "Grader B (Claude)"
    description: "Score essay independently using identical rubric and blind to other graders."
  - id: "grader_3"
    type: "agent"
    subtype: "llm"
    name: "Grader C (Gemini)"
    description: "Score essay independently as third grader for robust consensus."
  - id: "aggregate"
    type: "system"
    name: "Score Aggregation"
    description: "Compute weighted average, flag if any grader deviates more than 15% from mean."
  - id: "bias_check"
    type: "agent"
    subtype: "llm"
    name: "Bias Detection Agent"
    description: "Analyze score patterns for demographic bias, topic bias, or length bias."
  - id: "final_grade"
    type: "agent"
    subtype: "llm"
    name: "Final Grade & Feedback"
    description: "Issue final grade with synthesized feedback from all graders and improvement suggestions."
edges:
  - from: "essay_intake"
    to: "grader_1"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_2"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_3"
    mode: "parallel"
  - from: "grader_1"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_2"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_3"
    to: "aggregate"
    mode: "parallel"
  - from: "aggregate"
    to: "bias_check"
    mode: "sequential"
  - from: "bias_check"
    to: "final_grade"
    mode: "conditional"
    when: "bias.detected == false"
  - from: "bias_check"
    to: "grader_1"
    mode: "fallback"
    label: "Bias detected, re-grade with adjusted prompts"
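
How a runner might choose between the two outgoing edges of `bias_check` can be sketched as follows. This is an assumption about edge semantics, not the OSOP specification: it supposes that a `conditional` edge is taken when its `when` expression holds and that the `fallback` edge is taken otherwise. The helper names `next_node` and `_condition_holds` are hypothetical.

```python
from typing import Dict, List, Optional

# The two outgoing edges of bias_check, transcribed from the definition above.
EDGES: List[Dict[str, str]] = [
    {"from": "bias_check", "to": "final_grade",
     "mode": "conditional", "when": "bias.detected == false"},
    {"from": "bias_check", "to": "grader_1", "mode": "fallback"},
]


def _condition_holds(expr: str, bias_detected: bool) -> bool:
    # Only the single expression used in this workflow is recognized here;
    # a real engine would evaluate `when` with its own expression language.
    if expr == "bias.detected == false":
        return not bias_detected
    raise ValueError(f"unsupported condition: {expr}")


def next_node(
    current: str, bias_detected: bool, edges: List[Dict[str, str]] = EDGES
) -> Optional[str]:
    """Take the first conditional edge whose condition holds, else the fallback."""
    outgoing = [e for e in edges if e["from"] == current]
    for edge in outgoing:
        if edge["mode"] == "conditional" and _condition_holds(edge["when"], bias_detected):
            return edge["to"]
    for edge in outgoing:
        if edge["mode"] == "fallback":
            return edge["to"]
    return None  # no outgoing edge: the workflow ends at this node
```

Under these assumptions, `next_node("bias_check", False)` yields `"final_grade"` and `next_node("bias_check", True)` yields `"grader_1"`. Since the fallback edge forms a cycle back into grading, a runner would probably also want a cap on re-grade attempts, which the definition does not specify.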