Apr 3, 2026 · Story · 5 min read

Why I Built OSOP: I Just Wanted to Know What the AI Did

By OSOP Team

The Frustration

Every time Claude Code finished building something, I had the same experience. The code worked. The feature was there. But I had no idea what actually happened.

Which files did it read first? What decisions did it make along the way? How many steps were involved? Did it try something, fail, and try again? The only way to find out was scrolling through a long chat log. That is not a process. That is archaeology.

And the bigger question: what is different from last time? I ran the same kind of workflow yesterday. Was today faster? Did it skip a step? Did it cost more? Without a way to compare runs, I was flying blind.

What I Actually Wanted

I wanted something very specific. The moment the AI finished, I wanted to see:

  • What happened — every step, in order, with timing.
  • What changed — a diff against the last run. What got faster, what got slower, what broke.
  • Who was involved — which steps were fully automated by AI, and which needed a human to decide.

And critically: it had to be readable by non-engineers. My PM should be able to look at it and understand the workflow without reading code.

So I Built osop diff

The first thing I built was not a spec or a standard. It was a diff tool. Feed it two execution records, and it shows you everything at a glance:

$ osop diff monday.osoplog.yaml tuesday.osoplog.yaml

  Workflow: feature-build
  Run A: Mon Apr 1 | Run B: Tue Apr 2

  Node             | Duration       | Cost           | Status
  ─────────────────────────────────────────────────────────────
  plan             | 2.1s → 1.8s    | $0.02 → $0.01  | same
  explore_code     | 12s → 5.2s     | $0.08 → $0.03  | same
  implement        | 45s → 32s      | $0.15 → $0.12  | same
  type_check       | 3.2s → 3.1s    | —              | same
  human_review     | 120s → 60s     | —              | same
  ─────────────────────────────────────────────────────────────
  Total            | 182s → 102s    | $0.25 → $0.16  | -44% faster

Per-step duration changes. Cost changes. Status changes. Nodes added or removed. Which steps needed AI, which needed a human. One command, one table, complete clarity.
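The comparison itself is conceptually simple. As a minimal sketch — not OSOP's actual implementation, and the record shape and field names here are assumptions for illustration — diffing two runs reduces to walking the union of their nodes:

```python
# Illustrative sketch of an `osop diff`-style comparison. The record shape
# ({node: {"duration_s": ..., "cost_usd": ...}}) and the field names are
# assumptions for this example, not the real .osoplog schema.

def diff_runs(run_a: dict, run_b: dict) -> dict:
    """Compare two runs node by node: per-node deltas, plus added/removed nodes."""
    nodes = dict.fromkeys(list(run_a) + list(run_b))  # union, order preserved
    deltas = {}
    for node in nodes:
        a, b = run_a.get(node), run_b.get(node)
        if a is None:
            deltas[node] = "added"      # node exists only in run B
        elif b is None:
            deltas[node] = "removed"    # node exists only in run A
        else:
            deltas[node] = {
                "duration_s": round(b["duration_s"] - a["duration_s"], 2),
                "cost_usd": round(b.get("cost_usd", 0.0) - a.get("cost_usd", 0.0), 2),
            }
    return deltas

# First two rows of the table above:
monday  = {"plan": {"duration_s": 2.1, "cost_usd": 0.02},
           "explore_code": {"duration_s": 12.0, "cost_usd": 0.08}}
tuesday = {"plan": {"duration_s": 1.8, "cost_usd": 0.01},
           "explore_code": {"duration_s": 5.2, "cost_usd": 0.03}}
print(diff_runs(monday, tuesday))
# {'plan': {'duration_s': -0.3, 'cost_usd': -0.01},
#  'explore_code': {'duration_s': -6.8, 'cost_usd': -0.05}}
```

Everything the real tool adds on top — status changes, human-vs-AI attribution, the rendered table — hangs off this node-by-node walk.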

That solved my personal problem. But then something unexpected happened.

Other Companies Had the Same Problem

I started consulting with other teams — helping them plan SOPs, standardize processes, figure out where AI fits into their workflows. Every single team had the same frustration, but worse.

Their workflows were trapped inside specific tools. One team used LangChain, another used n8n, a third had everything in custom Python scripts. Nothing was portable. You could not take a workflow from one tool, give it to another team using a different tool, and have it just work.

Even within the same team, comparing two runs of the same process was painful. Everyone had their own logging format. There was no standard way to say: here is what we planned, here is what actually happened, and here is what changed between Tuesday and Wednesday.

From Personal Tool to Universal Protocol

That is when I realized: the diff tool is only useful if the execution records are in a standard format. And the records are only useful if the workflow definitions are also standardized. So I defined both.

.osop — a YAML file that describes what should happen. Nodes (what steps exist), edges (how they connect), security metadata (which steps are risky), and human gates (which steps need approval).

.osoplog — a YAML file that records what actually happened. Timestamps, durations, tool calls, AI model used, tokens consumed, human decisions, error states.
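To make that concrete, here is a sketch of what a single node's log entry might look like. The key names below are illustrative assumptions, not the published .osoplog schema — only the kinds of fields (timestamps, durations, model, tokens, cost, human decisions) come from the description above:

```yaml
# feature-build.osoplog.yaml — illustrative sketch; key names are
# assumptions, not the published .osoplog schema.
workflow: feature-build
started_at: 2026-04-02T09:00:00Z
nodes:
  - id: plan
    status: success
    duration_s: 1.8
    executor: ai
    model: claude-sonnet        # hypothetical model identifier
    tokens: {input: 1200, output: 300}
    cost_usd: 0.01
  - id: human_review
    status: success
    duration_s: 60
    executor: human
    decision: approved          # the recorded human decision
```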

Two files. One format. Any tool can read them. Any tool can write them. And anyone — engineer or not — can diff two logs and see exactly what changed.

[Workflow diagram: feature-build.osop.yaml]
Plan Implementation (agent) → Explore Codebase (mcp) → Write Code (agent) → Run Tests (cicd) → Human Review (human), with a fallback edge from Run Tests back to Write Code.
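Written out as a file, that workflow might look something like the sketch below. The key names are assumptions for illustration, not the published .osop schema — only the concepts (nodes, edges, node types, human gates) come from the definition above:

```yaml
# feature-build.osop.yaml — illustrative sketch; key names are
# assumptions, not the published .osop schema.
workflow: feature-build
nodes:
  - {id: plan,         name: Plan Implementation, type: agent}
  - {id: explore_code, name: Explore Codebase,    type: mcp}
  - {id: implement,    name: Write Code,          type: agent}
  - {id: run_tests,    name: Run Tests,           type: cicd}
  - {id: human_review, name: Human Review,        type: human}  # human gate
edges:
  - {from: plan,         to: explore_code, kind: sequential}
  - {from: explore_code, to: implement,    kind: sequential}
  - {from: implement,    to: run_tests,    kind: sequential}
  - {from: run_tests,    to: human_review, kind: sequential}
  - {from: run_tests,    to: implement,    kind: fallback}  # retry on failure
```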

Where It Is Now

OSOP now has a full CLI with 9 commands, 87 example workflows, converters for 6 formats (GitHub Actions, Airflow, n8n, and more), a visual editor, an MCP server, and integrations with 18 AI coding platforms.

But the feature I use every single day is still osop diff. It is the simplest thing in the system, and it is the one that solves the original problem: the moment AI finishes, I know exactly what it did.

Try It

If you use AI coding agents and you have ever wondered what exactly just happened, that is the problem OSOP solves. The whole system is open source (Apache 2.0). Start with osop init to scaffold a workflow, or jump straight to osop diff with two execution logs.

The mission has not changed since day one: the moment AI finishes, you should know exactly what it did. Everything else grew from there.