What if your AI agent could watch itself work, then get better at it? That is the idea behind osop synthesize. You give it a stack of .osoplog execution records — the detailed trace of what an AI agent actually did across multiple sessions — and it produces an optimized .osop workflow that captures the best patterns while eliminating the waste.
We ran it on 5 real Claude Code session logs: 85 total nodes, 1,666 minutes of recorded execution time. The synthesizer distilled them into a single 13-node workflow that captures the patterns the agent relied on across those sessions.
The Command
The CLI is straightforward. Point it at your logs, optionally provide a base workflow to improve, and set a goal:
$ osop synthesize \
session-01.osoplog.yaml \
session-02.osoplog.yaml \
session-03.osoplog.yaml \
session-04.osoplog.yaml \
session-05.osoplog.yaml \
--base original-workflow.osop \
--goal "optimize for speed" \
-o optimized-workflow.osop

The synthesizer reads every node record, every edge, every duration, every failure and retry. It builds a statistical model of what the agent actually does versus what the workflow says it should do, then writes a new workflow that matches reality.
What We Fed It
We collected 5 session logs from real development work — feature implementations, bug fixes, and refactoring tasks. The raw numbers:
- 85 nodes across all sessions — the full execution graph of every tool call, every decision, every retry.
- 1,666 minutes of total recorded execution time spanning multiple days of development.
- 23 unique node types from file reads and edits to test runs, git operations, and human review gates.
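To make those numbers concrete, here is a minimal sketch of the kind of roll-up the synthesizer would do over parsed log records. The record schema used here (sessions holding `nodes` with `type` and `duration_min` fields) is an assumption for illustration, not the real .osoplog format:

```python
# A sketch of the aggregation step, under an assumed record schema.
from collections import Counter

def summarize(sessions):
    """Roll up node count, total minutes, and node-type frequencies."""
    total_nodes = sum(len(s["nodes"]) for s in sessions)
    total_minutes = sum(n["duration_min"] for s in sessions for n in s["nodes"])
    node_types = Counter(n["type"] for s in sessions for n in s["nodes"])
    return total_nodes, total_minutes, node_types

# Two tiny fabricated sessions standing in for real .osoplog files.
sessions = [
    {"nodes": [{"type": "read_file", "duration_min": 2},
               {"type": "run_tests", "duration_min": 11}]},
    {"nodes": [{"type": "read_file", "duration_min": 3},
               {"type": "edit_file", "duration_min": 7}]},
]

nodes, minutes, types = summarize(sessions)
print(nodes, minutes, sorted(types))  # 4 23 ['edit_file', 'read_file', 'run_tests']
```

The same three numbers reported above (85 nodes, 1,666 minutes, 23 types) fall out of exactly this kind of pass over the real logs.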
What Came Out
The synthesizer produced a 13-node workflow. That is not arbitrary truncation but distillation: it identified the recurring patterns across all 5 sessions and encoded them as a reusable workflow with proper edge types (sequential, parallel, fallback).
1. Read requirements, check existing patterns, draft approach.
2. Search for related files and dependencies in parallel.
3. Find test patterns and fixtures to reuse.
4. Write code following discovered patterns.
5. Run tsc --noEmit immediately after edits.
6. Add tests matching existing conventions.
7. Execute full test suite.
8. Check for security issues, breaking changes, edge cases.
9. Auto-fix lint and formatting issues.
10. Commit with descriptive message.
11. Ensure all checks pass before requesting review.
12. Open PR with summary and test plan.
13. Developer reviews the PR.
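Sketched as a plain data structure, those 13 nodes and their typed edges might look like this. The node names and the edge-list encoding are my shorthand, not the actual .osop file format:

```python
# Hypothetical in-memory shape of the synthesized workflow.
# Edge type is the third element: sequential, parallel, or fallback.
workflow = {
    "nodes": ["plan", "explore_code", "explore_tests", "implement",
              "typecheck", "write_tests", "run_tests", "risk_review",
              "autofix", "commit", "verify_checks", "open_pr", "human_review"],
    "edges": [
        ("plan", "explore_code", "parallel"),       # exploration fans out
        ("plan", "explore_tests", "parallel"),
        ("explore_code", "implement", "sequential"),
        ("implement", "typecheck", "sequential"),
        ("typecheck", "implement", "fallback"),     # retry loop on type errors
        ("typecheck", "write_tests", "sequential"),
        ("write_tests", "run_tests", "sequential"),
        ("run_tests", "risk_review", "sequential"),
        ("risk_review", "autofix", "sequential"),
        ("autofix", "commit", "sequential"),
        ("commit", "verify_checks", "sequential"),
        ("verify_checks", "open_pr", "sequential"),
        ("open_pr", "human_review", "sequential"),
    ],
}
assert len(workflow["nodes"]) == 13
```

Note the fallback edge from typecheck back to implement: that single edge is how a "fail, fix, retry" loop observed in the logs gets encoded declaratively.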
Patterns the AI Discovered
The most interesting part is what the synthesizer learned from watching itself. Four patterns emerged consistently:
- Plan first. Every successful session started with a planning step that read requirements and checked existing patterns before writing any code. Sessions that skipped planning had 3x more fallback loops.
- Explore in parallel. The best sessions ran code exploration and test exploration simultaneously. The synthesized workflow encodes this as parallel edges from the planning node.
- Verify immediately. Type-checking right after implementation — not at the end — caught errors when context was fresh. The synthesized workflow adds a fallback edge from typecheck back to implement.
- Risk before ship. A dedicated risk review step before committing caught security issues and breaking changes that tests alone missed. The synthesizer placed it after tests pass but before any git operations.
The Self-Optimizing Loop
This is where it gets powerful. The synthesized .osop workflow is not a one-time artifact. It becomes the base workflow for future sessions. Those sessions produce new .osoplog records. Feed them back into synthesize, and the workflow improves again.
Run, log, synthesize, repeat. Each cycle tightens the workflow based on real execution data. The agent literally gets better at its job by watching itself do it.
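The feedback cycle above can be rendered as a toy simulation. Everything here (the simulated sessions, the 60% frequency threshold) is made up to show the shape of the loop, not osop's actual behavior:

```python
# Toy run -> log -> synthesize loop: rare one-off detours get pruned
# each cycle, tightening the workflow. Threshold is an assumption.
def synthesize(logs, threshold=0.6):
    """Keep only steps appearing in at least `threshold` of sessions."""
    steps = {s for log in logs for s in log}
    freq = lambda s: sum(s in log for log in logs) / len(logs)
    return sorted(s for s in steps if freq(s) >= threshold)

def run(workflow, noise):
    """Simulate a session: follow the workflow, plus one-off detours."""
    return list(workflow) + noise

workflow = ["plan", "implement", "typecheck", "test", "review"]
for cycle in range(3):
    # One of three simulated sessions wanders into a debug detour.
    logs = [run(workflow, noise) for noise in (["debug_detour"], [], [])]
    workflow = synthesize(logs)  # the detour never survives synthesis

print(workflow)  # ['implement', 'plan', 'review', 'test', 'typecheck']
```

In the real loop the "run" step is an actual agent session producing an .osoplog file, and synthesis also re-derives edge types and durations, but the convergence dynamic is the same.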
How It Works Under the Hood
The synthesizer performs three passes over the input logs:
- Frequency analysis — Which nodes appear in every session? Which are one-off anomalies? High-frequency nodes become required steps; rare ones are pruned.
- Dependency graph extraction — What ordering constraints are real? If step B always follows step A across all sessions, that is a sequential edge. If B sometimes runs alongside C, that is a parallel edge.
- Duration and failure analysis — Which steps are bottlenecks? Which have fallback patterns (fail then retry)? The synthesizer encodes retry loops as fallback edges and flags slow steps for potential parallelization.
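The three passes can be sketched in a few dozen lines. The log structure assumed here (each session as an ordered list of `(node_type, minutes, failed)` tuples) and the specific heuristics are illustrative guesses, not osop's actual implementation:

```python
# Condensed sketch of the three synthesis passes over assumed log tuples.
from collections import defaultdict
from itertools import permutations

sessions = [
    [("plan", 5, False), ("implement", 30, True),
     ("implement", 20, False), ("test", 10, False)],
    [("plan", 4, False), ("implement", 25, False), ("test", 12, False)],
    [("plan", 6, False), ("implement", 40, False),
     ("test", 9, False), ("profile", 8, False)],
]

# Pass 1: frequency analysis. Nodes present in every session become
# required steps; one-off nodes are pruned.
names = [{n for n, _, _ in s} for s in sessions]
required = set.intersection(*names)
pruned = set.union(*names) - required

# Pass 2: dependency extraction. A -> B is a sequential edge only if
# A's first occurrence precedes B's in every session.
def precedes(a, b, session):
    order = [n for n, _, _ in session]
    return order.index(a) < order.index(b)

sequential = {(a, b) for a, b in permutations(required, 2)
              if all(precedes(a, b, s) for s in sessions)}

# Pass 3: duration and failure analysis. Failed nodes that reappear
# suggest a retry loop (fallback edge); the slowest average node is
# flagged as a bottleneck.
durations = defaultdict(list)
fallback = set()
for s in sessions:
    for name, minutes, failed in s:
        durations[name].append(minutes)
        if failed:
            fallback.add(name)
bottleneck = max(durations, key=lambda n: sum(durations[n]) / len(durations[n]))

print(sorted(required), sorted(pruned), sorted(fallback), bottleneck)
```

On this fabricated data the passes keep plan, implement, and test as required steps, prune the one-off profile node, mark implement as both a fallback candidate and the bottleneck, and recover the plan-before-implement-before-test ordering as sequential edges.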
Try It Yourself
If you have been using OSOP logging (via Claude Code instructions, the MCP server, or the Python CLI), you already have the .osoplog files you need. Just point synthesize at them.
The feature ships in osop CLI v0.9.0. Install with pip install osop and run osop synthesize --help to get started.
Source code: The synthesizer implementation lives in the osop repo under osop/commands/synthesize.py. Contributions welcome.