Alpha: Morph is an early research prototype. Expect bugs, rough edges, and breaking changes. Help us build it →
Alpha · v0.48.0 · Research prototype

Git for AI-assisted development

When an agent writes half your code, Git only sees the file diff. It can't tell you which prompt produced a function, whether the tests still pass, or whether a clean merge just silently regressed something both branches got right. Morph records every agent session on the commit, attaches test results to every commit, and refuses merges that make the code worse.

Install on macOS
$ brew tap r/morph
$ brew install morph
Installs both morph and morph-mcp. Other platforms →

Open research prototype, built in public. Read the thesis or help build it.

The Problem

Git doesn't know how your code got here

When an agent writes your code — in Cursor, Claude Code, OpenCode, or across a fleet of agents under Agent of Empires — Git snapshots the file tree and tracks line-level diffs. But it has no idea how those files arrived — and it can't tell you whether the result still works.

No link from code to prompt

Which prompt produced this refactor? Which conversation led to this bug fix? Git has nowhere to store any of that.

Agent sessions disappear

The agent may try three approaches before settling on one. Tool calls, file reads, shell commands, token usage — none of it survives the commit.

"Did it get better?" isn't in the diff

You can't read a diff and know whether the tests still pass, the benchmarks improved, or the agent regressed an edge case. You have to run it.

Mixed authorship isn't tracked

Which agent, which model, which prompt, under which environment — all of it matters for reviewing and reproducing AI-authored changes.

Merge can silently regress behavior

Two agent branches can merge cleanly at the text level while the resulting code fails tests that both parents passed.

Probabilistic outputs break assumptions

Git assumes identity is byte equality and reproducibility is identical output. Neither holds when an LLM is in the loop.

Version the transformation, not just the output

Same content-addressed Merkle DAG as Git. Same hash-of-contents identity. Three additions that we believe make version control work for agent-authored code. Morph runs alongside Git (separate .morph/ and .git/) — drop Git later if you want to. The ideas below are where we're heading; the implementation is partial and actively being built.

Runs and Traces — permanent agent receipts

Every agent session is recorded as an immutable Run with a full Trace: every prompt, response, tool call, file read, file edit, shell command, and token count. IDE hooks parse the agent's complete transcript into structured events, so recording is always-on and doesn't depend on the agent calling a tool.

What a Trace contains

run        c81f2a…
actor      agent:cursor / model:claude-opus-4
env        {os: darwin, rust: 1.82}

// Trace events
prompt     "fix the retry loop"
tool_call  read(src/retry.rs)
tool_call  edit(src/retry.rs, +12/-4)
shell      cargo test -- retry
response   "fixed; tests pass."
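
To make "content-addressed" concrete, here is a minimal Python sketch of deriving an identity from a trace's canonical bytes. The field values mirror the example above, but the canonical JSON encoding, the choice of SHA-256, the event layout, and the helper name `object_id` are all assumptions made for illustration, not Morph's actual on-disk format.

```python
import hashlib
import json

def object_id(obj: dict) -> str:
    """Hash the canonical JSON encoding of an object (assumed scheme)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical in-memory shape for the trace shown above.
trace = {
    "actor": "agent:cursor / model:claude-opus-4",
    "env": {"os": "darwin", "rust": "1.82"},
    "events": [
        {"kind": "prompt", "text": "fix the retry loop"},
        {"kind": "tool_call", "call": "read(src/retry.rs)"},
        {"kind": "tool_call", "call": "edit(src/retry.rs, +12/-4)"},
        {"kind": "shell", "cmd": "cargo test -- retry"},
        {"kind": "response", "text": "fixed; tests pass."},
    ],
}

trace_id = object_id(trace)

# Key order does not affect the identity, because the encoding sorts keys.
reordered = {"events": trace["events"], "env": trace["env"], "actor": trace["actor"]}
same_id = object_id(reordered)

# Changing any recorded event produces a different identity.
edited = json.loads(json.dumps(trace))
edited["events"][0]["text"] = "fix the retry loop, please"
edited_id = object_id(edited)
```

Under this scheme an edited trace cannot keep its old identity, which is the property a reviewer relies on when following a reference from a commit to the session that produced it.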

"Doesn't Claude already record this on disk? Doesn't Langfuse?" Yes — and no. The on-disk transcripts and the LLM observability platforms each solve a different problem. Only morph traces live inside the version control DAG, addressable from the commit they produced. Read the full argument in SESSION-TRACKING.md.

| Property a reviewer needs | Claude / Cursor / OpenCode (on-disk transcripts) | Langfuse / Phoenix / OTEL (LLM observability) | Morph traces |
| --- | --- | --- | --- |
| Linked from a specific commit | No: correlated only by timestamp, if at all | No: indexed by app/session, not by VCS commit | Yes: commit.evidence_refs |
| Content-addressed (immutable) | No: files can be rotated, edited, deleted | No: span IDs are random | Yes: hash of canonical bytes |
| Visible to teammates | No: lives only on the developer's laptop | Yes: via the SaaS dashboard (data egress required) | Yes: via opt-in morph remote, local-first |
| Same shape across agent tools | No: Cursor, Claude, OpenCode all differ | Partial: OTLP-shaped, but app-defined fields vary | Yes: one Run+Trace schema per repo |
| Merge-aware | No | No: no notion of "merge" | Yes: case provenance via morph merge-plan |
| Local-first / offline | Yes | No: ships to a hosted backend | Yes |

Behavioral commits — evidence, not vibes

A Morph commit stores a file tree hash and a behavioral contract: which pipeline was run, which evaluation suite, what scores were observed, and in which environment. Fresh repos start relaxed so you can commit immediately; opt into tests_total and tests_passed enforcement with morph policy init when ready. Tell Morph your test suite once with morph config commit.test_command "cargo test --workspace" (or pytest, vitest, jest, go), then plain morph commit -m "…" runs it, parses the metrics, and attaches them automatically.
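
As a rough illustration of the "parses the metrics" step, the Python sketch below extracts the metrics named above from `cargo test` output. It relies on the summary line the Rust test harness prints once per test binary; the function name `parse_cargo_test` and the decision to leave ignored tests out of `tests_total` are assumptions, and Morph's own parser may count differently.

```python
import re

# Summary line printed by the Rust test harness, once per test binary, e.g.
# "test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; ..."
SUMMARY = re.compile(r"test result: \w+\. (\d+) passed; (\d+) failed")

def parse_cargo_test(output: str) -> dict:
    """Sum the per-binary summaries into one metrics record (assumed shape)."""
    passed = failed = 0
    for match in SUMMARY.finditer(output):
        passed += int(match.group(1))
        failed += int(match.group(2))
    total = passed + failed  # assumption: ignored tests are not counted
    return {
        "tests_total": total,
        "tests_passed": passed,
        "pass_rate": passed / total if total else 0.0,
    }

sample = """\
test result: ok. 540 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.10s
test result: ok. 5 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.02s
"""
metrics = parse_cargo_test(sample)
```

A workspace run prints one summary per crate and test target, which is why the sketch sums across matches rather than reading only the last line.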

A Morph commit vs. a Git commit

// Git commit
tree        a3f2c…
parent      b71e0…
message     "fix retry"

// Morph commit
tree        a3f2c…
parent      b71e0…
message     "fix retry"
pipeline    d4e8a…
eval_suite  f92b1…
metrics     {tests_passed: 545, pass_rate: 1.0}
evidence    [run:c81f, trace:e02a]

Merge by behavioral dominance

Instead of three-way text merge alone, morph merge requires the candidate to dominate both parents' certified metrics: at least as good on every declared metric. If the merged code regresses on anything, the merge fails. morph merge-plan previews the bar to beat before you merge.

Merge with evidence

// At merge time, Morph records:

parent_1_scores  {pass_rate: 0.94, p95_ms: 340}
parent_2_scores  {pass_rate: 0.91, p95_ms: 280}
bar_to_beat      {pass_rate: 0.94, p95_ms: 280}
merged_scores    {pass_rate: 0.95, p95_ms: 275}

// Dominates on both. Merge accepted.
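
The dominance check above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Morph's implementation: the `DIRECTIONS` table, the function names, and the dictionary representation of a scorecard are all hypothetical. It assumes each metric declares a direction, with `pass_rate` higher-is-better and `p95_ms` lower-is-better, which is what the bar in the example implies.

```python
# Assumed direction per metric: "max" means higher is better, "min" means lower is better.
DIRECTIONS = {"pass_rate": "max", "p95_ms": "min"}

def bar_to_beat(parent_1: dict, parent_2: dict, directions: dict) -> dict:
    """Take the better of the two parents' values for each declared metric."""
    bar = {}
    for name, direction in directions.items():
        pick = max if direction == "max" else min
        bar[name] = pick(parent_1[name], parent_2[name])
    return bar

def dominates(candidate: dict, bar: dict, directions: dict) -> bool:
    """True if the candidate is at least as good as the bar on every metric."""
    for name, direction in directions.items():
        if direction == "max" and candidate[name] < bar[name]:
            return False
        if direction == "min" and candidate[name] > bar[name]:
            return False
    return True

parent_1 = {"pass_rate": 0.94, "p95_ms": 340}
parent_2 = {"pass_rate": 0.91, "p95_ms": 280}
merged = {"pass_rate": 0.95, "p95_ms": 275}

bar = bar_to_beat(parent_1, parent_2, DIRECTIONS)
accepted = dominates(merged, bar, DIRECTIONS)

# A candidate that improves pass_rate but is slower than the bar is rejected.
rejected = dominates({"pass_rate": 0.95, "p95_ms": 300}, bar, DIRECTIONS)
```

Note that the bar is stricter than either parent alone: it combines parent 1's pass rate with parent 2's latency, so a merge result has to keep the best of both.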

Agent Integrations

One command to wire up your agent

Morph ships with an MCP server (morph-mcp) and setup commands that install the right config, hooks, and rules for your IDE — or your multi-agent session manager. Hooks parse the full agent transcript into structured Trace events, so every prompt, tool call, and file edit is recorded — you don't depend on the agent remembering to record.

Cursor

MCP server, hooks for always-on recording, and rules for behavioral commits. Writes into .cursor/.

morph setup cursor

Cursor setup →

Claude Code

MCP server and hooks for Anthropic's coding agent. Records tool calls, file edits, and shell invocations.

morph setup claude-code

Claude Code setup →

OpenCode

MCP config, AGENTS.md, and a recording plugin — one command, fully-traced sessions.

morph setup opencode

OpenCode setup →

Agent of Empires

Multi-agent session manager that drives Claude Code, OpenCode, Cursor CLI, and others through tmux + Docker sandboxes. Morph wraps every session with lifecycle hooks: a commit on create, a Run + Trace on every launch, a final commit on destroy. AoE on GitHub →

morph setup aoe

Agent of Empires setup →

Zero to recording in three commands

Install the two binaries (morph and morph-mcp), initialize a Morph repo in your project, and wire up your IDE. Morph runs side-by-side with Git — commits in one are independent of commits in the other. Heads-up: this is alpha software. Some commands are half-built, the on-disk format may change (use morph upgrade to migrate), and we'd love to hear what breaks — file an issue.

# 1. Install the binaries (macOS via Homebrew — recommended)
$ brew tap r/morph
$ brew install morph  # installs both `morph` and `morph-mcp`

# … or build from source (any platform with Rust)
$ git clone https://github.com/r/morph.git && cd morph
$ cargo install --path morph-cli && cargo install --path morph-mcp

# 2. Initialize in your project
$ cd /path/to/your/project
$ morph init

# 3. Wire up your agent (pick one)
$ morph setup cursor
$ morph setup claude-code
$ morph setup opencode
$ morph setup aoe  # Agent of Empires — multi-agent session manager

New here? Walk through the ~20-minute getting-started tutorial →

How It Works

Git-shaped CLI, richer objects

Morph mirrors Git where possible: if you know Git, you know Morph. The CLI adds commands for recording agent sessions, certifying commits against policy, and inspecting traces.

# Standard Git-shaped workflow (init is relaxed by default;
# tighten with `morph policy init` when ready)
$ morph init
$ morph config commit.test_command "cargo test --workspace" # once
$ morph add .
$ morph commit -m "fix retry loop"   # runs tests, attaches metrics
$ morph log # history with metrics
$ morph diff main feature

# Eval-driven workflow: spec-first cases for case-provenance at merge
$ morph eval add specs/login.yaml  # YAML or Cucumber
$ morph eval show               # inspect the registered suite
$ morph eval gaps               # report unaddressed evidence gaps

# Branching and behavioral merge
$ morph branch feature
$ morph checkout feature
$ morph merge-plan main  # preview bar to beat + case provenance
$ morph merge main       # dominance required

# Inspect recorded agent work
$ morph inspect summary     # overview of recorded runs
$ morph inspect show <run>   # grouped steps (prompts, tools, files)
$ morph inspect target <ref> # the code the agent was working on
$ morph inspect artifact <ref> # what the agent produced

# Policy, certification, gating
$ morph policy require-metrics tests_passed pass_rate
$ morph certify --metrics-file metrics.json
$ morph gate                # exit 1 if HEAD fails policy

# Team inspection (local browser UI + JSON API)
$ morph serve               # http://127.0.0.1:8765
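
To illustrate the gating step in the listing above, here is a small Python sketch of a required-metrics check that maps to a process exit code. The exact rules `morph gate` applies are not spelled out here, so this assumes the simplest reading: a commit fails policy when any required metric is absent. The function name and data shapes are hypothetical.

```python
def gate(required_metrics: list[str], commit_metrics: dict) -> int:
    """Return an exit code: 0 if every required metric is present, else 1."""
    missing = [name for name in required_metrics if name not in commit_metrics]
    for name in missing:
        print(f"missing required metric: {name}")
    return 1 if missing else 0

required = ["tests_passed", "pass_rate"]

ok_code = gate(required, {"tests_passed": 545, "pass_rate": 1.0})
fail_code = gate(required, {"tests_passed": 545})
```

Returning an exit code rather than raising keeps the check easy to drop into CI, where a non-zero status from the gate step fails the job.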

Privacy & Sharing

What morph records, what crosses the wire

Morph records everything — that's the design point. Reviewability, replay, attribution, prompt-as-spec, and merge-aware behavioral context all depend on it. The tradeoff is that traces contain whatever happened in your agent session, and you should know exactly what crosses the wire when, before you let any of it leave your laptop.

git push: code only

→ git remote (GitHub / GitLab / self-hosted)

Morph adds .morph/ to .git/info/exclude automatically, so Git does not track it and a git push does not carry runs, traces, prompts, or model responses. Teammates pulling from the git remote see ordinary Git commits and a clean working tree.

$ git push origin main
// no .morph/, no traces, no prompts
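
For readers unfamiliar with .git/info/exclude: it is a per-repository ignore file that is never committed, so the exclusion stays local. The Python sketch below shows one way such an entry could be added idempotently. The helper name `exclude_morph_dir` is hypothetical, and this is not necessarily how morph init does it.

```python
from pathlib import Path
import tempfile

def exclude_morph_dir(repo_root: Path, pattern: str = ".morph/") -> bool:
    """Add the pattern to .git/info/exclude unless it is already listed.

    Returns True if the file was modified.
    """
    exclude = repo_root / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    exclude.write_text(text + pattern + "\n")
    return True

# Demonstrate against a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = exclude_morph_dir(root)   # adds the entry
    second = exclude_morph_dir(root)  # already present, no change
    contents = (root / ".git" / "info" / "exclude").read_text()
```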

morph push: opt-in, separate

→ morph remote (independently configured)

A morph remote is a separate channel with separate access control. Default install configures none. When you do push, you're sending the prompts, responses, file contents the agent read, shell stdout/stderr, and model parameters — verbatim.

$ morph remote add team ssh://team-host/morph/repo
$ morph push team main

The team-sharing model in one line: code goes through your existing git remote; behavioral history goes through a separate morph remote that only people you'd trust to read your IDE history can pull from. Set them up explicitly. Neither channel is silent.

morph forget — the secret-leak escape hatch

A trace caught a credential or PII you didn't intend to record? morph forget <hash> permanently retires the offending Run, Trace, or prompt blob from your local store and writes an immutable Tombstone object recording the actor / reason / timestamp. Pass --remote team and the next morph push team ships the tombstone; the teammate's next morph fetch applies it automatically. The merge gate treats any evidence_ref that resolves to a tombstone as "no claim" rather than a hard error, so retroactively forgetting evidence does not retroactively break commits.

$ morph forget <run-hash> --remote team --reason "leaked db password"
$ morph push team main
// teammates: morph fetch team (tombstone applied automatically)

Forget refuses to retire commits, blobs (other than prompts), trees, pipelines, eval suites, or annotations, because those carry structural meaning the DAG depends on. It is also whole-object-only: there is no partial-redaction story. Already-fetched copies on teammates' laptops stay where they are until that teammate fetches the tombstone. Full design in SECURITY.md.
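
The "no claim" behavior described above can be sketched as follows. This is a simplified Python model, not Morph's design: it replaces the stored object with the tombstone in a plain dictionary, whereas Morph is described as writing a separate immutable Tombstone object. The store layout, function names, the actor "alice", and the timestamp are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tombstone:
    actor: str
    reason: str
    timestamp: str

def forget(store: dict, object_hash: str, actor: str, reason: str, timestamp: str) -> None:
    """Retire an object, leaving a record of who retired it, why, and when."""
    if object_hash not in store:
        raise KeyError(object_hash)
    store[object_hash] = Tombstone(actor, reason, timestamp)

def resolve_evidence(store: dict, evidence_refs: list[str]) -> dict:
    """Classify each ref as 'claim', 'no_claim' (tombstoned), or 'missing'."""
    status = {}
    for ref in evidence_refs:
        if ref not in store:
            status[ref] = "missing"
        elif isinstance(store[ref], Tombstone):
            status[ref] = "no_claim"
        else:
            status[ref] = "claim"
    return status

store = {
    "run:c81f": {"prompt": "fix the retry loop"},
    "trace:e02a": {"events": []},
}
forget(store, "run:c81f", "alice", "leaked db password", "2025-01-01T00:00:00Z")
status = resolve_evidence(store, ["run:c81f", "trace:e02a"])
```

Distinguishing "no_claim" from "missing" is what lets a merge gate keep accepting older commits after evidence is forgotten, while still treating a dangling reference as a real problem.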

Things morph does not yet do

Stated up front, so you don't discover them by reading the source:

The full plain-language privacy story — what's in .morph/, what's in a trace, the team-setup checklist, and the "I leaked a secret, what do I do" recipe — is in docs/SECURITY.md. Read it before you push to a morph remote.

What Morph assumes

Morph is built on a small set of axioms. Violate any of them and something breaks.

01

Content-addressed, immutable objects. Every object is identified by a hash of its contents, so any tampering with history changes the hashes and is detectable.

02

Evidence does not rewrite history. Runs, traces, and evaluation results never mutate prior commits. New evidence produces new objects.

03

Pipeline steps compose cleanly. Sequential chaining and parallel execution are well-defined — like Promise.then() and Promise.all().

04

Evaluation suites are explicit contracts. "Better" is never implicit. Metrics, directions, fixture sources, and aggregation methods are all versioned and hashed.

05

Scores are partially ordered. One scorecard dominates another only if it is at least as good on every metric. If A is more accurate but slower than B, neither dominates and the two are incomparable.

06

Merge records scores from both parents. Every merge commit records what both parents achieved and what the merged code achieved.

07

Environment is part of the record. Model version, sampling settings, toolchain — without this, scores from different environments aren't comparable.

08

Reproducibility means re-running the checks. You can't get identical outputs from an LLM. Reproducibility means re-running the evaluation and getting consistent aggregate scores.
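
The partial order in axiom 05 is worth seeing in code, because it is the reason a comparison can return "neither". The Python sketch below is illustrative only: the function name `compare`, the metric names `accuracy` and `p95_ms`, and the direction table are assumptions, not part of Morph's API.

```python
from typing import Optional

def compare(a: dict, b: dict, directions: dict) -> Optional[str]:
    """Return 'a', 'b', or 'equal', or None when the scorecards are incomparable."""
    a_ahead = b_ahead = False
    for name, direction in directions.items():
        if a[name] == b[name]:
            continue
        a_better = a[name] > b[name] if direction == "max" else a[name] < b[name]
        if a_better:
            a_ahead = True
        else:
            b_ahead = True
    if a_ahead and b_ahead:
        return None  # each wins somewhere, so neither dominates
    if a_ahead:
        return "a"
    if b_ahead:
        return "b"
    return "equal"

# Hypothetical metrics: higher accuracy is better, lower latency is better.
directions = {"accuracy": "max", "p95_ms": "min"}

accurate_but_slow = {"accuracy": 0.95, "p95_ms": 400}
fast_but_weaker = {"accuracy": 0.90, "p95_ms": 250}
strictly_better = {"accuracy": 0.96, "p95_ms": 240}

incomparable = compare(accurate_but_slow, fast_but_weaker, directions)
winner = compare(strictly_better, fast_but_weaker, directions)
```

Collapsing the metrics into a single weighted score would always produce a winner, but it would also hide the trade-off; keeping the order partial forces that trade-off to be decided explicitly.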

Where We Are

An honest status report — and an invitation

We think the problem is real and the thesis is right. The implementation is genuinely alpha: some of it works well, some of it is held together with duct tape, and a lot of it needs your eyes, your bug reports, and your PRs. Here's where things stand today.

What works today (Solid)

  • One-command setup for Cursor, Claude Code, OpenCode, and the Agent of Empires multi-agent session manager (morph setup <name>)
  • Recording prompts and responses as immutable Runs + Traces, always-on via hooks
  • Core Git-shaped workflow: init, add, commit, log, diff, branch, checkout, merge, tag, stash, revert
  • Behavioral merge with dominance check and merge-plan preview (incl. case provenance)
  • Policy with required_metrics gate, certify, and gate for pass/fail enforcement; relaxed default on morph init — tighten with morph policy init when ready
  • Eval-driven workflow: morph eval add, rebuild, show, run, from-output, record, gaps — ingest YAML/Cucumber specs, parse cargo/pytest/vitest/jest/go output, fail commits without metrics
  • morph serve: local browser UI + JSON API for inspecting commits, runs, traces
  • 1,100+ unit/CLI tests and 37 end-to-end Cucumber scenarios across 16 features

What's rough (WIP)

  • Structured trace events (tool calls, file edits, shell invocations) are captured inconsistently across IDEs — coverage is improving but uneven
  • Eval-suite ingestion handles YAML and Cucumber out of the box; richer expectation DSLs and per-case scoring are still in progress
  • Storage: filesystem only; SQLite and real remote backends are designed but not implemented
  • On-disk format has already changed twice; expect more morph upgrade migrations before v1.0
  • Remotes are local-path only; no hosted Morph forge yet
  • Windows support is untested — we develop on macOS and Linux
  • Docs and CLI help are catching up to the code; some commands are under-documented

Where you can help (Invitation)

  • Try it in a real project, break it, and file an issue — especially if recording misses events
  • Add trace adapters for other agents (Aider, Cline, Zed, Codex CLI, …)
  • Build real evaluation suites and share what shape they want to take
  • Implement a real remote backend (HTTP, S3, or a hosted forge)
  • Sharpen the theory: read THEORY.md and push back on where it's wrong
  • Improve docs, write tutorials, record screencasts
  • Port to Windows & test on uncommon setups

Ready to jump in?

We're a small research project and every issue, PR, and conversation moves this forward. Star the repo to follow along, or grab an open issue and start hacking.

Read the theory and the spec

The formal model — pipelines as monadic computations, certificate vectors, the merge monotonicity theorem — plus a concrete v0 system design with object schemas and CLI reference.

Morph: Version Control for AI-Assisted Development

Raffi Krikorian · Mozilla

Theory · v0 Spec · Paper (LaTeX)