Morph is an early research prototype. Expect bugs, rough edges, and breaking changes.
Help us build it →
Alpha · v0.48.0 · Research prototype
Git for AI-assisted development
When an agent writes half your code, git only sees the file diff. It can't tell you
which prompt produced a function, whether the tests still pass, or whether a clean
merge just silently regressed something both branches got right.
Morph records every agent session on the commit,
attaches test results to every commit, and refuses merges that make the code worse.
Install on macOS
$ brew tap r/morph
$ brew install morph
Installs both morph and morph-mcp. Other platforms →
When an agent writes your code — in Cursor, Claude Code, OpenCode, or
across a fleet of agents under Agent of Empires — Git snapshots the file
tree and tracks line-level diffs. But it has no idea how those files
arrived — and it can't tell you whether the result still works.
No link from code to prompt
Which prompt produced this refactor? Which conversation led to this bug fix? Git has nowhere to store any of that.
Agent sessions disappear
The agent may try three approaches before settling on one. Tool calls, file reads, shell commands, token usage — none of it survives the commit.
"Did it get better?" isn't in the diff
You can't read a diff and know whether the tests still pass, the benchmarks improved, or the agent regressed an edge case. You have to run it.
Mixed authorship isn't tracked
Which agent, which model, which prompt, under which environment — all of it matters for reviewing and reproducing AI-authored changes.
Merge can silently regress behavior
Two agent branches can merge cleanly at the text level while the resulting code fails tests that both parents passed.
Probabilistic outputs break assumptions
Git assumes identity is byte equality and reproducibility is identical output. Neither holds when an LLM is in the loop.
Our Thesis
Version the transformation, not just the output
Same content-addressed Merkle DAG as Git. Same hash-of-contents identity.
Three additions that we believe make version control work for agent-authored code.
Morph runs alongside Git (separate .morph/ and .git/) —
drop Git later if you want to. The ideas below are where we're heading; the
implementation is partial and actively being built.
Runs and Traces — permanent agent receipts
Every agent session is recorded as an immutable Run with
a full Trace: every prompt, response, tool call, file read,
file edit, shell command, and token count. IDE hooks parse the agent's
complete transcript into structured events, so recording is always-on and
doesn't depend on the agent calling a tool.
What a Trace contains
run        c81f2a…
actor      agent:cursor / model:claude-opus-4
env        {os: darwin, rust: 1.82}
// Trace events
prompt     "fix the retry loop"
tool_call  read(src/retry.rs)
tool_call  edit(src/retry.rs, +12/-4)
shell      cargo test -- retry
response   "fixed; tests pass."
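As a rough illustration of the shape shown above, here is a minimal Python sketch of a Run holding an ordered, immutable list of Trace events. The class and field names (TraceEvent, Run, payload) are hypothetical and chosen only to mirror the example; Morph's actual object schema may differ.

```python
from dataclasses import dataclass
from typing import Tuple

# Event kinds taken from the example above; the real schema may define more.
EVENT_KINDS = ("prompt", "tool_call", "shell", "response")


@dataclass(frozen=True)
class TraceEvent:
    kind: str     # one of EVENT_KINDS
    payload: str  # hypothetical: a real trace likely stores structured fields

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")


@dataclass(frozen=True)
class Run:
    actor: str                        # e.g. "agent:cursor / model:claude-opus-4"
    env: Tuple[Tuple[str, str], ...]  # key/value pairs, frozen like the events
    events: Tuple[TraceEvent, ...]    # ordered as they happened in the session

    def count(self, kind: str) -> int:
        """Number of events of the given kind in this run's trace."""
        return sum(1 for event in self.events if event.kind == kind)


example_run = Run(
    actor="agent:cursor / model:claude-opus-4",
    env=(("os", "darwin"), ("rust", "1.82")),
    events=(
        TraceEvent("prompt", "fix the retry loop"),
        TraceEvent("tool_call", "read(src/retry.rs)"),
        TraceEvent("tool_call", "edit(src/retry.rs, +12/-4)"),
        TraceEvent("shell", "cargo test -- retry"),
        TraceEvent("response", "fixed; tests pass."),
    ),
)
```

Frozen dataclasses and tuples are used here only to echo the "immutable Run" idea; they are not a claim about how Morph stores objects on disk.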
"Doesn't Claude already record this on disk? Doesn't Langfuse?" Yes — and no.
The on-disk transcripts and the LLM observability platforms each solve a different problem.
Only morph traces live inside the version control DAG, addressable from the commit they
produced. Read the full argument in
SESSION-TRACKING.md.
| Property a reviewer needs | Claude / Cursor / OpenCode (on-disk transcripts) | Langfuse / Phoenix / OTEL (LLM observability) | Morph traces |
| --- | --- | --- | --- |
| Linked from a specific commit | No: correlated only by timestamp, if at all | No: indexed by app/session, not by VCS commit | Yes: commit.evidence_refs |
| Content-addressed (immutable) | No: files can be rotated, edited, deleted | No: span IDs are random | Yes: hash of canonical bytes |
| Visible to teammates | No: lives only on the developer's laptop | Yes: via the SaaS dashboard (data egress required) | Yes: via opt-in morph remote, local-first |
| Same shape across agent tools | No: Cursor, Claude, OpenCode all differ | Partial: OTLP-shaped, but app-defined fields vary | Yes: one Run+Trace schema per repo |
| Merge-aware | No | No: no notion of "merge" | Yes: case provenance via morph merge-plan |
| Local-first / offline | Yes | No: ships to a hosted backend | Yes |
Behavioral commits — evidence, not vibes
A Morph commit stores a file tree hash and a behavioral
contract: which pipeline was run, which evaluation suite, what scores were
observed, and in which environment. Fresh repos start relaxed so you can
commit immediately; opt into tests_total and tests_passed
enforcement with morph policy init when ready. Tell Morph
your test suite once with
morph config commit.test_command "cargo test --workspace"
(or the equivalent pytest, vitest, jest, or go test command), then plain
morph commit -m "…" runs it, parses the metrics,
and attaches them automatically.
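To make "parses the metrics" concrete, the sketch below extracts tests_total and tests_passed from the summary lines that cargo test prints. It is an illustration under assumptions, not Morph's parser: the function name is hypothetical, and counting only passed plus failed tests as the total (leaving out ignored ones) is a choice made for this example.

```python
import re
from typing import Dict

# cargo prints one summary line per test target, for example:
# "test result: ok. 12 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.03s"
SUMMARY_LINE = re.compile(
    r"^test result: \w+\. (\d+) passed; (\d+) failed;", re.MULTILINE
)


def parse_cargo_metrics(output: str) -> Dict[str, int]:
    """Sum passed/failed counts across every summary line in the output."""
    passed = 0
    failed = 0
    for match in SUMMARY_LINE.finditer(output):
        passed += int(match.group(1))
        failed += int(match.group(2))
    # Assumption: ignored tests do not count toward tests_total.
    return {"tests_total": passed + failed, "tests_passed": passed}


sample_output = "\n".join(
    [
        "running 3 tests",
        "test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s",
        "running 5 tests",
        "test result: FAILED. 4 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s",
    ]
)
metrics = parse_cargo_metrics(sample_output)
```

A workspace run emits several summary lines (one per crate and test target), which is why the sketch sums across all matches instead of reading only the last one.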
A Morph commit vs. a Git commit
// Git commit
tree     a3f2c…
parent   b71e0…
message  "fix retry"
Instead of three-way text merge alone, morph merge requires the
candidate to dominate both parents' certified metrics: at least as good on
every declared metric. If the merged code regresses on anything,
the merge fails. morph merge-plan previews the bar to beat
before you merge.
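The dominance rule described above can be sketched in a few lines of Python. The function names, the metric names, and the "maximize" / "minimize" direction labels are assumptions made for illustration; the only thing taken from the text is the rule itself: the candidate must be at least as good as each parent on every declared metric.

```python
from typing import Dict

Metrics = Dict[str, float]
Directions = Dict[str, str]  # metric name -> "maximize" or "minimize" (hypothetical labels)


def dominates(candidate: Metrics, parent: Metrics, directions: Directions) -> bool:
    """True if candidate is at least as good as parent on every metric parent declares."""
    for name, parent_value in parent.items():
        if name not in candidate:
            return False  # assumption: a missing metric counts as a regression
        if directions.get(name, "maximize") == "maximize":
            if candidate[name] < parent_value:
                return False
        elif candidate[name] > parent_value:
            return False
    return True


def merge_allowed(
    candidate: Metrics, parent_a: Metrics, parent_b: Metrics, directions: Directions
) -> bool:
    """The merged result must dominate both parents, not just one of them."""
    return dominates(candidate, parent_a, directions) and dominates(
        candidate, parent_b, directions
    )


directions = {"tests_passed": "maximize", "latency_ms": "minimize"}
main_branch = {"tests_passed": 40, "latency_ms": 120}
feature_branch = {"tests_passed": 42, "latency_ms": 130}

clean_merge = {"tests_passed": 42, "latency_ms": 120}
regressed_merge = {"tests_passed": 42, "latency_ms": 125}
```

In this example clean_merge matches the best of both parents and would pass, while regressed_merge is slower than main_branch and would be refused even though it merges cleanly as text. Note that the two parents are incomparable with each other: neither dominates the other.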
Merge with evidence
// At merge time, Morph records:
Morph ships with an MCP server (morph-mcp) and setup commands that
install the right config, hooks, and rules for your IDE — or your
multi-agent session manager. Hooks parse the full agent transcript into
structured Trace events, so every prompt, tool call, and file edit is recorded
— you don't depend on the agent remembering to record.
Cursor
MCP server, hooks for always-on recording, and rules for behavioral commits. Writes into .cursor/.
Agent of Empires
Multi-agent session manager that drives Claude Code, OpenCode, Cursor CLI, and others through tmux + Docker sandboxes. Morph wraps every session with lifecycle hooks: a commit on create, a Run + Trace on every launch, a final commit on destroy. AoE on GitHub →
Install the two binaries (morph and morph-mcp), initialize
a Morph repo in your project, and wire up your IDE. Morph runs side-by-side with
Git — commits in one are independent of commits in the other.
Heads-up: this is alpha software.
Some commands are half-built, the on-disk format may change (use
morph upgrade to migrate), and we'd love to hear what breaks —
file an issue.
# 1. Install the binaries (macOS via Homebrew, recommended)
$ brew tap r/morph
$ brew install morph  # installs both `morph` and `morph-mcp`

# … or build from source (any platform with Rust)
$ git clone https://github.com/r/morph.git && cd morph
$ cargo install --path morph-cli && cargo install --path morph-mcp

# 2. Initialize in your project
$ cd /path/to/your/project
$ morph init
Morph mirrors Git where possible: if you know Git, you know Morph. The CLI adds
commands for recording agent sessions, certifying commits against policy, and
inspecting traces.
# Standard Git-shaped workflow (init is relaxed by default;
# tighten with `morph policy init` when ready)
$ morph init
$ morph config commit.test_command "cargo test --workspace"  # once
$ morph add .
$ morph commit -m "fix retry loop"  # runs tests, attaches metrics
$ morph log  # history with metrics
$ morph diff main feature

# Eval-driven workflow: spec-first cases for case-provenance at merge
$ morph eval add specs/login.yaml  # YAML or Cucumber
$ morph eval show  # inspect the registered suite
$ morph eval gaps  # report unaddressed evidence gaps

# Branching and behavioral merge
$ morph branch feature
$ morph checkout feature
$ morph merge-plan main  # preview bar to beat + case provenance
$ morph merge main  # dominance required

# Inspect recorded agent work
$ morph inspect summary  # overview of recorded runs
$ morph inspect show <run>  # grouped steps (prompts, tools, files)
$ morph inspect target <ref>  # the code the agent was working on
$ morph inspect artifact <ref>  # what the agent produced
Morph records everything — that's the design point.
Reviewability, replay, attribution, prompt-as-spec, and
merge-aware behavioral context all depend on it. The tradeoff
is that traces contain whatever happened in your agent
session, so you should know exactly what crosses the wire,
and when, before you let any of it leave your laptop.
git push: code only
→ git remote (GitHub / GitLab / self-hosted)
.morph/ is added to .git/info/exclude
automatically, so Git ignores it and an ordinary git push does not
include runs, traces, prompts, or model responses (unless someone
force-adds the directory with git add -f). Teammates
pulling from Git see ordinary Git commits and a clean working
tree.
$ git push origin main // no .morph/, no traces, no prompts
morph push: opt-in, separate
→ morph remote (independently configured)
A morph remote is a separate channel with separate
access control. Default install configures none. When you
do push, you're sending the prompts, responses, file
contents the agent read, shell stdout/stderr, and model
parameters — verbatim.
$ morph remote add team ssh://team-host/morph/repo
$ morph push team main
The team-sharing model in one line: code goes
through your existing git remote; behavioral history goes
through a separate morph remote that only people you'd
trust to read your IDE history can pull from. Set them
up explicitly. Neither channel is silent.
morph forget — the secret-leak escape hatch
A trace caught a credential or PII you didn't intend to record?
morph forget <hash> permanently retires the
offending Run, Trace, or prompt blob from your
local store and writes an immutable Tombstone object
recording the actor / reason / timestamp. Pass
--remote team and the next
morph push team ships the tombstone; the
teammate's next morph fetch applies it
automatically. The merge gate treats any
evidence_ref that resolves to a tombstone as
"no claim" rather than a hard error, so retroactively
forgetting evidence does not retroactively break commits.
$ morph forget <run-hash> --remote team --reason "leaked db password"
$ morph push team main
// teammates: morph fetch team (tombstone applied automatically)
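The "no claim" behavior of the merge gate can be pictured with a short Python sketch. Everything here is hypothetical (the Evidence and Tombstone classes, the in-memory store, and the resolve_claims function); it only illustrates the rule stated above, that an evidence_ref resolving to a tombstone is skipped instead of treated as a hard error. Treating a ref that resolves to nothing at all as an error is an additional assumption of this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Evidence:
    metrics: Dict[str, float]


@dataclass(frozen=True)
class Tombstone:
    actor: str
    reason: str
    timestamp: str


# Hypothetical in-memory stand-in for the object store, keyed by object hash.
Store = Dict[str, Union[Evidence, Tombstone]]


def resolve_claims(evidence_refs: List[str], store: Store) -> List[Dict[str, float]]:
    """Collect the metric claims a commit can still back up with evidence."""
    claims = []
    for ref in evidence_refs:
        if ref not in store:
            # Assumption: a dangling ref is a real error, unlike a tombstone.
            raise KeyError(f"dangling evidence_ref: {ref}")
        obj = store[ref]
        if isinstance(obj, Tombstone):
            continue  # forgotten evidence contributes "no claim"
        claims.append(obj.metrics)
    return claims


store: Store = {
    "aaa111": Evidence({"tests_passed": 42}),
    "bbb222": Tombstone("alice", "leaked db password", "2025-01-01T00:00:00Z"),
}
remaining = resolve_claims(["aaa111", "bbb222"], store)
```

Under this reading, forgetting a run removes the claim it supported but leaves the commit that referenced it valid, which matches the statement that forgetting evidence does not retroactively break commits.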
Forget refuses to retire commits, blobs (other than prompts),
trees, pipelines, eval suites, or annotations — those
carry structural meaning the DAG depends on. It also operates on
whole objects only: there is no partial-redaction story.
Already-fetched copies on teammates' laptops stay
where they are until that teammate fetches the tombstone.
Full design in SECURITY.md.
Things morph does not yet do
Stated up front, so you don't discover them by reading the source:
No client-side redaction filter on
morph push. Today push is
"send everything reachable from this commit." A
redaction-on-push hook is roadmap.
No selective fetch. morph fetch team pulls the full DAG.
No encryption at rest in .morph/.
Same posture as every agent tool's on-disk transcripts
(Claude / Cursor / OpenCode all unencrypted on disk by
default). Use disk encryption.
No automatic secret scanning. Morph
does not look at trace bytes for tokens, API keys, or PII
before recording or pushing. Use the agent-level
guardrails your IDE provides.
SSH transport does not yet carry tombstones. morph forget works end-to-end across filesystem
morph remotes today; the SSH remote-helper protocol upgrade
is on the roadmap. Until then, ssh into an SSH-served
remote and run morph forget there too.
The full plain-language privacy story — what's in
.morph/, what's in a trace, the team-setup
checklist, and the "I leaked a secret, what do I do" recipe
— is in
docs/SECURITY.md.
Read it before you push to a morph remote.
Design Principles
What Morph assumes
Morph is built on a small set of axioms. Violate any of them and something breaks.
01. Content-addressed, immutable objects. Every object is identified by a hash of its contents, so any change to stored history changes the hash and is detectable.
02. Evidence does not rewrite history. Runs, traces, and evaluation results never mutate prior commits. New evidence produces new objects.
03. Pipeline steps compose cleanly. Sequential chaining and parallel execution are well-defined, like Promise.then() and Promise.all().
04. Evaluation suites are explicit contracts. "Better" is never implicit. Metrics, directions, fixture sources, and aggregation methods are all versioned and hashed.
05. Scores are partially ordered. One scorecard dominates another only if it is at least as good on every metric. If A is more accurate but slower, they're incomparable.
06. Merge records scores from both parents. Every merge commit records what both parents achieved and what the merged code achieved.
07. Environment is part of the record. Model version, sampling settings, and toolchain are recorded; without them, scores from different environments aren't comparable.
08. Reproducibility means re-running the checks. You can't get identical outputs from an LLM. Reproducibility means re-running the evaluation and getting consistent aggregate scores.
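The first two principles can be illustrated with a small content-addressed store in Python. This is a sketch under assumptions: SHA-256 and sorted-key JSON stand in for whatever hash function and canonical serialization Morph actually uses, and object_id and ObjectStore are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict


def object_id(obj: Dict[str, Any]) -> str:
    """Hash the canonical bytes of an object (assumed: sorted-key, compact JSON)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class ObjectStore:
    """Append-only store: objects are added under their hash and never mutated."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def put(self, obj: Dict[str, Any]) -> str:
        oid = object_id(obj)
        # Store the serialized form so later changes to `obj` cannot leak in.
        self._objects.setdefault(oid, json.dumps(obj, sort_keys=True))
        return oid

    def get(self, oid: str) -> Dict[str, Any]:
        return json.loads(self._objects[oid])


store = ObjectStore()
commit_id = store.put({"tree": "a3f2c", "message": "fix retry"})

# New evidence becomes a new object that points at the commit;
# the commit object and its hash stay exactly as they were.
evidence_id = store.put({"commit": commit_id, "metrics": {"tests_passed": 42}})
```

Because the identifier is derived from the bytes, the same content always maps to the same id regardless of key order, and any edit to an object yields a different id instead of silently replacing the original.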
Where We Are
An honest status report — and an invitation
We think the problem is real and the thesis is right. The implementation is
genuinely alpha: some of it works well, some of it is held together with
duct tape, and a lot of it needs your eyes, your bug reports, and your PRs.
Here's where things stand today.
What works today (Solid)
One-command setup for Cursor, Claude Code, OpenCode, and the Agent of Empires multi-agent session manager (morph setup <name>)
Recording prompts and responses as immutable Runs + Traces, always-on via hooks
Behavioral merge with dominance check and merge-plan preview (incl. case provenance)
Policy with a required_metrics gate, plus certify and gate for pass/fail enforcement; morph init defaults to a relaxed policy, so tighten with morph policy init when ready
morph serve: local browser UI + JSON API for inspecting commits, runs, traces
1,100+ unit/CLI tests and 37 end-to-end Cucumber scenarios across 16 features
What's rough (WIP)
Structured trace events (tool calls, file edits, shell invocations) are captured inconsistently across IDEs — coverage is improving but uneven
Eval-suite ingestion handles YAML and Cucumber out of the box; richer expectation DSLs and per-case scoring are still in progress
Storage: filesystem only; SQLite and real remote backends are designed but not implemented
On-disk format has already changed twice; expect more morph upgrade migrations before v1.0
Remotes are local-path only; no hosted Morph forge yet
Windows support is untested — we develop on macOS and Linux
Docs and CLI help are catching up to the code; some commands are under-documented
Where you can help (Invitation)
Try it in a real project, break it, and file an issue — especially if recording misses events
Add trace adapters for other agents (Aider, Cline, Zed, Codex CLI, …)
Build real evaluation suites and share what shape they want to take
Implement a real remote backend (HTTP, S3, or a hosted forge)
Sharpen the theory: read THEORY.md and push back on where it's wrong
Improve docs, write tutorials, record screencasts
Port to Windows & test on uncommon setups
Ready to jump in?
We're a small research project and every issue, PR, and conversation moves
this forward. Star the repo to follow along, or grab an open issue and start
hacking.
The formal model — pipelines as monadic computations, certificate vectors,
the merge monotonicity theorem — plus a concrete v0 system design with
object schemas and CLI reference.
Morph: Version Control for AI-Assisted Development