🛡️ Safety at commit time
Invariants run inside the transaction and roll it back on violation. An agent can’t sign off over an open blocker, or rewrite the tests it’s judged by and approve itself. No other orchestrator does this.
v0.3 · open source · Apache-2.0
loom drives multi-step LLM agent work — code review, implementation, any review-gated task — as a replay-deterministic state machine. Safety is structural, not prompted: invariants run inside the database transaction, so an agent can’t bypass review or approve its own work. Every run leaves a complete, replayable audit trail.
npm i -g @loomfsm/pipeline
$ loom run "add rate limiting to the login endpoint"
phase classify → plan → implement → review → validate → finalize
review 2 findings · 1 blocker
⏸ gate blocker: missing test for the burst path
[a]ccept [r]eject [v]iew diff
ledger every step committed · replayable · .loom/state.db
Durable execution became table stakes; every framework checkpoints now. loom is built for the layer above: structural safety and a provable process, in one SQLite file you own.
Why loom
Agent frameworks help you write graphs. loom makes the run itself durable, provable, and safe to leave unattended.
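To make “invariants run inside the transaction” concrete, here is a minimal TypeScript sketch under stated assumptions. It is not loom’s API: TxStore, RunState, and both invariants are hypothetical names, and copying a plain object stands in for the SQLite transaction loom uses. What it shows is the ordering: the checks see the proposed state before anything becomes visible, so a violating step leaves no trace.

```typescript
// Hypothetical sketch, not loom's API. A plain object copy stands in for the
// SQLite transaction: a step edits a draft, every invariant checks the draft,
// and the draft only replaces the committed state if all checks pass.

interface RunState {
  author: string; // agent that produced the work
  openBlockers: number; // unresolved blocker findings from review
  approvedBy: string | null; // who signed off, if anyone
}

// An invariant returns null if the proposed state is acceptable,
// otherwise a human-readable violation message.
type Invariant = (next: RunState) => string | null;

const runInvariants: Invariant[] = [
  (s) =>
    s.approvedBy !== null && s.openBlockers > 0
      ? "cannot sign off over an open blocker"
      : null,
  (s) =>
    s.approvedBy !== null && s.approvedBy === s.author
      ? "an agent cannot approve its own work"
      : null,
];

class TxStore {
  private state: RunState;

  constructor(initial: RunState) {
    this.state = { ...initial };
  }

  read(): RunState {
    return { ...this.state };
  }

  // Applies `step` to a draft copy. On the first violation the draft is
  // dropped (rollback) and the message is returned; otherwise it commits.
  transact(step: (draft: RunState) => void): string | null {
    const draft: RunState = { ...this.state };
    step(draft);
    for (const check of runInvariants) {
      const violation = check(draft);
      if (violation !== null) return violation;
    }
    this.state = draft;
    return null;
  }
}

const txStore = new TxStore({ author: "implementer", openBlockers: 1, approvedBy: null });

// Rejected: a blocker is still open, so nothing is committed.
const firstTry = txStore.transact((s) => { s.approvedBy = "reviewer"; });
// Rejected: the blocker is cleared in the draft, but the author approves itself.
const secondTry = txStore.transact((s) => { s.openBlockers = 0; s.approvedBy = "implementer"; });
// Committed: blocker cleared and the sign-off comes from a different agent.
const thirdTry = txStore.transact((s) => { s.openBlockers = 0; s.approvedBy = "reviewer"; });
```

The second attempt shows the rollback: its openBlockers = 0 edit is discarded along with the self-approval, because the draft is committed whole or not at all.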
One timestamp token threaded through every step, atomic SQLite transactions. Replay a recorded run against a changed invariant to ask “would the new rule have caught last week’s incident?”
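Replaying against a changed invariant can be sketched the same way, assuming a recorded run is an ordered list of steps and state is rebuilt by a pure transition function. Everything here is hypothetical (applyStep, replayWithRule, the step kinds, the toy timestamp values 1 to 4); it only illustrates why determinism makes the question answerable: the replay never reads the clock or calls a model, so the same recording always yields the same verdict.

```typescript
// Hypothetical sketch, not loom's API: a recorded run is a list of steps, a
// pure `applyStep` rebuilds state from them, and `replayWithRule` checks a new
// invariant at every step to see whether it would have blocked the commit.

type StepKind = "finding" | "resolve" | "edit_tests" | "approve";

interface RecordedStep {
  ts: number; // timestamp token recorded with the step (no wall clock on replay)
  kind: StepKind;
  by: string; // agent that performed the step
}

interface ReplayState {
  openBlockers: number;
  testsEditedBy: string[];
  approvedBy: string | null;
}

type Rule = (s: ReplayState) => string | null;

// Pure transition: the same state and step always give the same next state.
function applyStep(s: ReplayState, step: RecordedStep): ReplayState {
  if (step.kind === "finding") return { ...s, openBlockers: s.openBlockers + 1 };
  if (step.kind === "resolve") return { ...s, openBlockers: Math.max(0, s.openBlockers - 1) };
  if (step.kind === "edit_tests") return { ...s, testsEditedBy: s.testsEditedBy.concat([step.by]) };
  return { ...s, approvedBy: step.by }; // "approve"
}

// Returns the first step the rule would have rejected, or null if none.
function replayWithRule(
  initial: ReplayState,
  steps: RecordedStep[],
  rule: Rule,
): { ts: number; violation: string } | null {
  let state = initial;
  for (const step of steps) {
    const next = applyStep(state, step);
    const violation = rule(next);
    if (violation !== null) return { ts: step.ts, violation };
    state = next;
  }
  return null;
}

// Illustrative incident: the implementer edited the tests it is judged by,
// then signed off on its own work.
const lastWeek: RecordedStep[] = [
  { ts: 1, kind: "finding", by: "reviewer" },
  { ts: 2, kind: "edit_tests", by: "implementer" },
  { ts: 3, kind: "resolve", by: "implementer" },
  { ts: 4, kind: "approve", by: "implementer" },
];

// The candidate rule being evaluated against history.
const newRule: Rule = (s) =>
  s.approvedBy !== null && s.testsEditedBy.indexOf(s.approvedBy) >= 0
    ? "approver edited the tests it is judged by"
    : null;

const emptyState: ReplayState = { openBlockers: 0, testsEditedBy: [], approvedBy: null };
const verdict = replayWithRule(emptyState, lastWeek, newRule);
```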
A policy decides each gate: human approves every step; on-blockers asks only on a real blocker (the default); auto runs free above a deterministic safety floor.
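The three gate policies can be read as a small pure decision function. This is a hypothetical sketch, not loom’s implementation: the policy strings come from the text above, but the Gate fields and the choice of “irreversible step” as the safety floor are assumptions (loom’s floor may check more than that).

```typescript
// Hypothetical sketch, not loom's API. Policy names come from the text;
// the gate fields and the safety-floor rule (irreversible steps always ask
// a human) are assumptions for illustration.
type GatePolicy = "human" | "on-blockers" | "auto";

interface Gate {
  step: string;
  blockers: number; // open blocker findings at this gate
  irreversible: boolean; // a step whose effect cannot be undone
}

type Decision = "ask-human" | "proceed";

// Deterministic: the decision depends only on the policy and the gate data.
function decideGate(gate: Gate, policy: GatePolicy = "on-blockers"): Decision {
  if (gate.irreversible) return "ask-human"; // assumed floor, applies under every policy
  if (policy === "human") return "ask-human";
  if (policy === "on-blockers") return gate.blockers > 0 ? "ask-human" : "proceed";
  return "proceed"; // "auto": above the floor, runs free
}
```

Because the function reads no clock and calls no model, the same gate under the same policy always gets the same answer, which is what lets a replay reproduce gate decisions.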
Same (state, timestamp, ledger) → same trajectory. Recovery is “restart and let the idempotency ledger dedup” — no half-applied steps, no double work.
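A minimal sketch of “restart and let the idempotency ledger dedup”, assuming each step’s ledger key is derived from the run’s timestamp token and the step name. IdempotencyLedger, runOnce, and drive are hypothetical, and an in-memory record stands in for loom’s SQLite ledger. The simulated crash shows the property the text claims: after a restart, completed steps return their recorded results and only the unfinished step does real work.

```typescript
// Hypothetical sketch, not loom's API. Each step gets a deterministic key built
// from the run's timestamp token and the step name. Before running a step we
// consult the ledger; if the key is already recorded, the stored result is
// returned and the side effect is skipped.

class IdempotencyLedger {
  private entries: Record<string, string> = {};

  lookup(key: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.entries, key)
      ? this.entries[key]
      : undefined;
  }

  // The page describes steps as committed atomically; here the write is a
  // plain assignment made right after the effect returns.
  record(key: string, result: string): void {
    this.entries[key] = result;
  }
}

function runOnce(ledger: IdempotencyLedger, key: string, effect: () => string): string {
  const done = ledger.lookup(key);
  if (done !== undefined) return done; // dedup: already committed before the crash
  const result = effect();
  ledger.record(key, result);
  return result;
}

const pipelinePhases = ["plan", "implement", "review"];
let effectRuns = 0; // counts how often a side effect really executed

// Drives the phases in order; `crashBefore` simulates the process dying
// just before that phase index (-1 means no crash).
function drive(ledger: IdempotencyLedger, token: number, crashBefore: number): string[] {
  const results: string[] = [];
  let index = 0;
  for (const phase of pipelinePhases) {
    if (index === crashBefore) throw new Error("simulated crash");
    results.push(
      runOnce(ledger, token + ":" + phase, () => {
        effectRuns += 1;
        return phase + " done";
      }),
    );
    index += 1;
  }
  return results;
}

const ledger = new IdempotencyLedger();
const runToken = 1700000000; // stand-in for the run's timestamp token
try {
  drive(ledger, runToken, 2); // dies before "review"
} catch {
  // restart: same token, same ledger
}
const resumed = drive(ledger, runToken, -1);
```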
Bundles (the domain) · providers (the LLM backend) · transports (the wire). The kernel has zero runtime dependencies and no vendor names — any combination is valid.
The default backend is your existing Claude Code login. Or bind any agent to OpenRouter, local Ollama, or the Anthropic API — with per-agent fallback chains.
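A per-agent fallback chain can be sketched as “try each bound backend in order, keep the first success”. Only the chain idea and the openrouter:deepseek/deepseek-chat model string come from this page; the claude-code and ollama:local-model ids, the Backend type, and the simulated failure are hypothetical, and real backends would be asynchronous network calls rather than synchronous functions.

```typescript
// Hypothetical sketch, not loom's API. Backend ids, the binding table, and the
// failure behaviour are illustrative; only the idea of an ordered per-agent
// fallback chain comes from the text.

type Backend = (prompt: string) => string; // throws when the backend fails

function callWithFallback(
  chain: string[],
  backends: Record<string, Backend>,
  prompt: string,
): { backend: string; output: string } {
  const errors: string[] = [];
  for (const id of chain) {
    const backend = backends[id];
    if (backend === undefined) {
      errors.push(id + ": not configured");
      continue;
    }
    try {
      return { backend: id, output: backend(prompt) };
    } catch (err) {
      errors.push(id + ": " + (err instanceof Error ? err.message : String(err)));
    }
  }
  throw new Error("every backend in the chain failed: " + errors.join("; "));
}

// Per-agent chains: the implementer tries the default first, then falls back.
const bindings = {
  implementer: ["claude-code", "openrouter:deepseek/deepseek-chat", "ollama:local-model"],
  reviewer: ["claude-code"],
};

const backends: Record<string, Backend> = {
  "claude-code": () => {
    throw new Error("usage limit reached"); // simulate the default being unavailable
  },
  "openrouter:deepseek/deepseek-chat": (prompt) => "patch for: " + prompt,
  "ollama:local-model": (prompt) => "local patch for: " + prompt,
};

const answer = callWithFallback(bindings.implementer, backends, "add rate limiting");
```

Collecting every failure before giving up keeps the error useful: when a whole chain fails, the message says why each link was skipped.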
Five ways to run
Every mode drives the identical engine, gates, and invariants — they differ only in who executes each step and how long it waits for you.
loom up
A browser console for the whole fleet — submit, watch the live agent chain, approve gates, configure backends.
loom bot telegram
Drive the fleet from your phone — submit tasks, approve gates with inline buttons, ship a finished branch. Outbound-only, default-deny.
/task …
Zero setup: your agent host executes each step, and gates surface inline. No API key, no network.
loom run "…"
Drive one task to the end from a terminal, in an isolated git worktree. Your working tree is never touched.
loom daemon
Set-and-forget: parks on your gates, wakes when you answer, retries with backoff, recovers on restart, commits to a reviewable branch.
--docker
For unattended autonomy: each spawn runs in a container mounting only a dedicated clone — a real blast-radius bound.
A platform, not a single tool
The kernel is domain-blind — it knows nothing about code, reviews, or any vendor. Everything domain-specific lives in a bundle: a plugin that declares the workflow. A new domain is a new bundle — the kernel never changes.
The code bundle (review-gated implementation) ships today.
The kernel’s domain-blindness isn’t a slogan — it’s enforced: zero
runtime dependencies, no vendor or domain names in the kernel, checked
by CI greps.
Read how bundles plug in ↗
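What “domain-blind kernel plus bundles” means structurally can be shown with a generic driver. This is a hypothetical sketch, not loom’s bundle interface (the linked page covers that): Bundle, runBundle, and the toy docs bundle are illustrative. The driver is generic over the bundle’s state type and contains no domain names, so adding a domain means adding a value, not changing the driver.

```typescript
// Hypothetical sketch, not loom's bundle API. The kernel below only walks the
// phases a bundle declares and threads the bundle's opaque state through them;
// it has no idea what "code" or "review" means.

interface Bundle<S> {
  name: string;
  phases: string[]; // the workflow the bundle declares, in order
  initial: S;
  step: (phase: string, state: S) => S; // all domain logic lives here
}

// Domain-blind driver: generic over S, no domain or vendor names.
function runBundle<S>(bundle: Bundle<S>): { trail: string[]; state: S } {
  let state = bundle.initial;
  const trail: string[] = [];
  for (const phase of bundle.phases) {
    state = bundle.step(phase, state);
    trail.push(bundle.name + "/" + phase);
  }
  return { trail, state };
}

// A toy stand-in for the code bundle, using the phase names from the demo run.
const toyCodeBundle: Bundle<string[]> = {
  name: "code",
  phases: ["classify", "plan", "implement", "review", "validate", "finalize"],
  initial: [],
  step: (phase, notes) => notes.concat([phase + " ok"]),
};

// A different domain is just another bundle; runBundle is unchanged.
const toyDocsBundle: Bundle<number> = {
  name: "docs",
  phases: ["draft", "review", "publish"],
  initial: 0,
  step: (_phase, revisions) => revisions + 1,
};

const codeRun = runBundle(toyCodeBundle);
const docsRun = runBundle(toyDocsBundle);
```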
What it guarantees — honestly
loom guarantees the process: the declared review ran, nothing was bypassed, and every irreversible step went to a human. It does not guarantee that the model’s output is correct; that’s the agents’ job. What you get is the ability to prove which process ran and to see every decision behind a result.
loom is v0.3 — early and built in the open. There’s no install-count badge to flex yet: it’s used daily by its author and a handful of friends for real work on real repos.
The core — the state machine, crash recovery, the audit trail — is stable and heavily tested against a real SQLite database. APIs may still move before 1.0. If that trade suits you, you’re early in the best way.
Follow the repo ↗
Bring it to your team
I’m the author of loom. If your team is putting AI agents to real work — and needs to prove what they did, for engineering discipline or for compliance — I can set that up with you.
Prefer email? teaarte@gmail.com
FAQ
Frameworks help you author agent graphs; loom makes the run itself durable. Replay-determinism (one timestamp token, atomic commits), an idempotency ledger (crash → restart → exact dedup), and invariants enforced inside the database transaction are the difference between “my graph usually works” and “I can prove what happened”.
The default backend is your Claude Code subscription — no extra API spend. With API backends, loom records tokens and real cost per spawn, and a hard total-spawn cap bounds runaway runs.
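The spend controls described above amount to per-spawn accounting plus a hard cap. A minimal sketch, with hypothetical names (SpawnBudget, trySpawn) and made-up token and cost figures; loom’s actual accounting and how its cap is configured are not shown on this page.

```typescript
// Hypothetical sketch, not loom's API. It records tokens and cost per spawn
// and refuses new spawns once a hard total cap is reached.

interface SpawnUsage {
  tokens: number;
  costUsd: number;
}

interface SpawnRecord extends SpawnUsage {
  agent: string;
}

class SpawnBudget {
  private readonly maxSpawns: number;
  private readonly records: SpawnRecord[] = [];

  constructor(maxSpawns: number) {
    this.maxSpawns = maxSpawns;
  }

  // Returns null (and runs nothing) once the cap is hit, so a looping run stops.
  trySpawn(agent: string, run: () => SpawnUsage): SpawnRecord | null {
    if (this.records.length >= this.maxSpawns) return null;
    const usage = run();
    const rec: SpawnRecord = { agent, tokens: usage.tokens, costUsd: usage.costUsd };
    this.records.push(rec);
    return rec;
  }

  // Sums the per-spawn records into run-level totals.
  totals(): { spawns: number; tokens: number; costUsd: number } {
    let tokens = 0;
    let costUsd = 0;
    for (const r of this.records) {
      tokens += r.tokens;
      costUsd += r.costUsd;
    }
    return { spawns: this.records.length, tokens, costUsd };
  }
}

// A runaway loop asks for 10 spawns; only the first 3 are allowed.
const budget = new SpawnBudget(3);
let refused = 0;
for (let i = 0; i < 10; i++) {
  const rec = budget.trySpawn("implementer", () => ({ tokens: 1000, costUsd: 0.25 }));
  if (rec === null) refused += 1;
}
```

Refusing rather than queuing is the property that matters here: once the cap is hit, no further model calls happen however the agents loop.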
Steps run in an isolated git worktree — your working tree is never
touched. For unattended runs, --docker puts each spawn
in a container that mounts only a dedicated clone. Finished work
lands on a loom/<task> branch, reviewable, never
auto-merged.
Yes, other backends work. Bind any agent to OpenRouter, local Ollama, or the Anthropic
API (loom models set implementer openrouter:deepseek/deepseek-chat),
with per-agent fallback chains. File-editing agents run through
Aider or opencode harnesses behind the same isolation seam.
Everything lives in <project>/.loom/state.db — a
plain SQLite file you own. No cloud, no telemetry. Open it with any
SQLite client and read the full audit trail.
npm i -g @loomfsm/pipeline && loom up