loom is an open-source orchestrator for multi-step LLM agent work. It drives agents through classify → plan → implement → review → validate → finalize as a replay-deterministic state machine, committing every step atomically to a local SQLite database, with human approval gates and safety invariants enforced at commit time.

How is loom different from LangGraph or other agent frameworks?

loom is built around durability and auditability rather than graph authoring. One timestamp token threaded through every step makes runs replay-deterministic. An idempotency ledger makes crash recovery exact: restart and dedup, with no double work. Invariants run inside the database transaction, so unsafe transitions are rolled back rather than merely discouraged by a prompt.

Do I need an API key to use loom?

No. The zero-config default runs through your existing Claude Code login (your subscription). You can also configure OpenRouter, Ollama (local), or the Anthropic API per agent, with fallback chains.

Is loom production-ready?

loom is at v0.3 — early but used daily by its author for real work. The state machine, crash recovery, and audit trail are stable and well-tested; APIs may still change before 1.0.

Where does loom store its data?

In a plain SQLite file at <project>/.loom/state.db that you own. Every spawn, finding, verdict, and gate decision is recorded there, so you can open the database and see exactly what happened.

Can I get help integrating loom into my company?

Yes. The author offers integration consulting: a pilot agent pipeline on one of your repositories, custom bundles for your domain (incident response, content, compliance workflows), and on-prem audit-ready deployment. Contact: teaarte@gmail.com or the form at loomfsm.dev/#contact.

v0.3 · open source · Apache-2.0

Agent runs you can prove, not just trust

loom drives multi-step LLM agent work — code review, implementation, any review-gated task — as a replay-deterministic state machine. Safety is structural, not prompted: invariants run inside the database transaction, so an agent can’t bypass review or approve its own work. Every run leaves a complete, replayable audit trail.


npm i -g @loomfsm/pipeline

~/your-project

$ loom run "add rate limiting to the login endpoint"

  phase  classify → plan → implement → review → validate → finalize

  review   2 findings · 1 blocker
  ⏸ gate    blocker: missing test for the burst path
           [a]ccept  [r]eject  [v]iew diff

  ledger   every step committed · replayable · .loom/state.db

Durable execution became table stakes — every framework checkpoints now. loom is built for the layer above: structural safety and a provable process, in one SQLite file you own.

Why loom

Guarantees, not vibes

Agent frameworks help you write graphs. loom makes the run itself durable, provable, and safe to leave unattended.

🛡️ Safety at commit time

Invariants run inside the transaction and roll it back on violation. An agent can’t sign off over an open blocker, or rewrite the tests it’s judged by and approve itself. No other orchestrator does this.
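To make the idea concrete, here is a minimal sketch of a commit-time invariant using Python's standard sqlite3 module. It is not loom's implementation: the table names, the column names, and the no_open_blockers rule are all hypothetical, chosen only to show a check that runs between BEGIN and COMMIT and rolls the whole step back when it fails.

```python
import sqlite3


class InvariantViolation(Exception):
    """Raised when a proposed transition breaks a safety rule."""


def open_db() -> sqlite3.Connection:
    # isolation_level=None means we issue BEGIN / COMMIT / ROLLBACK ourselves.
    db = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical schema, for illustration only.
    db.execute("CREATE TABLE task (id TEXT PRIMARY KEY, phase TEXT NOT NULL)")
    db.execute("CREATE TABLE finding (task_id TEXT, severity TEXT, is_open INTEGER)")
    return db


def no_open_blockers(db: sqlite3.Connection, task_id: str) -> None:
    # Hypothetical invariant: a task may not be finalized over an open blocker.
    (count,) = db.execute(
        "SELECT COUNT(*) FROM finding "
        "WHERE task_id = ? AND severity = 'blocker' AND is_open = 1",
        (task_id,),
    ).fetchone()
    if count:
        raise InvariantViolation(f"{count} open blocker(s) on {task_id}")


def commit_transition(db: sqlite3.Connection, task_id: str, new_phase: str) -> None:
    db.execute("BEGIN IMMEDIATE")
    try:
        db.execute("UPDATE task SET phase = ? WHERE id = ?", (new_phase, task_id))
        if new_phase == "finalize":
            # The check runs inside the transaction, before COMMIT.
            no_open_blockers(db, task_id)
        db.execute("COMMIT")
    except Exception:
        # Any violation undoes the phase change along with everything else.
        db.execute("ROLLBACK")
        raise
```

Because the check and the write share one transaction, a rejected transition leaves no trace in the task table; the caller only sees the exception.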

🔁 Replay-deterministic

One timestamp token threaded through every step, atomic SQLite transactions. Replay a recorded run against a changed invariant to ask “would the new rule have caught last week’s incident?”
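The replay question can be sketched in a few lines of Python. This is an illustration under stated assumptions, not loom's replay engine: the Event record, its fields, and both rules are hypothetical. The point it shows is that when every step carries its recorded timestamp token and nothing reads the wall clock, checking an old run against a new rule is a pure function of the recorded data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    ts: int             # timestamp token recorded with the step, never re-read from a clock
    phase: str
    open_blockers: int


# An invariant returns True when the recorded transition is allowed.
Invariant = Callable[[Event], bool]


def replay(events: Iterable[Event], invariant: Invariant) -> list[Event]:
    """Return the recorded events that the given invariant would have rejected."""
    ordered = sorted(events, key=lambda e: e.ts)
    return [e for e in ordered if not invariant(e)]


def old_rule(event: Event) -> bool:
    # Hypothetical rule in force when the run was recorded: allow everything.
    return True


def new_rule(event: Event) -> bool:
    # Hypothetical stricter rule: never finalize over an open blocker.
    return not (event.phase == "finalize" and event.open_blockers > 0)
```

Calling replay(recorded, new_rule) on last week's events lists the transitions the new rule would have blocked, and calling it twice on the same input gives the same answer.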

🎚️ Human-in-the-loop, on a dial

A policy decides each gate: under human, a person approves every step; under on-blockers (the default), loom asks only on a real blocker; under auto, the run proceeds freely above a deterministic safety floor.
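One way to picture the dial is as a small pure function. The sketch below is hypothetical: the policy names come from the text above, but the function, its parameters, and the choice of "irreversible steps always go to a human" as the safety floor are assumptions made for illustration.

```python
from enum import Enum


class Policy(Enum):
    HUMAN = "human"
    ON_BLOCKERS = "on-blockers"
    AUTO = "auto"


def needs_human(policy: Policy, blockers: int, irreversible: bool) -> bool:
    """Decide whether a gate must park and wait for a person."""
    # Assumed safety floor: no policy can wave through an irreversible step.
    if irreversible:
        return True
    if policy is Policy.HUMAN:
        return True
    if policy is Policy.ON_BLOCKERS:
        return blockers > 0
    return False  # Policy.AUTO runs free above the floor
```

Because the decision depends only on its arguments, the same gate evaluated during a replay reaches the same verdict.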

💥 Crash-safe

Same (state, timestamp, ledger) → same trajectory. Recovery is “restart and let the idempotency ledger dedup” — no half-applied steps, no double work.
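The "restart and dedup" recovery model can be sketched with a ledger table keyed by an idempotency key. This is a minimal illustration using Python's sqlite3 module, not loom's actual schema: the ledger table, the run_once helper, and the key format are hypothetical.

```python
import sqlite3
from typing import Callable


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None means we issue BEGIN / COMMIT / ROLLBACK ourselves.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS ledger (key TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    return db


def run_once(db: sqlite3.Connection, key: str, work: Callable[[], str]) -> str:
    """Run work() at most once per key; repeats return the recorded result."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute("SELECT result FROM ledger WHERE key = ?", (key,)).fetchone()
        if row is not None:
            result = row[0]  # step already committed before the crash: dedup
        else:
            result = work()
            db.execute("INSERT INTO ledger (key, result) VALUES (?, ?)", (key, result))
        db.execute("COMMIT")
    except Exception:
        # A failed step leaves no ledger row, so a restart will retry it.
        db.execute("ROLLBACK")
        raise
    return result
```

After a crash, the driver can simply re-issue every step from the start: steps whose keys are already in the ledger return their stored results, and only the unfinished step does real work. Note that this sketch only protects writes made through the same database; side effects outside it would need their own idempotency.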

🔌 Pluggable on three axes

Bundles (the domain) · providers (the LLM backend) · transports (the wire). The kernel has zero runtime dependencies and no vendor names — any combination is valid.

🔑 No API key required

The default backend is your existing Claude Code login. Or bind any agent to OpenRouter, local Ollama, or the Anthropic API — with per-agent fallback chains.

Five ways to run

One state machine, five front-ends

Every mode drives the identical engine, gates, and invariants — they differ only in who executes each step and how long it waits for you.

loom up

🖥️ Web dashboard

A browser console for the whole fleet — submit, watch the live agent chain, approve gates, configure backends.

loom bot telegram

📱 Telegram bot

Drive the fleet from your phone — submit tasks, approve gates with inline buttons, ship a finished branch. Outbound-only, default-deny.

/task …

💬 Inside Claude Code

Zero setup: your agent host executes each step, gates surface inline. No API key, no network.

loom run "…"

⚡ Headless one-shot

Drive one task to the end from a terminal, in an isolated git worktree. Your working tree is never touched.

loom daemon

🤖 Autonomous daemon

Set-and-forget: parks on your gates, wakes when you answer, retries with backoff, recovers on restart, commits to a reviewable branch.

--docker

📦 Container isolation

For unattended autonomy: each spawn runs in a container mounting only a dedicated clone — a real blast-radius bound.

A platform, not a single tool

Code review is the first bundle, not the point

The kernel is domain-blind — it knows nothing about code, reviews, or any vendor. Everything domain-specific lives in a bundle: a plugin that declares the workflow. A new domain is a new bundle — the kernel never changes.

A bundle declares

Phases & steps — the shape of the work
Gates & roles — where a human (or policy) decides
Safety invariants — rules enforced at commit time
Typed prompts — templates, validated, per agent

The kernel provides

Atomic state — every step a SQLite transaction
The idempotency ledger — crash-safe, no double work
Replay — deterministic, auditable runs
Gate machinery — park, wake, policies, audit trail

What a bundle could be

Incident-response runbooks with human sign-off
Research pipelines: gather → synthesize → verify
Content workflows: draft → edit → legal gate → publish
Any review-gated, multi-step LLM process

The code bundle (review-gated implementation) ships today. The kernel’s domain-blindness isn’t a slogan; it’s enforced: zero runtime dependencies, no vendor or domain names in the kernel, checked by CI greps.
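As a rough picture of what "a bundle declares" could mean in data terms, here is a hypothetical sketch in Python. None of these class or field names come from loom; they only mirror the four things listed above (phases, gates and roles, invariants, prompts), using the content workflow as the example domain.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Gate:
    after_phase: str   # the gate sits after this phase completes
    role: str          # who decides: a human role or a policy name


@dataclass(frozen=True)
class Bundle:
    name: str
    phases: tuple[str, ...]
    gates: tuple[Gate, ...] = ()
    # Each invariant takes a proposed state and returns True if it is allowed.
    invariants: tuple[Callable[[dict], bool], ...] = ()
    prompts: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject declarations that point at phases the bundle does not define."""
        known = set(self.phases)
        for gate in self.gates:
            if gate.after_phase not in known:
                raise ValueError(f"gate refers to unknown phase {gate.after_phase!r}")
        for phase in self.prompts:
            if phase not in known:
                raise ValueError(f"prompt refers to unknown phase {phase!r}")


# Hypothetical content-workflow bundle: draft, edit, legal gate, publish.
content = Bundle(
    name="content",
    phases=("draft", "edit", "publish"),
    gates=(Gate(after_phase="edit", role="legal"),),
    invariants=(lambda state: state.get("legal_approved", False) or state["phase"] != "publish",),
    prompts={"draft": "Write a first draft about {topic}."},
)
```

In this shape the kernel would only ever see phase names, gates, and callables, which is one way a domain-blind core could stay free of domain vocabulary.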

What it guarantees — honestly

Prove the process, not the model

loom guarantees the process: the declared review ran, nothing was bypassed, irreversible steps got a human. It does not guarantee the model’s output is correct — that’s the agents’ job. What you get is the ability to prove which process ran and see every decision behind a result.

Where it stands today

loom is v0.3 — early and built in the open. There’s no install-count badge to flex yet: it’s used daily by its author and a handful of friends for real work on real repos.

The core (the state machine, crash recovery, the audit trail) is stable and heavily tested against a real SQLite database. APIs may still change before 1.0. If that trade-off suits you, you’re early in the best way.


Bring it to your team

Want auditable AI agents in your company?

I’m the author of loom. If your team is putting AI agents to real work — and needs to prove what they did, for engineering discipline or for compliance — I can set that up with you.

Pilot in a week — a review-gated agent pipeline running on one of your repositories, with gates where your process needs them.
Custom bundles — your domain encoded as phases, gates, and commit-time invariants: incident response, content pipelines, compliance workflows.
On-prem & audit-ready — local-first deployment, no data leaves your infrastructure, a replayable audit trail for every agent decision.

Prefer email? teaarte@gmail.com

FAQ

Questions, answered

How is this different from LangGraph / agent frameworks?

Frameworks help you author agent graphs; loom makes the run itself durable. Replay-determinism (one timestamp token, atomic commits), an idempotency ledger (crash → restart → exact dedup), and invariants enforced inside the database transaction are the difference between “my graph usually works” and “I can prove what happened”.

What does a run cost?

The default backend is your Claude Code subscription — no extra API spend. With API backends, loom records tokens and real cost per spawn, and a hard total-spawn cap bounds runaway runs.
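A hard spawn cap with per-spawn cost recording can be sketched as a small budget object. The class and method names below are hypothetical and only illustrate the two behaviors described above: every spawn's tokens and cost are recorded, and the cap is checked before a new spawn starts.

```python
from dataclasses import dataclass, field


class SpawnCapExceeded(Exception):
    """Raised when a run tries to start more spawns than its cap allows."""


@dataclass
class RunBudget:
    max_spawns: int
    # One (agent, tokens, cost_usd) tuple per completed spawn.
    spawns: list = field(default_factory=list)

    def before_spawn(self) -> None:
        # Checked before the agent starts, so a runaway loop stops at the cap.
        if len(self.spawns) >= self.max_spawns:
            raise SpawnCapExceeded(f"cap of {self.max_spawns} spawns reached")

    def after_spawn(self, agent: str, tokens: int, cost_usd: float) -> None:
        self.spawns.append((agent, tokens, cost_usd))

    @property
    def total_tokens(self) -> int:
        return sum(tokens for _, tokens, _ in self.spawns)

    @property
    def total_cost_usd(self) -> float:
        return sum(cost for _, _, cost in self.spawns)
```

A driver would call before_spawn() ahead of each agent invocation and after_spawn() with the usage the backend reports.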

Is my code safe while an agent works?

Steps run in an isolated git worktree — your working tree is never touched. For unattended runs, --docker puts each spawn in a container that mounts only a dedicated clone. Finished work lands on a loom/<task> branch, reviewable, never auto-merged.

Can I use models other than Claude?

Yes. Bind any agent to OpenRouter, local Ollama, or the Anthropic API, with per-agent fallback chains. For example: loom models set implementer openrouter:deepseek/deepseek-chat. File-editing agents run through Aider or opencode harnesses behind the same isolation seam.
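A per-agent fallback chain amounts to trying backends in order until one answers. The sketch below is a generic illustration, not loom's provider layer: the Backend type, the function name, and the exception are hypothetical, and real backends would be network clients rather than plain callables.

```python
from typing import Callable, Sequence

# A backend takes a prompt and returns a completion, or raises on failure.
Backend = Callable[[str], str]


class AllBackendsFailed(Exception):
    """Raised when every backend in the chain has failed."""


def call_with_fallback(
    chain: Sequence[tuple[str, Backend]], prompt: str
) -> tuple[str, str]:
    """Try each (name, backend) in order; return the first name and reply that succeed."""
    errors = []
    for name, backend in chain:
        try:
            return name, backend(prompt)
        except Exception as exc:
            # Keep the reason so the final error explains the whole chain.
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))
```

Returning the backend name alongside the reply lets the caller record which provider actually served each spawn.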

What's the data story?

Everything lives in <project>/.loom/state.db — a plain SQLite file you own. No cloud, no telemetry. Open it with any SQLite client and read the full audit trail.
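Since the file is ordinary SQLite, any client can inspect it. As a small sketch, the Python below opens a database read-only and lists its tables by querying sqlite_master, which every SQLite file has. The helper name is hypothetical, and nothing here assumes loom's table names, which this page does not specify.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite file without modifying it."""
    # mode=ro opens the file read-only, so inspection cannot alter the audit trail.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as db:
        rows = db.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

From a project root you would call list_tables(".loom/state.db") and then query whichever tables it reports.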

Hand it a task. Approve at the gates that matter.

npm i -g @loomfsm/pipeline && loom up

