v0.3 · open source · Apache-2.0

Agent runs you can prove
not just trust

loom drives multi-step LLM agent work — code review, implementation, any review-gated task — as a replay-deterministic state machine. Safety is structural, not prompted: invariants run inside the database transaction, so an agent can’t bypass review or approve its own work. Every run leaves a complete, replayable audit trail.

npm i -g @loomfsm/pipeline

Durable execution became table stakes — every framework checkpoints now. loom is built for the layer above: structural safety and a provable process, in one SQLite file you own.

Why loom

Guarantees, not vibes

Agent frameworks help you write graphs. loom makes the run itself durable, provable, and safe to leave unattended.

🛡️ Safety at commit time

Invariants run inside the transaction and roll it back on violation. An agent can’t sign off over an open blocker, or rewrite the tests it’s judged by and approve itself. No other orchestrator does this.
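
To make "inside the transaction" concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not loom's code: the tables, the `no_open_blockers` rule, and the `approve` function are all hypothetical, chosen only to show a write and its invariant check committing or rolling back together.

```python
import sqlite3


class InvariantViolation(Exception):
    """Raised when a commit-time rule rejects a state change."""


def no_open_blockers(conn, task_id):
    # Hypothetical invariant: approval is illegal while any blocker is open.
    (open_count,) = conn.execute(
        "SELECT COUNT(*) FROM blockers WHERE task_id = ? AND status = 'open'",
        (task_id,),
    ).fetchone()
    if open_count:
        raise InvariantViolation(f"{open_count} open blocker(s) on {task_id}")


def approve(conn, task_id):
    # 'with conn' commits on success and rolls back if anything raises,
    # so the status write and the invariant check succeed or fail together.
    with conn:
        conn.execute("UPDATE tasks SET status = 'approved' WHERE id = ?", (task_id,))
        no_open_blockers(conn, task_id)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE blockers (task_id TEXT, status TEXT)")
    conn.execute("INSERT INTO tasks VALUES ('t1', 'in_review')")
    conn.execute("INSERT INTO blockers VALUES ('t1', 'open')")
    conn.commit()
    try:
        approve(conn, "t1")
    except InvariantViolation:
        pass  # the UPDATE above was rolled back along with the failed check
    return conn.execute("SELECT status FROM tasks WHERE id = 't1'").fetchone()[0]
```

In this sketch the agent never gets a window in which the task reads as approved: the rejected write is undone before any other reader can see it.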

🔁 Replay-deterministic

One timestamp token threaded through every step, atomic SQLite transactions. Replay a recorded run against a changed invariant to ask “would the new rule have caught last week’s incident?”
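
The "would the new rule have caught it?" question can be sketched as a pure function over recorded events. Everything here is an assumption for illustration (the `Event` shape, the event kinds, both rules); the point is only that replay reads timestamps from the record and never from a clock, so the same input always yields the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int    # recorded timestamp token, never re-read from a clock
    kind: str  # hypothetical kinds: "blocker_opened", "blocker_closed", "approved"


def replay(events, invariant):
    """Re-apply recorded events in order; return the index of the first violation, or None."""
    open_blockers = 0
    for index, event in enumerate(events):
        if event.kind == "blocker_opened":
            open_blockers += 1
        elif event.kind == "blocker_closed":
            open_blockers -= 1
        if not invariant(event, open_blockers):
            return index
    return None


def old_rule(event, open_blockers):
    # Stand-in for the rule set in force when the run was recorded.
    return True


def new_rule(event, open_blockers):
    # Candidate rule: no approval while a blocker is still open.
    return not (event.kind == "approved" and open_blockers > 0)


RECORDED = [
    Event(100, "blocker_opened"),
    Event(101, "approved"),
    Event(102, "blocker_closed"),
]
```

Replaying `RECORDED` under `old_rule` reports no violation, while `new_rule` flags the approval at index 1.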

🎚️ Human-in-the-loop, on a dial

A policy decides each gate. There are three: human (a person approves every step), on-blockers (asks only when there is a real blocker; this is the default), and auto (runs free above a deterministic safety floor).
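
As a rough illustration of the dial, the decision can be written as one small function. The three policy names come from the text above; the function itself, and the idea of passing the safety floor in as a boolean flag, are assumptions made for this sketch rather than loom's actual interface.

```python
POLICIES = ("human", "on-blockers", "auto")


def needs_human(policy, has_blocker, below_safety_floor):
    """Return True when a gate should park and wait for a person."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if below_safety_floor:
        # Assumption: the floor is checked first and applies under every policy.
        return True
    if policy == "human":
        return True
    if policy == "on-blockers":
        return has_blocker
    return False  # "auto": runs free above the floor
```

Because the function has no hidden inputs, the same gate state always produces the same decision, which is what lets a gate be replayed later.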

💥 Crash-safe

Same (state, timestamp, ledger) → same trajectory. Recovery is “restart and let the idempotency ledger dedup” — no half-applied steps, no double work.
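
One common way to get "restart and dedup" is a ledger table whose primary key is the step's identity, written in the same transaction as the step's effect. The sketch below shows that general technique with Python's sqlite3; the table names and the `step_key` format are hypothetical and not taken from loom.

```python
import sqlite3


def run_step_once(conn, step_key, effect):
    """Apply effect at most once per step_key; return True if it ran this time."""
    with conn:  # the ledger row and the effect commit (or roll back) together
        try:
            conn.execute("INSERT INTO ledger (step_key) VALUES (?)", (step_key,))
        except sqlite3.IntegrityError:
            return False  # already recorded by an earlier attempt: skip the work
        effect(conn)
    return True


def bump(conn):
    # Stand-in for a step's real side effect.
    conn.execute("UPDATE work SET n = n + 1")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (step_key TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE work (n INTEGER)")
    conn.execute("INSERT INTO work VALUES (0)")
    conn.commit()
    first = run_step_once(conn, "task-1/step-3", bump)
    second = run_step_once(conn, "task-1/step-3", bump)  # simulated retry after a restart
    (n,) = conn.execute("SELECT n FROM work").fetchone()
    return first, second, n
```

If the effect raises midway, the ledger row is rolled back with it, so the retry runs the step in full instead of skipping a half-applied one.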

🔌 Pluggable on three axes

Bundles (the domain) · providers (the LLM backend) · transports (the wire). The kernel has zero runtime dependencies and no vendor names — any combination is valid.

🔑 No API key required

The default backend is your existing Claude Code login. Or bind any agent to OpenRouter, local Ollama, or the Anthropic API — with per-agent fallback chains.

Five ways to run

One state machine, five front-ends

Every mode drives the identical engine, gates, and invariants — they differ only in who executes each step and how long it waits for you.

loom up

🖥️ Web dashboard

A browser console for the whole fleet — submit, watch the live agent chain, approve gates, configure backends.

loom bot telegram

📱 Telegram bot

Drive the fleet from your phone — submit tasks, approve gates with inline buttons, ship a finished branch. Outbound-only, default-deny.

/task …

💬 Inside Claude Code

Zero setup: your agent host executes each step, gates surface inline. No API key, no network.

loom run "…"

⚡ Headless one-shot

Drive one task to the end from a terminal, in an isolated git worktree. Your working tree is never touched.

loom daemon

🤖 Autonomous daemon

Set-and-forget: parks on your gates, wakes when you answer, retries with backoff, recovers on restart, commits to a reviewable branch.

--docker

📦 Container isolation

An isolation flag for unattended runs: each spawn runs in a container that mounts only a dedicated clone, which puts a real bound on the blast radius.

A platform, not a single tool

Code review is the first bundle,
not the point

The kernel is domain-blind — it knows nothing about code, reviews, or any vendor. Everything domain-specific lives in a bundle: a plugin that declares the workflow. A new domain is a new bundle — the kernel never changes.

A bundle declares

  • Phases & steps — the shape of the work
  • Gates & roles — where a human (or policy) decides
  • Safety invariants — rules enforced at commit time
  • Typed prompts — templates, validated, per agent

The kernel provides

  • Atomic state — every step a SQLite transaction
  • The idempotency ledger — crash-safe, no double work
  • Replay — deterministic, auditable runs
  • Gate machinery — park, wake, policies, audit trail

What a bundle could be

  • Incident-response runbooks with human sign-off
  • Research pipelines: gather → synthesize → verify
  • Content workflows: draft → edit → legal gate → publish
  • Any review-gated, multi-step LLM process
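
To give a feel for what "declares" might mean, here is a hedged sketch of a bundle as plain data, using the content workflow from the list above. The `Bundle` and `Gate` types, their field names, and the `validate` checks are all invented for illustration; loom's real bundle format may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Gate:
    after_step: str  # the step whose result the gate reviews
    role: str        # who decides, e.g. a human role or a policy


@dataclass(frozen=True)
class Bundle:
    name: str
    phases: dict[str, list[str]]  # phase name -> ordered step names
    gates: list[Gate] = field(default_factory=list)
    invariants: list[Callable[[dict], bool]] = field(default_factory=list)
    prompts: dict[str, str] = field(default_factory=dict)  # step name -> template


def validate(bundle):
    """Return a sorted list of problems; an empty list means the declaration is consistent."""
    steps = {step for phase in bundle.phases.values() for step in phase}
    problems = []
    for gate in bundle.gates:
        if gate.after_step not in steps:
            problems.append(f"gate after unknown step: {gate.after_step}")
    for step in steps:
        if step not in bundle.prompts:
            problems.append(f"step without a prompt: {step}")
    return sorted(problems)


CONTENT = Bundle(
    name="content",
    phases={"write": ["draft", "edit"], "release": ["publish"]},
    gates=[Gate(after_step="edit", role="legal")],
    invariants=[lambda state: state.get("legal_approved", False) or not state.get("published", False)],
    prompts={
        "draft": "Draft an article about {topic}.",
        "edit": "Edit the draft for clarity.",
        "publish": "Prepare the approved text for publishing.",
    },
)
```

Nothing in this sketch mentions code or any vendor, which is the property the kernel is said to rely on: the domain lives entirely in the declaration.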

The code bundle (review-gated implementation) ships today. The kernel’s domain-blindness isn’t a slogan — it’s enforced: zero runtime dependencies, no vendor or domain names in the kernel, checked by CI greps. Read how bundles plug in ↗
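
A grep-style check of this kind can be approximated in a few lines. This is a sketch of the idea only: the forbidden word list and the file path in the usage note are hypothetical, and loom's actual CI checks are not shown on this page.

```python
import re

# Hypothetical list: names a domain-blind kernel would not be allowed to mention.
FORBIDDEN = ("anthropic", "openrouter", "ollama", "claude")


def find_forbidden(sources, forbidden=FORBIDDEN):
    """Scan {path: text} and return (path, line_number) for every forbidden mention."""
    pattern = re.compile("|".join(re.escape(word) for word in forbidden), re.IGNORECASE)
    hits = []
    for path, text in sorted(sources.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno))
    return hits
```

A CI job built on this would fail the build whenever the returned list is non-empty, for example when `find_forbidden({"kernel/step.ts": source_text})` reports a hit.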

What it guarantees — honestly

Prove the process,
not the model

loom guarantees the process: the declared review ran, nothing was bypassed, irreversible steps got a human. It does not guarantee the model’s output is correct — that’s the agents’ job. What you get is the ability to prove which process ran and see every decision behind a result.

Where it stands today

loom is v0.3 — early and built in the open. There’s no install-count badge to flex yet: it’s used daily by its author and a handful of friends for real work on real repos.

The core — the state machine, crash recovery, the audit trail — is stable and heavily tested against a real SQLite database. APIs may still move before 1.0. If that trade suits you, you’re early in the best way.

Follow the repo ↗

Bring it to your team

Want auditable AI agents
in your company?

I’m the author of loom. If your team is putting AI agents to real work — and needs to prove what they did, for engineering discipline or for compliance — I can set that up with you.

  • Pilot in a week — a review-gated agent pipeline running on one of your repositories, with gates where your process needs them.
  • Custom bundles — your domain encoded as phases, gates, and commit-time invariants: incident response, content pipelines, compliance workflows.
  • On-prem & audit-ready — local-first deployment, no data leaves your infrastructure, a replayable audit trail for every agent decision.

Prefer email? teaarte@gmail.com

Tell me about your use case

Usually I reply within a day. No newsletter, no spam.

FAQ

Questions, answered

How is this different from LangGraph / agent frameworks?

Frameworks help you author agent graphs; loom makes the run itself durable. Replay-determinism (one timestamp token, atomic commits), an idempotency ledger (crash → restart → exact dedup), and invariants enforced inside the database transaction are the difference between “my graph usually works” and “I can prove what happened”.

What does a run cost?

The default backend is your Claude Code subscription — no extra API spend. With API backends, loom records tokens and real cost per spawn, and a hard total-spawn cap bounds runaway runs.
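
The combination of per-spawn accounting and a hard cap could look roughly like the sketch below. The `SpawnLedger` class, its method names, and the numbers in the usage note are assumptions for illustration, not loom's real accounting code.

```python
from dataclasses import dataclass, field


class SpawnCapReached(Exception):
    """Raised instead of starting a spawn once the cap is used up."""


@dataclass
class SpawnLedger:
    max_spawns: int
    records: list = field(default_factory=list)  # (agent, tokens, cost_usd) per spawn

    def run_spawn(self, agent, call):
        # Check the cap before doing any work, so a runaway loop stops here.
        if len(self.records) >= self.max_spawns:
            raise SpawnCapReached(f"cap of {self.max_spawns} spawns reached")
        tokens, cost_usd = call()  # 'call' stands in for one backend invocation
        self.records.append((agent, tokens, cost_usd))
        return tokens, cost_usd

    def total_cost(self):
        return sum(cost for _, _, cost in self.records)
```

With `SpawnLedger(max_spawns=2)`, two spawns are recorded with their tokens and cost, and a third attempt raises before any backend is called.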

Is my code safe while an agent works?

Steps run in an isolated git worktree — your working tree is never touched. For unattended runs, --docker puts each spawn in a container that mounts only a dedicated clone. Finished work lands on a loom/<task> branch, reviewable, never auto-merged.

Can I use models other than Claude?

Yes. Bind any agent to OpenRouter, local Ollama, or the Anthropic API, with per-agent fallback chains. For example, loom models set implementer openrouter:deepseek/deepseek-chat binds the implementer agent to a DeepSeek model on OpenRouter. File-editing agents run through Aider or opencode harnesses behind the same isolation seam.
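
A fallback chain is easy to picture as "try each binding in order until one answers". The sketch below assumes bindings have the provider:model shape seen in the command above; the parsing rule, the `backends` mapping, and the `ollama:llama3` binding in the usage note are hypothetical.

```python
def parse_binding(spec):
    """Split 'provider:model' on the first colon (the model part may contain slashes)."""
    provider, _, model = spec.partition(":")
    if not provider or not model:
        raise ValueError(f"expected provider:model, got {spec!r}")
    return provider, model


def call_with_fallback(chain, backends, prompt):
    """Try each binding in order; return (binding_used, reply) from the first that succeeds."""
    errors = []
    for spec in chain:
        provider, model = parse_binding(spec)
        try:
            return spec, backends[provider](model, prompt)
        except Exception as exc:  # any backend failure moves on to the next binding
            errors.append(f"{spec}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Given a chain such as `["openrouter:deepseek/deepseek-chat", "ollama:llama3"]`, a failure on the first binding falls through to the second, and the returned binding records which backend actually produced the reply.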

What's the data story?

Everything lives in <project>/.loom/state.db — a plain SQLite file you own. No cloud, no telemetry. Open it with any SQLite client and read the full audit trail.
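
Since the file is plain SQLite, a first look needs nothing beyond a standard library. The sketch below opens a database read-only and lists its tables without assuming anything about loom's schema; the function name is made up for this example.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Open a SQLite file read-only and return its table names, sorted."""
    # mode=ro makes SQLite refuse writes, so inspecting a live run cannot alter it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables(".loom/state.db")` from a project root would show which tables exist; from there, ordinary SELECT statements read the audit trail.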

Hand it a task.
Approve at the gates that matter.

npm i -g @loomfsm/pipeline && loom up