An open source server and SDK. Your agent definitions compile into durable workflows — execution state lives outside your process, tool calls retry automatically, and human approvals wait and resume cleanly.
pip install agentspan

Agent(tools=[...])

Your process dies. The agent doesn't.
Your process crashes. The agent keeps running on the Agentspan server. Reconnect from any machine — it resumes from the exact step, no lost work.
from agentspan.agents import (
    start, AgentHandle, AgentRuntime,
)
handle = start(agent, "analyze 10k records")
# process dies 4 min in — agent lives on
# reconnect from any machine, any time
handle = AgentHandle(
    workflow_id="wf-f8a2c1",
    runtime=AgentRuntime(),
)  # picks up right where it left off

Mark any tool as requiring approval. The agent pauses, holds state on the server with no timeout, and waits. Resume from Slack, a web portal, or code.
@tool(approval_required=True)
def process_refund(order_id, amount):
    """Refund — needs human approval."""
    ...
handle = start(agent, "refund #8821")
# agent pauses, state held on server
# approve from Slack, web, or code
handle.approve()
# or handle.reject("over limit")
# resumes from the exact waiting point
Wire agents into pipelines with >>.
Each output feeds the next — every step logged, crash-safe, and resumable across the full chain.
from agentspan.agents import Agent, run
researcher = Agent("researcher", ...)
writer = Agent("writer", ...)
editor = Agent("editor", ...)
# three agents, one expression, durable
result = run(
    researcher >> writer >> editor,
    "state of AI agents in 2026",
)
# each step: logged · crash-safe

Agentspan compiles agent definitions into Conductor workflows. Conductor is an open-source orchestration engine that has run billions of executions in production at Netflix, LinkedIn, and Tesla. Durable state, per-step retries, full execution history, and replay are Conductor primitives. Agentspan gives you a clean Python API on top of that foundation.
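The >> composition used in the pipeline above is ordinary operator overloading: each stage's output becomes the next stage's input. A minimal standalone sketch of the idea, where Step is an illustrative stand-in rather than an Agentspan class:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    """One pipeline stage: a callable from text to text."""
    fn: Callable[[str], str]

    def __rshift__(self, other: "Step") -> "Step":
        # compose: self runs first, its output feeds other
        return Step(lambda text: other.fn(self.fn(text)))

    def __call__(self, text: str) -> str:
        return self.fn(text)

research = Step(lambda topic: f"notes on {topic}")
write = Step(lambda notes: f"draft from {notes}")
edit = Step(lambda draft: f"final: {draft}")

pipeline = research >> write >> edit
print(pipeline("AI agents"))
# -> final: draft from notes on AI agents
```

The real implementation additionally checkpoints each stage on the server; the composition semantics are the same.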
Every tool call. Every LLM request. Timing on every step. Stored, queryable, replayable.
from agentspan.agents import Agent, tool, run
@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    ...

@tool
def fetch_page(url: str) -> dict:
    """Fetch and parse a webpage."""
    ...
researcher = Agent(
    name="researcher",
    model="anthropic/claude-sonnet-4-6",
    tools=[search_web, fetch_page],
    instructions="Research the topic thoroughly.",
)
writer = Agent(
    name="writer",
    model="openai/gpt-4o",
    instructions="Write a clear article from the research.",
)
result = run(researcher >> writer, "AI agents in production")
result.print_result()

The pieces that matter when agents run in the real world — not a demo framework.
mock_run scripts exact tool sequences so you can assert on your logic deterministically — error handling, tool routing, output parsing — in milliseconds. No LLM needed. No server needed.
from agentspan.agents.testing import (
mock_run, MockEvent, expect,
)
result = mock_run(agent, "Weather in Chicago?",
    events=[
        MockEvent.tool_call("get_weather", {"city": "Chicago"}),
        MockEvent.tool_result("get_weather", {"temp_f": 55}),
        MockEvent.done("Chicago is 55°F and cloudy."),
    ])

expect(result).completed().used_tool("get_weather")

Testing docs →

Every pattern you'll need — sequential, parallel, handoff, router, swarm, and more. One consistent API. Mix, nest, and compose freely.
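Conceptually, a parallel strategy fans the same input out to every sub-agent and aggregates the results. A rough stdlib sketch of that fan-out; fan_out, analyst, and researcher here are stand-ins for illustration, not Agentspan APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt, workers):
    """Run every worker on the same prompt concurrently; keep submit order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(worker, prompt) for worker in workers]
        return [f.result() for f in futures]

analyst = lambda p: f"analysis of {p}"
researcher = lambda p: f"research on {p}"

results = fan_out("Q3 earnings", [analyst, researcher])
print(results)
```

The server-backed version adds per-branch durability and retries on top of the same fan-out shape.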
# sequential pipeline
result = run(researcher >> writer, prompt)
# parallel — runs concurrently, aggregates results
team = Agent(name="team",
    agents=[analyst, researcher],
    strategy=Strategy.PARALLEL)

# handoff — LLM picks the next specialist
router = Agent(name="router",
    agents=[sales, support, billing],
    strategy=Strategy.HANDOFF)

Multi-agent docs →

Use provider/model-name format. Switch from Anthropic to OpenAI to Gemini — one string, nothing else changes. Mix models within the same pipeline.
# anthropic
Agent(..., model="anthropic/claude-sonnet-4-6")
# openai
Agent(..., model="openai/gpt-4o")
# google
Agent(..., model="google/gemini-2.0-flash")
# groq
Agent(..., model="groq/llama-3.3-70b-versatile")
# one string — switch providers anywhere

All providers →

Validate every agent response with regex, LLM checks, or custom functions. On failure: retry automatically, ask the LLM to fix it, raise an error, or pause for a human.
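The regex case is easy to reason about in isolation. This standalone check uses the same SSN pattern the no_ssn guardrail example passes to RegexGuardrail:

```python
import re

# the SSN-shaped pattern from the guardrail example
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def violates(text: str) -> bool:
    """True if the text contains something shaped like an SSN."""
    return SSN.search(text) is not None

print(violates("Your case number is 123-45-6789."))  # True
print(violates("Call 555-0123 for help."))           # False
```

A guardrail with on_fail="retry" runs exactly this kind of check on each response and re-prompts when it fires.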
from agentspan.agents import Agent, RegexGuardrail
guardrail = RegexGuardrail(
    patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],
    name="no_ssn",
    on_fail="retry",
    max_retries=3,
)
agent = Agent(
    name="support_bot",
    model="openai/gpt-4o",
    guardrails=[guardrail],
)
# auto-retries up to 3x if SSN found in output

Guardrails docs →

Define your output schema as a Pydantic model. Agentspan enforces it on every LLM response — retrying automatically if the model returns malformed output. Always typed.
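The enforcement loop boils down to validate-or-retry. A minimal sketch with plain Pydantic; parse_with_retry and the fake ask callable are illustrative, not Agentspan internals:

```python
from pydantic import BaseModel, ValidationError

class Report(BaseModel):
    summary: str
    key_findings: list[str]
    sources: list[str]

def parse_with_retry(ask, retries: int = 3) -> Report:
    """Call the model, validate its JSON reply, and re-ask on failure."""
    last_err = None
    for _ in range(retries):
        raw = ask(last_err)  # last_err lets the model see what to fix
        try:
            return Report.model_validate_json(raw)
        except ValidationError as err:
            last_err = str(err)
    raise ValueError("model never produced a valid Report")

# fake model: first reply is malformed, second is valid
replies = iter([
    '{"summary": 42}',
    '{"summary": "ok", "key_findings": [], "sources": []}',
])
report = parse_with_retry(lambda err: next(replies))
```

With output_type set, this loop runs server-side and the result you get back is already a typed Report.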
from pydantic import BaseModel
from agentspan.agents import Agent, run
class Report(BaseModel):
    summary: str
    key_findings: list[str]
    sources: list[str]
agent = Agent(
    name="analyst",
    model="openai/gpt-4o",
    output_type=Report,
)
result = run(agent, "Analyze Q3 earnings")

Structured output docs →

ConversationMemory keeps history within a session. SemanticMemory stores and retrieves facts across sessions using similarity search. Plug in any vector backend.
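At its core, conversation memory is an append-only transcript replayed into each prompt. A minimal sketch of that idea; ConversationLog is an illustrative stand-in, not Agentspan's ConversationMemory:

```python
class ConversationLog:
    """Append-only transcript, replayed into every prompt."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def record(self, user_msg: str, reply: str) -> None:
        self.turns += [("user", user_msg), ("assistant", reply)]

    def build_prompt(self, user_msg: str) -> str:
        # prepend the full history so the model can see prior turns
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nuser: {user_msg}".lstrip()

memory = ConversationLog()
memory.record("My name is Alice.", "Nice to meet you, Alice.")
print(memory.build_prompt("What is my name?"))
```

SemanticMemory swaps the verbatim transcript for embedding-based retrieval, so only the most relevant facts are replayed.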
from agentspan.agents import Agent, run, ConversationMemory
memory = ConversationMemory()
agent = Agent(
    name="assistant",
    model="openai/gpt-4o",
    memory=memory,
)
run(agent, "My name is Alice, I'm a backend engineer.")
result = run(agent, "What do you know about me?")
# "You're Alice, a backend engineer."

Memory docs →

Stream tool calls, LLM responses, handoffs, guardrail results, and errors as events. Build real-time UIs, live approval flows, or custom log pipelines — event by event.
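Because everything arrives as plain events, consumer code can be exercised against a scripted stream before any server is involved. A standalone sketch; Event and fake_stream here are stand-ins for the real stream API:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Event:
    type: str       # e.g. "tool_call", "tool_result", "done"
    payload: dict

def fake_stream() -> Iterator[Event]:
    """Scripted stand-in for a live agent event stream."""
    yield Event("tool_call", {"tool": "search_web", "args": {"q": "AI agents"}})
    yield Event("tool_result", {"tool": "search_web", "result": "3 hits"})
    yield Event("done", {"output": "summary ready"})

lines = []
for event in fake_stream():
    if event.type == "tool_call":
        lines.append(f"→ {event.payload['tool']}")
    elif event.type == "done":
        lines.append(event.payload["output"])
```

Swapping fake_stream for the real stream(agent, prompt) call leaves the consumer loop unchanged.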
from agentspan.agents import stream, EventType
for event in stream(agent, "Research AI agents"):
    if event.type == EventType.TOOL_CALL:
        print(f"→ {event.tool_name}({event.args})")
    elif event.type == EventType.TOOL_RESULT:
        print(f"← {event.result}")
    elif event.type == EventType.HANDOFF:
        print(f"handoff → {event.target}")
    elif event.type == EventType.DONE:
        print(event.output)
        break

Streaming docs →

OpenAI Agents SDK, Google ADK, and LangGraph plug into Agentspan with one line — your agent definition stays exactly as it is. The Agentspan server adds durable execution underneath.
Your agents, handoffs, and tools stay identical. Replace Runner.run_sync with Agentspan's run.
from agentspan.agents import run
# your agent definition stays identical
# was: Runner.run_sync(agent, prompt)
result = run(agent, "prompt")

See full example →

Your ADK agent graph stays intact. Agentspan wraps the execution layer — persistence and observability with no restructuring.
from agentspan.agents import run
# your agent definition stays identical
# was: await runner.run_async(...)
result = run(root_agent, "prompt")

See full example →

Your graph, nodes, and edges stay identical. Agentspan wraps the execution layer — one line to connect.
from agentspan.agents import run
# your graph definition stays identical
# was: app.invoke({...})
result = run(app, "prompt")

See full example →

Machine-readable API reference at /llms.txt. Claude Code skill instructions (install, connect, test) at /skills.md.
pip install agentspan
Installs the Python SDK and the agentspan CLI.
agentspan server start
Downloads the Agentspan server on first run (~50 MB) and starts it on http://localhost:6767.
from agentspan.agents import Agent, tool, run
from agentspan.agents.testing import mock_run, MockEvent, expect
@tool
def my_tool(param: str) -> str:
    """Description becomes the JSON schema."""
    return "result"
agent = Agent(
    name="my_agent",
    model="anthropic/claude-sonnet-4-6",
    tools=[my_tool],
    instructions="System prompt here.",
)
result = run(agent, "User prompt here")
# test without LLM or server
result = mock_run(agent, "prompt", events=[
    MockEvent.tool_call("my_tool", {"param": "x"}),
    MockEvent.done("done"),
])

expect(result).completed().used_tool("my_tool")

Full quickstart →

The quickstart covers install, server setup, your first agent, and a working test — in five minutes.