An open source server and SDK. Your agent definitions compile into durable workflows — execution state lives outside your process, tool calls retry automatically, and human approvals wait and resume cleanly.
pip install agentspan

Agent(tools=[...])

Your process dies. The agent doesn't.
Your process crashes. The agent keeps running on the Agentspan server. Reconnect from any machine — it resumes from the exact step, no lost work.
from agentspan.agents import (
    start, AgentHandle, AgentRuntime,
)
handle = start(agent, "analyze 10k records")
# process dies 4 min in — agent lives on
# reconnect from any machine, any time
handle = AgentHandle(
    workflow_id="wf-f8a2c1",
    runtime=AgentRuntime(),
)  # picks up right where it left off

Mark any tool as requiring approval. The agent pauses, holds state on the server with no timeout, and waits. Resume from Slack, a web portal, or code.
@tool(approval_required=True)
def process_refund(order_id, amount):
    """Refund — needs human approval."""
    ...
handle = start(agent, "refund #8821")
# agent pauses, state held on server
# approve from Slack, web, or code
handle.approve()
# or handle.reject("over limit")
# resumes from the exact waiting point
Wire agents into pipelines with >>.
Each output feeds the next — every step logged, crash-safe, and resumable across the full chain.
from agentspan.agents import Agent, run
researcher = Agent("researcher", ...)
writer = Agent("writer", ...)
editor = Agent("editor", ...)
# three agents, one expression, durable
result = run(
    researcher >> writer >> editor,
    "state of AI agents in 2026",
)
# each step: logged · crash-safe

Agentspan compiles agent definitions into Conductor workflows. Conductor is an open-source orchestration engine that has run billions of executions in production at Netflix, LinkedIn, and Tesla. Durable state, per-step retries, full execution history, and replay are Conductor primitives. Agentspan gives you a clean Python API on top of that foundation.
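The >> composition used in the pipeline above is ordinary operator overloading: each stage's output becomes the next stage's input. A minimal standalone sketch of the idea, where Step is an illustrative stand-in rather than an Agentspan class:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    """One pipeline stage: a callable from text to text."""
    fn: Callable[[str], str]

    def __rshift__(self, other: "Step") -> "Step":
        # compose: self runs first, its output feeds other
        return Step(lambda text: other.fn(self.fn(text)))

    def __call__(self, text: str) -> str:
        return self.fn(text)

research = Step(lambda topic: f"notes on {topic}")
write = Step(lambda notes: f"draft from {notes}")
edit = Step(lambda draft: f"final: {draft}")

pipeline = research >> write >> edit
print(pipeline("AI agents"))
# -> final: draft from notes on AI agents
```

The real implementation additionally checkpoints each stage on the server; the composition semantics are the same.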
Every tool call. Every LLM request. Timing on every step. Stored, queryable, replayable.
from agentspan.agents import Agent, tool, run
@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    ...

@tool
def fetch_page(url: str) -> dict:
    """Fetch and parse a webpage."""
    ...
researcher = Agent(
    name="researcher",
    model="anthropic/claude-sonnet-4-6",
    tools=[search_web, fetch_page],
    instructions="Research the topic thoroughly.",
)
writer = Agent(
    name="writer",
    model="openai/gpt-4o",
    instructions="Write a clear article from the research.",
)
result = run(researcher >> writer, "AI agents in production")
result.print_result()

The pieces that matter when agents run in the real world — not a demo framework.
mock_run scripts exact tool sequences so you can assert on your logic deterministically — error handling, tool routing, output parsing — in milliseconds. No LLM needed. No server needed.
from agentspan.agents.testing import (
mock_run, MockEvent, expect,
)
result = mock_run(agent, "Weather in Chicago?",
    events=[
        MockEvent.tool_call("get_weather", {"city": "Chicago"}),
        MockEvent.tool_result("get_weather", {"temp_f": 55}),
        MockEvent.done("Chicago is 55°F and cloudy."),
    ])

expect(result).completed().used_tool("get_weather")

Testing docs →

Every pattern you'll need — sequential, parallel, handoff, router, swarm, and more. One consistent API. Mix, nest, and compose freely.
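Conceptually, a parallel strategy fans the same input out to every sub-agent and aggregates the results. A rough stdlib sketch of that fan-out; fan_out, analyst, and researcher here are stand-ins for illustration, not Agentspan APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt, workers):
    """Run every worker on the same prompt concurrently; keep submit order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(worker, prompt) for worker in workers]
        return [f.result() for f in futures]

analyst = lambda p: f"analysis of {p}"
researcher = lambda p: f"research on {p}"

results = fan_out("Q3 earnings", [analyst, researcher])
print(results)
```

The server-backed version adds per-branch durability and retries on top of the same fan-out shape.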
# sequential pipeline
result = run(researcher >> writer, prompt)
# parallel — runs concurrently, aggregates results
team = Agent(name="team",
    agents=[analyst, researcher],
    strategy=Strategy.PARALLEL)

# handoff — LLM picks the next specialist
router = Agent(name="router",
    agents=[sales, support, billing],
    strategy=Strategy.HANDOFF)

Multi-agent docs →

Use provider/model-name format. Switch from Anthropic to OpenAI to Gemini — one string, nothing else changes. Mix models within the same pipeline.
# anthropic
Agent(..., model="anthropic/claude-sonnet-4-6")
# openai
Agent(..., model="openai/gpt-4o")
# google
Agent(..., model="google/gemini-2.0-flash")
# groq
Agent(..., model="groq/llama-3.3-70b-versatile")
# one string — switch providers anywhere

All providers →

Validate every agent response with regex, LLM checks, or custom functions. On failure: retry automatically, ask the LLM to fix it, raise an error, or pause for a human.
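The regex case is easy to reason about in isolation. This standalone check uses the same SSN pattern the no_ssn guardrail example passes to RegexGuardrail:

```python
import re

# the SSN-shaped pattern from the guardrail example
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def violates(text: str) -> bool:
    """True if the text contains something shaped like an SSN."""
    return SSN.search(text) is not None

print(violates("Your case number is 123-45-6789."))  # True
print(violates("Call 555-0123 for help."))           # False
```

A guardrail with on_fail="retry" runs exactly this kind of check on each response and re-prompts when it fires.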
from agentspan.agents import Agent, RegexGuardrail
guardrail = RegexGuardrail(
    patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],
    name="no_ssn",
    on_fail="retry",
    max_retries=3,
)
agent = Agent(
    name="support_bot",
    model="openai/gpt-4o",
    guardrails=[guardrail],
)
# auto-retries up to 3x if SSN found in output

Guardrails docs →

Define your output schema as a Pydantic model. Agentspan enforces it on every LLM response — retrying automatically if the model returns malformed output. Always typed.
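The enforcement loop boils down to validate-or-retry. A minimal sketch with plain Pydantic; parse_with_retry and the fake ask callable are illustrative, not Agentspan internals:

```python
from pydantic import BaseModel, ValidationError

class Report(BaseModel):
    summary: str
    key_findings: list[str]
    sources: list[str]

def parse_with_retry(ask, retries: int = 3) -> Report:
    """Call the model, validate its JSON reply, and re-ask on failure."""
    last_err = None
    for _ in range(retries):
        raw = ask(last_err)  # last_err lets the model see what to fix
        try:
            return Report.model_validate_json(raw)
        except ValidationError as err:
            last_err = str(err)
    raise ValueError("model never produced a valid Report")

# fake model: first reply is malformed, second is valid
replies = iter([
    '{"summary": 42}',
    '{"summary": "ok", "key_findings": [], "sources": []}',
])
report = parse_with_retry(lambda err: next(replies))
```

With output_type set, this loop runs server-side and the result you get back is already a typed Report.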
from pydantic import BaseModel
from agentspan.agents import Agent, run
class Report(BaseModel):
    summary: str
    key_findings: list[str]
    sources: list[str]
agent = Agent(
    name="analyst",
    model="openai/gpt-4o",
    output_type=Report,
)
result = run(agent, "Analyze Q3 earnings")

Structured output docs →

ConversationMemory keeps history within a session. SemanticMemory stores and retrieves facts across sessions using similarity search. Plug in any vector backend.
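At its core, conversation memory is an append-only transcript replayed into each prompt. A minimal sketch of that idea; ConversationLog is an illustrative stand-in, not Agentspan's ConversationMemory:

```python
class ConversationLog:
    """Append-only transcript, replayed into every prompt."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def record(self, user_msg: str, reply: str) -> None:
        self.turns += [("user", user_msg), ("assistant", reply)]

    def build_prompt(self, user_msg: str) -> str:
        # prepend the full history so the model can see prior turns
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nuser: {user_msg}".lstrip()

memory = ConversationLog()
memory.record("My name is Alice.", "Nice to meet you, Alice.")
print(memory.build_prompt("What is my name?"))
```

SemanticMemory swaps the verbatim transcript for embedding-based retrieval, so only the most relevant facts are replayed.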
from agentspan.agents import Agent, run, ConversationMemory
memory = ConversationMemory()
agent = Agent(
    name="assistant",
    model="openai/gpt-4o",
    memory=memory,
)
run(agent, "My name is Alice, I'm a backend engineer.")
result = run(agent, "What do you know about me?")
# "You're Alice, a backend engineer."

Memory docs →

Stream tool calls, LLM responses, handoffs, guardrail results, and errors as events. Build real-time UIs, live approval flows, or custom log pipelines — event by event.
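Because everything arrives as plain events, consumer code can be exercised against a scripted stream before any server is involved. A standalone sketch; Event and fake_stream here are stand-ins for the real stream API:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Event:
    type: str       # e.g. "tool_call", "tool_result", "done"
    payload: dict

def fake_stream() -> Iterator[Event]:
    """Scripted stand-in for a live agent event stream."""
    yield Event("tool_call", {"tool": "search_web", "args": {"q": "AI agents"}})
    yield Event("tool_result", {"tool": "search_web", "result": "3 hits"})
    yield Event("done", {"output": "summary ready"})

lines = []
for event in fake_stream():
    if event.type == "tool_call":
        lines.append(f"→ {event.payload['tool']}")
    elif event.type == "done":
        lines.append(event.payload["output"])
```

Swapping fake_stream for the real stream(agent, prompt) call leaves the consumer loop unchanged.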
from agentspan.agents import stream, EventType
for event in stream(agent, "Research AI agents"):
    if event.type == EventType.TOOL_CALL:
        print(f"→ {event.tool_name}({event.args})")
    elif event.type == EventType.TOOL_RESULT:
        print(f"← {event.result}")
    elif event.type == EventType.HANDOFF:
        print(f"handoff → {event.target}")
    elif event.type == EventType.DONE:
        print(event.output)
        break

Streaming docs →

OpenAI Agents SDK, Google ADK, and LangGraph plug into Agentspan with one line — your agent definition stays exactly as it is. The Agentspan server adds durable execution underneath.
Your agents, handoffs, and tools stay identical. Replace Runner.run_sync with Agentspan's run.
from agentspan.agents import run
# your agent definition stays identical
# was: Runner.run_sync(agent, prompt)
result = run(agent, "prompt")

See full example →

Your ADK agent graph stays intact. Agentspan wraps the execution layer — persistence and observability with no restructuring.
from agentspan.agents import run
# your agent definition stays identical
# was: await runner.run_async(...)
result = run(root_agent, "prompt")

See full example →

Your graph, nodes, and edges stay identical. Agentspan wraps the execution layer — one line to connect.
from agentspan.agents import run
# your graph definition stays identical
# was: app.invoke({...})
result = run(app, "prompt")

See full example →

Machine-readable API reference at /llms.txt. Claude Code skill instructions (install, connect, test) at /skills.md.
pip install agentspan
Installs the Python SDK and the agentspan CLI.
agentspan server start
Downloads the Agentspan server on first run (~50 MB) and starts it on http://localhost:6767.
from agentspan.agents import Agent, tool, run
from agentspan.agents.testing import mock_run, MockEvent, expect
@tool
def my_tool(param: str) -> str:
    """Description becomes the JSON schema."""
    return "result"
agent = Agent(
    name="my_agent",
    model="anthropic/claude-sonnet-4-6",
    tools=[my_tool],
    instructions="System prompt here.",
)
result = run(agent, "User prompt here")
# test without LLM or server
result = mock_run(agent, "prompt", events=[
    MockEvent.tool_call("my_tool", {"param": "x"}),
    MockEvent.done("done"),
])

expect(result).completed().used_tool("my_tool")

Full quickstart →

The quickstart covers install, server setup, your first agent, and a working test — in five minutes.