Agent tracing is the practice of recording each step an agent takes as a span: a timed, labelled unit of work such as a model call, a tool call, or a sub-agent step. Spans carry their inputs, outputs, and timing, and nest into a parent-child tree, producing a trace that shows exactly what the agent did, in what order, and how long each part took.
Why tracing matters
An agent run is a sequence of model decisions and tool calls whose path is chosen at runtime, so two runs of the same agent can behave differently. When a run produces a wrong answer, loops, or stalls, the question is always the same: what happened, step by step? Without a structured record, that question is hard to answer, because the only evidence is scattered log lines and the final output.
Tracing turns that opaque run into an inspectable artifact. Each step becomes a span you can open to see the prompt that was sent, the result that came back, and the time it took. Because spans are linked, you can see that a particular tool returned bad data three steps before the agent went off course, or that most of a run’s latency sat in one slow call. That visibility is the difference between guessing at a failure and locating it.
How it works
Tracing instruments the reasoning loop so that every meaningful action opens and closes a span:
- When the run starts, a root span is created to represent the whole execution.
- Each model call, tool call, and sub-agent handoff opens a child span under the step that triggered it, capturing its inputs.
- When the action finishes, the span is closed with its outputs, duration, and status.
- Parent-child links assemble the spans into a tree that mirrors the structure of the run.
- For work that crosses processes or agents, a shared trace identifier is propagated so distant spans are stitched into the same trace.
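The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular tracing library: the `Span` and `Tracer` names, their fields, and the status strings are all assumptions made for the example. Passing an existing `trace_id` into `Tracer` stands in for propagating a shared trace identifier across processes.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional


@dataclass
class Span:
    """Hypothetical span record: one timed, labelled unit of work."""
    name: str
    trace_id: str
    parent_id: Optional[str]
    inputs: Any = None
    outputs: Any = None
    status: str = "open"
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    start: float = field(default_factory=time.perf_counter)
    duration: Optional[float] = None


class Tracer:
    """Hypothetical in-memory tracer that nests spans with a stack."""

    def __init__(self, trace_id: Optional[str] = None) -> None:
        # Reusing a trace_id received from another process keeps
        # distant spans in the same trace.
        self.trace_id = trace_id or uuid.uuid4().hex
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, inputs: Any = None) -> Iterator[Span]:
        # The span currently on top of the stack triggered this one.
        parent_id = self._stack[-1].span_id if self._stack else None
        s = Span(name, self.trace_id, parent_id, inputs)
        self.spans.append(s)
        self._stack.append(s)
        try:
            yield s
            s.status = "ok"
        except Exception:
            s.status = "error"
            raise
        finally:
            # Close the span with its duration, whatever the outcome.
            s.duration = time.perf_counter() - s.start
            self._stack.pop()


tracer = Tracer()
with tracer.span("run", inputs="What is 2 + 2?") as root:  # root span
    with tracer.span("model_call", inputs="What is 2 + 2?") as call:
        call.outputs = "use the calculator tool"
    with tracer.span("tool_call", inputs="2 + 2") as tool:
        tool.outputs = 4
    root.outputs = 4
```

The stack is what builds the tree here: whichever span is open when a new one starts becomes its parent, so the recorded structure mirrors the nesting of the run.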
The result is a queryable record: you can filter for failed spans, sort by duration, or walk the tree from the final answer back to the decision that produced it.
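As a rough sketch of those three queries, the example below runs them over a handful of span records. The `SpanRecord` type, its field names, and the sample values are hypothetical and chosen only for illustration; a real trace store would expose its own query interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical stored span, reduced to the fields the queries need."""
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float
    status: str


# Illustrative spans from one finished run (made-up values).
spans = [
    SpanRecord("s1", None, "run", 2300.0, "ok"),
    SpanRecord("s2", "s1", "model_call", 400.0, "ok"),
    SpanRecord("s3", "s1", "tool_call:search", 1500.0, "error"),
    SpanRecord("s4", "s1", "model_call", 350.0, "ok"),
    SpanRecord("s5", "s4", "final_answer", 50.0, "ok"),
]
by_id: Dict[str, SpanRecord] = {s.span_id: s for s in spans}

# 1. Filter for failed spans.
failed = [s.name for s in spans if s.status == "error"]

# 2. Sort child spans by duration, slowest first (the root is excluded
#    because its duration covers the whole run).
slowest_first = sorted(
    (s for s in spans if s.parent_id is not None),
    key=lambda s: s.duration_ms,
    reverse=True,
)


# 3. Walk the tree from a span back up to the root.
def path_to_root(span_id: str) -> List[str]:
    path: List[str] = []
    current = by_id.get(span_id)
    while current is not None:
        path.append(current.name)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return path


answer_path = path_to_root("s5")
```

In this sample, the failed span and the slowest span are the same search call, which is the kind of coincidence a trace makes easy to spot.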
Tracing vs. logging
Logging emits independent messages at chosen points; a trace records the structure of execution itself. A log tells you that an event occurred, while a span tells you when it started and ended, what it consumed and produced, and where it sits in the run. Logs remain useful for free-form detail, but reconstructing the path of a non-deterministic agent from logs means correlating messages by hand, whereas a trace records that path directly.
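The contrast can be made concrete with one tool call recorded both ways. This is a sketch under stated assumptions: the `search` function is a stand-in tool, and the `Span` dataclass and the `"root"` parent identifier are hypothetical. Only the standard `logging` module is a real API.

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")


def search(query: str) -> list:
    """Stand-in tool used only for this example."""
    return [f"result for {query}"]


# Logging: two independent messages. Nothing links them to each other
# or to the step that triggered the call.
log.info("calling search tool")
results = search("agent tracing")
log.info("search tool returned %d results", len(results))


# Tracing: one record that holds timing, inputs, outputs, and position.
@dataclass
class Span:
    """Hypothetical span record."""
    name: str
    parent_id: Optional[str]
    inputs: Any
    outputs: Any
    start: float
    end: float


started = time.perf_counter()
outputs = search("agent tracing")
span = Span(
    name="tool_call:search",
    parent_id="root",  # hypothetical id of the span that triggered the call
    inputs="agent tracing",
    outputs=outputs,
    start=started,
    end=time.perf_counter(),
)
```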
In practice
A durable, observable runtime emits a span for each step as the run executes, so the trace is available live and persists after the run ends. Because the same state that drives durable execution backs the trace, you can inspect a run’s reasoning loop while it is still in flight and replay it afterward. Tracing is the raw material that broader observability and evaluation are built on. For watching spans arrive in real time, see observing a run.