Durability, observability & control

What is Agent State Management?

Also called: agent state, state management

Updated June 24, 2026
Quick Definition

State management is how an agent’s execution position and working data are stored and kept current as a run proceeds. The state is everything needed to know where a run is and what it has done — the steps taken, the tool results gathered, the current branch, and anything it is waiting on — so the agent can always continue from a correct, well-defined point.

Why state management matters

Every step an agent takes depends on what came before: the next model call needs the conversation so far, a branch decision needs earlier tool results, and resuming after a pause needs to know what the run was waiting for. That accumulated context is the run’s state, and where it is kept determines how robust the agent is.

The common shortcut is to hold state only in process memory. That works until the process stops — a crash, a deploy, a reclaimed instance — at which point the state, and therefore the run, disappears. There is no correct position to return to, so the only option is to start over, repeating model calls and tool side effects that already happened. Sound state management is what lets a run be paused, observed, moved to another machine, and resumed without losing its place.

How it works

State management gives a run a single authoritative record of its position, updated as it advances:

  1. The run begins with an initial state — its inputs and an empty history.
  2. As each step completes, its outcome is written into the state: a tool result is stored, the current step advances, a variable is set, or a pending approval is recorded.
  3. The next step reads the current state to decide what to do, so decisions always reflect everything that has happened so far.
  4. When the run pauses or fails, the last saved state is the exact point from which it can resume.

Keeping this record outside the process is what makes it survivable: any healthy worker can read the saved state and continue, and any observer can read it to see where the run stands.

State vs. memory

State and memory are easy to conflate but answer different questions. State is the execution position of a particular run — which step, which results, what it is waiting for — and it is specific to that run. Memory is knowledge an agent recalls to inform its decisions, often shared across runs and drawn from past interactions or a knowledge store. State tells the agent where it is; memory tells it what it knows.

In practice

A durable, observable runtime persists each run’s state server-side and updates it as steps complete, which is the foundation that durable execution and crash recovery rest on. Because the state lives outside the process, it can be inspected for observability and reassigned to a healthy worker after a failure, so a crash becomes a pause rather than a restart. For the rationale and how this underpins reliable agents, see why durable agents.

Frequently asked questions

What is the difference between agent state and agent memory?

State is the run's execution position and working data — which step it is on, what tools returned, what it is waiting for. Memory is recalled knowledge an agent uses to inform decisions. State answers where am I in this run; memory answers what do I know.

What does an agent's state contain?

Typically the conversation so far, the current step or branch, the results of completed tool calls, any pending approval, and variables the run has set. Together these describe the exact position of a run so it can continue correctly.

How is agent state preserved across failures?

By persisting it outside the running process. When state lives in durable storage and is updated as each step completes, a crash or restart does not lose it — another worker reads the saved state and resumes from where the run stopped.

See also in the docs

Related terms