State management is how an agent’s execution position and working data are stored and kept current as a run proceeds. The state is everything needed to know where a run is and what it has done (the steps taken, the tool results gathered, the current branch, and anything the run is waiting on), so the agent can always continue from a correct, well-defined point.
Why state management matters
Every step an agent takes depends on what came before: the next model call needs the conversation so far, a branch decision needs earlier tool results, and resuming after a pause needs to know what the run was waiting for. That accumulated context is the run’s state, and where it is kept determines how robust the agent is.
The common shortcut is to hold state only in process memory. That works until the process stops — a crash, a deploy, a reclaimed instance — at which point the state, and therefore the run, disappears. There is no correct position to return to, so the only option is to start over, repeating model calls and tool side effects that already happened. Sound state management is what lets a run be paused, observed, moved to another machine, and resumed without losing its place.
How it works
State management gives a run a single authoritative record of its position, updated as it advances:
- The run begins with an initial state — its inputs and an empty history.
- As each step completes, its outcome is written into the state: a tool result is stored, the current step advances, a variable is set, or a pending approval is recorded.
- The next step reads the current state to decide what to do, so decisions always reflect everything that has happened so far.
- When the run pauses or fails, the last saved state is the exact point from which it can resume.
Keeping this record outside the process is what makes it survivable: any healthy worker can read the saved state and continue, and any observer can read it to see where the run stands.
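The lifecycle above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the field names (`history`, `current_step`, `pending_approval`), the helper functions, and the three-step plan are all hypothetical, and a JSON file in a temporary directory stands in for whatever external store a real runtime would use.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout of a run's state; the field names are illustrative.
def initial_state(run_id, inputs):
    return {
        "run_id": run_id,
        "inputs": inputs,
        "history": [],             # outcomes of completed steps, in order
        "current_step": 0,         # index of the next step to execute
        "variables": {},
        "pending_approval": None,  # set while the run waits on a human
    }

def save_state(path, state):
    # Write to a temporary file, then rename it over the real one,
    # so a reader never sees a half-written record.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)

def load_state(path):
    return json.loads(path.read_text())

def record_outcome(state, step_name, result):
    # A completed step writes its outcome into the state and advances the position.
    state["history"].append({"step": step_name, "result": result})
    state["current_step"] += 1
    return state

def next_step(state, plan):
    # The decision reads only the current state.
    if state["pending_approval"] is not None:
        return None  # paused: waiting on an approval
    if state["current_step"] >= len(plan):
        return None  # finished
    return plan[state["current_step"]]

plan = ["fetch_order", "check_inventory", "send_confirmation"]  # assumed steps
state_path = Path(tempfile.mkdtemp()) / "run-1.json"

# The run begins with its inputs and an empty history.
save_state(state_path, initial_state("run-1", {"order_id": 42}))

# Complete two steps, saving after each one.
for _ in range(2):
    state = load_state(state_path)
    step = next_step(state, plan)
    record_outcome(state, step, f"{step}: ok")
    save_state(state_path, state)

# Anything that can read the file can see where the run stands and continue it.
resumed = load_state(state_path)
print(next_step(resumed, plan))  # send_confirmation
```

Because every decision goes through `load_state`, nothing about the run's position lives only in local variables; the saved record is the single source of truth.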
State vs. memory
State and memory are easy to conflate but answer different questions. State is the execution position of a particular run — which step, which results, what it is waiting for — and it is specific to that run. Memory is knowledge an agent recalls to inform its decisions, often shared across runs and drawn from past interactions or a knowledge store. State tells the agent where it is; memory tells it what it knows.
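One way to picture the distinction is as two separate data structures. The sketch below is illustrative only: `RunState`, `AgentMemory`, and their fields are hypothetical names, chosen to show that each run owns its own state while several runs can consult the same memory.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RunState:
    """Execution position of one run (hypothetical fields)."""
    run_id: str
    current_step: str
    tool_results: dict = field(default_factory=dict)
    waiting_on: Optional[str] = None

@dataclass
class AgentMemory:
    """Knowledge recalled to inform decisions, shared across runs (hypothetical)."""
    facts: dict = field(default_factory=dict)

    def recall(self, key):
        return self.facts.get(key)

# One memory, consulted by any run.
memory = AgentMemory(facts={"preferred_currency": "EUR"})

# Two runs, each with its own position.
run_a = RunState(run_id="run-a", current_step="quote_price")
run_b = RunState(run_id="run-b", current_step="await_approval",
                 waiting_on="manager_approval")

# run_a uses what the agent knows (memory) and records what it did (state).
run_a.tool_results["quote_price"] = f"100 {memory.recall('preferred_currency')}"

print(run_a.tool_results)  # {'quote_price': '100 EUR'}
print(run_b.tool_results)  # {}
```

Updating `run_a` leaves `run_b` untouched, while a change to `memory` would be visible to both.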
In practice
A durable, observable runtime persists each run’s state server-side and updates it as steps complete, which is the foundation that durable execution and crash recovery rest on. Because the state lives outside the process, it can be inspected for observability and reassigned to a healthy worker after a failure, so a crash becomes a pause rather than a restart. For the rationale and how this underpins reliable agents, see "Why durable agents".
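To make "a crash becomes a pause" concrete, here is a small simulation under stated assumptions. A SQLite file stands in for the server-side store, the `work` function and the step names are hypothetical, and the crash is simulated by raising an exception before a chosen step. It does not describe any particular runtime's API.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

DB_PATH = Path(tempfile.mkdtemp()) / "runs.db"  # stand-in for a server-side store
STEPS = ["charge_card", "reserve_stock", "send_email"]  # assumed steps
side_effects = []  # records each time a step's side effect actually happens

class WorkerCrashed(Exception):
    pass

def connect():
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn

def load(conn, run_id):
    row = conn.execute("SELECT state FROM runs WHERE run_id = ?", (run_id,)).fetchone()
    return json.loads(row[0]) if row else {"next_step": 0, "results": {}}

def save(conn, run_id, state):
    conn.execute(
        "INSERT OR REPLACE INTO runs (run_id, state) VALUES (?, ?)",
        (run_id, json.dumps(state)),
    )
    conn.commit()

def work(worker_name, run_id, crash_before=None):
    """Continue the run from its saved state; optionally simulate a crash."""
    conn = connect()
    try:
        state = load(conn, run_id)
        while state["next_step"] < len(STEPS):
            step = STEPS[state["next_step"]]
            if step == crash_before:
                raise WorkerCrashed(f"{worker_name} stopped before {step}")
            side_effects.append(step)  # the step's real-world effect
            state["results"][step] = f"done by {worker_name}"
            state["next_step"] += 1
            save(conn, run_id, state)  # persist after every completed step
        return state
    finally:
        conn.close()

# The first worker completes one step, then dies.
try:
    work("worker-a", "run-7", crash_before="reserve_stock")
except WorkerCrashed:
    pass

# A healthy worker reads the saved state and picks up where the run left off.
final = work("worker-b", "run-7")

print(side_effects)  # ['charge_card', 'reserve_stock', 'send_email']
print(final["results"]["charge_card"])  # done by worker-a
```

The card is charged once, not twice, because the second worker starts from the saved position rather than from the beginning. The sketch saves only after a step completes, so a crash between a side effect and its save would still repeat that step; handling that window is outside what this example shows.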