Memory is how an agent retains and recalls information beyond a single model call. Because a language model only sees what fits in its context window for one request, memory supplies the mechanism for carrying conversation history, learned facts, and past interactions across the steps of a run and across separate sessions.
Why memory matters
A language model is stateless: each call sees only the text it is given and remembers nothing afterward. For a single question that is fine, but an agent that holds a conversation, works on a task over many steps, or returns to help the same user tomorrow needs continuity. Without memory, every interaction starts from zero, and the agent cannot build on what it already knows or did.
The naive fix, stuffing the entire history into every prompt, fails quickly. Context windows are finite, larger prompts cost more and run slower, and irrelevant detail can degrade the quality of the model's responses. Memory is the discipline of deciding what to keep, where to keep it, and what to surface for any given step, a concern it shares with context engineering.
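To make the limit concrete, here is a minimal sketch of the simplest alternative to sending everything: keep only the most recent messages that fit a budget. The function name `trim_history` and the word count used as a stand-in for a real token count are assumptions for illustration, not part of any particular runtime.

```python
def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: word count approximates tokens
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "user: my name is Ada",
    "assistant: nice to meet you, Ada",
    "user: what is the capital of France",
    "assistant: Paris",
]
recent = trim_history(history, budget=10)
```

With a budget of 10, only the last two messages survive, so the user's name is lost. Plain truncation keeps the prompt small but forgets indiscriminately, which is the gap that deliberate memory is meant to fill.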
How it works
Agent memory is usually organized into a few cooperating layers:
- Short-term memory holds the active interaction — the running conversation and the results of recent steps — and is placed directly in the context window for the current call.
- Long-term memory persists information in external storage between sessions, so it outlives any single run or process.
- Retrieval selects which long-term items are relevant to the current step and injects only those into the prompt, rather than loading everything.
- Writing and consolidation decide what from the current interaction is worth saving, and may summarize or distill it before storing.
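The four layers above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a reference design: the class `AgentMemory`, the `fact:` prefix used to mark what is worth saving, and the word-overlap scoring are all hypothetical, and an in-memory list stands in for external storage.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    short_term: list[str] = field(default_factory=list)  # active interaction
    long_term: list[str] = field(default_factory=list)   # stand-in for external storage

    def observe(self, message: str) -> None:
        """Short-term memory: record a message from the current interaction."""
        self.short_term.append(message)

    def consolidate(self) -> None:
        """Writing: save only messages marked as facts, then end the session."""
        for message in self.short_term:
            if message.startswith("fact:"):  # assumption: facts are tagged explicitly
                self.long_term.append(message.removeprefix("fact:").strip())
        self.short_term.clear()

    def retrieve(self, query: str, limit: int = 2) -> list[str]:
        """Retrieval: rank stored items by the number of words shared with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(item.lower().split())), item)
            for item in self.long_term
        ]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in relevant[:limit]]

    def build_context(self, query: str) -> str:
        """Combine recalled items, the active interaction, and the new query."""
        return "\n".join(self.retrieve(query) + self.short_term + [query])


memory = AgentMemory()
memory.observe("fact: the user prefers metric units")
memory.observe("hello there")
memory.observe("fact: project deadline is Friday")
memory.consolidate()

context = memory.build_context("which units does the user prefer")
```

Here only the stored item that shares words with the query is injected; the deadline fact stays in storage. A real system would likely replace the word-overlap score with embedding similarity and the list with a database, but the division of responsibilities would be the same.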
The distinction between conversation history and durable knowledge is common enough that runtimes often expose them as separate facilities — one for chat history within a session, another for long-term semantic recall across sessions.
Memory vs. state
Memory and state overlap but are not the same. State is the precise, structured record of where a run is (which step it is on, what each tool returned, what it is waiting for), and it must be exact for the run to resume correctly. Memory holds the meaning the agent should recall: facts, context, and past interactions that inform its reasoning. State keeps a run mechanically correct; memory keeps it contextually informed.
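The difference shows up in how each is stored. The sketch below assumes a hypothetical `RunState` record with hypothetical `checkpoint` and `resume` helpers; the point is only that state must round-trip exactly, while memory is a looser collection of notes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunState:
    """Exact record of where a run is; every field must survive a restart."""
    step: int = 0
    tool_results: dict[str, str] = field(default_factory=dict)
    waiting_for: Optional[str] = None


def checkpoint(state: RunState) -> str:
    """Serialize the state so another process can pick the run up."""
    return json.dumps(asdict(state))


def resume(blob: str) -> RunState:
    """Rebuild the state from a checkpoint."""
    return RunState(**json.loads(blob))


state = RunState(step=2, tool_results={"search": "3 hits"}, waiting_for="user_approval")
restored = resume(checkpoint(state))

# Memory, by contrast, is informative rather than exact: notes the agent
# may or may not recall for a given step.
memory_notes = ["user prefers concise answers", "last week's report was about Q3"]
```

If `restored` differed from `state` in any field, the run could repeat or skip a step. Losing or paraphrasing an entry in `memory_notes` would make the agent less informed but would not break the run.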
In practice
A durable, observable runtime typically separates conversation memory, which carries chat history within a session, from semantic memory, which stores knowledge for long-term retrieval across sessions. Both feed the model when the context for a call is assembled, and what gets recalled is a context engineering decision, often implemented with retrieval-augmented generation. Durable state management is distinct from this recall: it records exactly where a run is so it can resume. For the available memory types, see conversation and semantic memory.
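As a rough illustration of that separation, the sketch below keeps chat history per session and facts per user, then combines both into one prompt. The `MemoryStore` class, its method names, and the prompt layout are assumptions made for this example and do not describe any specific runtime's API.

```python
from collections import defaultdict


class MemoryStore:
    def __init__(self) -> None:
        # Conversation memory: chat history, scoped to a single session.
        self.conversations: dict[str, list[str]] = defaultdict(list)
        # Semantic memory: durable facts, scoped to a user across sessions.
        self.semantic: dict[str, list[str]] = defaultdict(list)

    def add_turn(self, session_id: str, text: str) -> None:
        self.conversations[session_id].append(text)

    def remember(self, user_id: str, fact: str) -> None:
        self.semantic[user_id].append(fact)

    def assemble_prompt(self, user_id: str, session_id: str, question: str) -> str:
        """Build the text for one model call from both memory types."""
        parts = ["Known facts:"]
        parts += self.semantic.get(user_id, [])
        parts += ["Conversation:"]
        parts += self.conversations.get(session_id, [])
        parts += ["Question: " + question]
        return "\n".join(parts)


store = MemoryStore()
store.add_turn("monday", "user: I work in Berlin")
store.remember("ada", "works in Berlin")

# A new session the next day: the chat history is empty, the fact persists.
prompt = store.assemble_prompt("ada", "tuesday", "What time zone am I in?")
```

The Tuesday prompt contains the stored fact but none of Monday's chat turns, which is the behavior the two separate facilities are meant to provide. This sketch injects every stored fact; a fuller version would retrieve only the relevant ones.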