What is the difference between short-term and long-term agent memory?

Short-term memory holds the current interaction — the conversation and recent steps — and fits in the model's context window. Long-term memory persists information across sessions in external storage and is retrieved on demand when it becomes relevant.

How is agent memory different from RAG?

RAG is a retrieval technique for pulling relevant text into the prompt. Memory is the broader concern of what an agent retains and recalls over time; long-term memory is often implemented using retrieval, so RAG is one mechanism that serves memory rather than a synonym for it.

What is the difference between semantic and episodic memory?

Semantic memory stores general facts and knowledge independent of when they were learned. Episodic memory stores specific past events or interactions, such as what a user asked in an earlier session, so the agent can recall particular experiences.
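One way to picture the difference is in the shape of the stored records. The sketch below is illustrative only: the class names `SemanticFact` and `Episode`, their fields, and the helper `episodes_before` are hypothetical and do not come from any particular runtime. It assumes semantic facts carry no timestamp, while episodes are tied to a session and a moment in time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SemanticFact:
    # A general fact; when it was learned is not part of the record.
    subject: str
    statement: str


@dataclass(frozen=True)
class Episode:
    # A specific past event, anchored to a session and a point in time.
    session_id: str
    occurred_at: datetime
    summary: str


def episodes_before(episodes: List[Episode], cutoff: datetime) -> List[Episode]:
    """Return episodes that happened before the cutoff, oldest first."""
    earlier = [e for e in episodes if e.occurred_at < cutoff]
    return sorted(earlier, key=lambda e: e.occurred_at)


# Hypothetical sample data, for illustration only.
fact = SemanticFact(subject="user", statement="prefers metric units")
episodes = [
    Episode("s2", datetime(2024, 3, 2, tzinfo=timezone.utc), "asked about invoice export"),
    Episode("s1", datetime(2024, 3, 1, tzinfo=timezone.utc), "asked how to reset a password"),
    Episode("s3", datetime(2024, 3, 5, tzinfo=timezone.utc), "reported a login error"),
]
earlier = episodes_before(episodes, datetime(2024, 3, 3, tzinfo=timezone.utc))
```

The fact can be recalled whenever its subject comes up, whereas the episodes only make sense when queried by time or session.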

What is Agent Memory? — Agentspan glossary

Quick Definition

Memory is how an agent retains and recalls information beyond a single model call. Because a language model only sees what fits in its context window for one request, memory supplies the mechanism for carrying conversation history, learned facts, and past interactions across the steps of a run and across separate sessions.

Why memory matters

A language model is stateless: each call sees only the text it is given and remembers nothing afterward. For a single question that is fine, but an agent that holds a conversation, works on a task over many steps, or returns to help the same user tomorrow needs continuity. Without memory, every interaction starts from zero, and the agent cannot build on what it already knows or did.

The naive fix — stuffing the entire history into every prompt — fails quickly. Context windows are finite, larger prompts cost more and run slower, and irrelevant detail can degrade the quality of the model’s responses. Memory is the discipline of deciding what to keep, where to keep it, and what to surface for any given step, which is closely tied to context engineering.
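A minimal sketch of the budgeting problem described above, assuming a crude word count stands in for a real tokenizer. The function names `estimate_tokens` and `trim_history` are hypothetical, and the sample conversation is invented for illustration.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "user: my name is Ada",
    "assistant: nice to meet you Ada",
    "user: what is the weather like",
    "assistant: I cannot check the weather",
]
recent = trim_history(history, budget=12)
```

With this budget only the last two messages survive, so the user's name drops out of the prompt. Simple truncation keeps prompts small but loses older detail, which is the gap that longer-lived memory is meant to fill.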

How it works

Agent memory is usually organized into a few cooperating layers:

Short-term memory holds the active interaction — the running conversation and the results of recent steps — and is placed directly in the context window for the current call.
Long-term memory persists information in external storage between sessions, so it outlives any single run or process.
Retrieval selects which long-term items are relevant to the current step and injects only those into the prompt, rather than loading everything.
Writing and consolidation decide what from the current interaction is worth saving, and may summarize or distill it before storing.

The distinction between conversation history and durable knowledge is common enough that runtimes often expose them as separate facilities — one for chat history within a session, another for long-term semantic recall across sessions.
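The layers above can be sketched in a few lines. This is a toy under stated assumptions, not how any specific runtime implements memory: the class `AgentMemory` and its methods are hypothetical, a plain dict stands in for external storage, and retrieval ranks by simple word overlap where a real system would more likely use embeddings or a search index.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentMemory:
    # Short-term: the running conversation for the current session.
    short_term: List[str] = field(default_factory=list)
    # Long-term: a dict standing in for external storage such as a database.
    long_term: Dict[str, str] = field(default_factory=dict)

    def observe(self, message: str) -> None:
        """Add a message to the active interaction."""
        self.short_term.append(message)

    def consolidate(self, key: str, fact: str) -> None:
        """Writing: store a distilled fact judged worth keeping."""
        self.long_term[key] = fact

    def retrieve(self, query: str, limit: int = 2) -> List[str]:
        """Retrieval: return stored facts that share words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for fact in self.long_term.values():
            overlap = len(query_words & set(fact.lower().split()))
            if overlap > 0:
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]

    def build_prompt(self, query: str) -> str:
        """Inject only the relevant long-term items plus the conversation."""
        recalled = self.retrieve(query)
        lines = ["Relevant facts:", *recalled, "Conversation:", *self.short_term, query]
        return "\n".join(lines)


memory = AgentMemory()
memory.consolidate("diet", "the user is vegetarian")
memory.consolidate("city", "the user lives in Lisbon")
memory.observe("user: hello again")
prompt = memory.build_prompt("suggest a vegetarian dinner")
```

Here the dietary fact is recalled because it overlaps with the query, while the unrelated fact about the city stays in storage and out of the prompt.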

Memory vs. state

Memory and state overlap but are not the same. State is the precise, structured record of where a run is (which step it is on, what each tool returned, what it is waiting for), and it must be exact for the run to resume correctly. Memory concerns the meaning the agent should recall: facts, context, and past interactions that inform its reasoning. State keeps a run mechanically correct; memory keeps it contextually informed.
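The contrast can be made concrete with two record types. This is a sketch under assumptions: `RunState`, `MemoryNote`, `checkpoint`, and `restore` are hypothetical names, the field values are invented, and JSON serialization is just one possible way to persist state.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RunState:
    """Exact, structured record of where a run is."""
    step_index: int = 0
    tool_results: Dict[str, Any] = field(default_factory=dict)
    waiting_for: Optional[str] = None


@dataclass
class MemoryNote:
    """Something worth recalling later; it informs reasoning, not control flow."""
    text: str


def checkpoint(state: RunState) -> str:
    # State must round-trip exactly, so serialize every field.
    return json.dumps(asdict(state), sort_keys=True)


def restore(blob: str) -> RunState:
    return RunState(**json.loads(blob))


state = RunState(
    step_index=3,
    tool_results={"search": ["doc-1", "doc-2"]},
    waiting_for="human_approval",
)
resumed = restore(checkpoint(state))
notes = [MemoryNote("user prefers concise answers")]
```

The restored state has to equal the original field for field, or the run resumes in the wrong place. A memory note has no such requirement: it can be summarized, reworded, or skipped when it is not relevant.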

In practice

A durable, observable runtime typically separates conversation memory, which carries chat history within a session, from semantic memory, which stores knowledge for long-term retrieval across sessions. Both feed the model at compile time, and what gets recalled is a context engineering decision, often implemented with retrieval-augmented generation. Durable state management is distinct from this recall: it records exactly where a run is so that the run can resume. For the available memory types, see conversation and semantic memory.
