What is Context Engineering?

Updated June 24, 2026

Quick Definition

Context engineering is the discipline of curating what goes into a model’s context window at each step of an agent’s run. Because a model can only act on what is in its context, this work decides which instructions, prior turns, retrieved documents, and tool results are present at any moment — and, just as importantly, what is left out — so the model has the information it needs without being overwhelmed by what it does not.

Why context engineering matters

A model has no memory of its own between calls and a finite context window for each one. Everything it can use to make a decision has to be placed in that window, and the window has a hard limit. In a multi-step agent the demand on this space grows quickly: the conversation lengthens, tools return large outputs, and retrieved documents pile up, all of it competing for space in a window the model must reread on every call.

Two failure modes follow. The window can simply run out, forcing information to be dropped. More subtly, a window whose contents fit but are unfocused still degrades output quality. This phenomenon is often called context rot: relevant detail is buried under stale or irrelevant content, and the model attends to the wrong things. Context engineering exists to manage this scarce resource deliberately, treating the contents of the window as something to be assembled and pruned rather than allowed to accumulate.
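To make the first failure mode concrete, the sketch below simulates a run in which every tool result is appended to the window and nothing is pruned. The 8,000-token limit, the step sizes, and the four-characters-per-token estimate are illustrative assumptions, not properties of any particular model or tokenizer.

```python
from typing import List, Optional


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def first_overflow_step(step_outputs: List[str], limit: int) -> Optional[int]:
    """Return the 1-based step at which accumulated context exceeds `limit`.

    Models the naive strategy of appending every output and never pruning.
    Returns None if the whole run fits.
    """
    used = 0
    for step, output in enumerate(step_outputs, start=1):
        used += estimate_tokens(output)
        if used > limit:
            return step
    return None


# Five steps, each returning a 10,000-character tool result (about 2,500 tokens).
outputs = ["x" * 10_000 for _ in range(5)]
overflow_at = first_overflow_step(outputs, limit=8_000)
print(overflow_at)  # prints 4
```

Under these made-up numbers the window overflows on the fourth step, well before the run ends, which is the pressure the techniques below are meant to relieve.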

How it works

Context engineering combines several techniques to keep the window focused:

  1. Selection — choose what to include for the current step rather than passing everything, for example retrieving only the documents relevant to the immediate question.
  2. Compression — summarize or truncate long histories and large tool outputs so their substance survives without their bulk.
  3. Externalization — keep durable facts and progress in memory or state outside the window, and bring back only what a given step needs.
  4. Isolation — give each agent or sub-task a narrow scope so its context holds only what that work requires, instead of one growing context for everything.
  5. Ordering — place the most important material where the model attends to it most reliably.

These techniques are applied continuously across the run, not once at the start, because what belongs in the window changes from step to step.
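As a rough illustration of selection, compression, and ordering working together, the sketch below assembles the window for a single step under a token budget. The `Item` structure, the relevance scores, the budget values, and the choice to put the most relevant item last are all hypothetical; truncation stands in for real summarization, and the token estimate is a crude heuristic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """A candidate piece of context (hypothetical structure)."""
    text: str
    relevance: float  # assumed to come from some retriever or scorer


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def compress(text: str, max_tokens: int) -> str:
    """Truncate to roughly `max_tokens`; a stand-in for summarization."""
    max_chars = max_tokens * 4
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + " [truncated]"


def assemble_context(instructions: str, candidates: List[Item],
                     budget: int, per_item_cap: int) -> List[str]:
    """Build the window for one step: instructions first, then chosen items."""
    used = estimate_tokens(instructions)
    chosen = []
    # Selection: consider the most relevant candidates first, skip what does not fit.
    for item in sorted(candidates, key=lambda i: i.relevance, reverse=True):
        text = compress(item.text, per_item_cap)  # Compression: cap each item.
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue
        chosen.append((item.relevance, text))
        used += cost
    # Ordering (an assumption): least relevant first, most relevant last.
    chosen.sort(key=lambda pair: pair[0])
    return [instructions] + [text for _, text in chosen]


instructions = "Answer the customer's refund question."
candidates = [
    Item("Refund policy: refunds within 30 days.", relevance=0.9),
    Item("Office party photos and notes. " * 50, relevance=0.1),
    Item("Shipping takes 3 to 5 business days.", relevance=0.6),
]
window = assemble_context(instructions, candidates, budget=40, per_item_cap=20)
for part in window:
    print(part)
```

In this example the long, low-relevance item is left out even after truncation because it would exceed the budget. Calling `assemble_context` again at every step, with fresh candidates and scores, is one way to apply the techniques continuously rather than once at the start.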

Context engineering vs. prompt engineering

Prompt engineering is about crafting the wording of an individual instruction: the phrasing, format, and examples for a single call. Context engineering is the larger concern of managing the whole context window across an entire run: which pieces of history, retrieval, memory, and tool output are present at each step, and how that set is built and trimmed over time. A well-worded prompt inside a poorly managed context still yields degraded results, which is why the two are complementary rather than interchangeable.

In practice

Managing context well depends on durable state to draw from and a record of what each step saw. A durable, observable runtime persists an agent’s memory and progress server-side, so context can be reassembled from a reliable source and the inputs to each step are inspectable in the trace. Context engineering draws on memory for durable facts, on retrieval-augmented generation to bring in relevant material on demand, and on state management to track what persists across the agent loop. For how an agent retains information, see the memory concepts.
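The idea of reassembling context from durable state, while recording what each step saw, can be sketched as follows. This is a minimal in-memory illustration, not the API of any specific runtime: the `AgentState` class, its method names, and the example keys are hypothetical, and `to_json` only hints at where server-side persistence would happen.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentState:
    """Durable state kept outside the context window (hypothetical shape)."""
    facts: Dict[str, str] = field(default_factory=dict)
    trace: List[dict] = field(default_factory=list)

    def remember(self, key: str, value: str) -> None:
        """Store a durable fact outside the window."""
        self.facts[key] = value

    def build_context(self, step: int, task: str, needed_keys: List[str]) -> str:
        """Reassemble the window for one step and record exactly what it saw."""
        lines = [f"Task: {task}"]
        # Bring back only the facts this step needs; unknown keys are skipped.
        lines += [f"{k}: {self.facts[k]}" for k in needed_keys if k in self.facts]
        context = "\n".join(lines)
        self.trace.append({"step": step, "context": context})
        return context

    def to_json(self) -> str:
        """Serialize facts and trace; a real runtime would persist this."""
        return json.dumps({"facts": self.facts, "trace": self.trace})


state = AgentState()
state.remember("customer_id", "C-1042")
state.remember("order_status", "shipped")
state.remember("favorite_color", "green")

context = state.build_context(step=1, task="Check delivery date",
                              needed_keys=["order_status"])
print(context)
print(len(state.trace))  # prints 1
```

All three facts survive in `state.facts`, but only `order_status` enters the window for this step, and the trace entry makes that input inspectable afterwards.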

Frequently asked questions

What is the difference between context engineering and prompt engineering?

Prompt engineering focuses on wording a single instruction to a model. Context engineering is broader: it governs everything in the context window across a run — instructions, history, retrieved documents, and tool results — and how that set is assembled and pruned at each step.

What is context rot?

Context rot is the decline in a model's reliability as its context grows long or cluttered. Relevant detail gets buried among stale or irrelevant content, so the model attends to the wrong things even though the right information is technically present.

How do you manage a limited context window?

Common techniques include retrieving only the most relevant material on demand, summarizing or dropping old turns, storing durable facts in external memory, and giving each agent a narrow scope so its context stays focused rather than accumulating everything.
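Summarizing or dropping old turns might look like the sketch below, which keeps the most recent turns verbatim and collapses everything older into a single line. The function name and the placeholder summary are illustrative assumptions; in a real system the `summarize` callable would likely be a model call or another summarizer.

```python
from typing import Callable, List, Optional


def compact_history(turns: List[str], keep_recent: int,
                    summarize: Optional[Callable[[List[str]], str]] = None) -> List[str]:
    """Keep the last `keep_recent` turns and replace older ones with one line."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(turns) <= keep_recent:
        return list(turns)  # Nothing to compact.
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    if summarize is None:
        # Placeholder: drops the older turns and notes how many were omitted.
        summary = f"[{len(older)} earlier turns omitted]"
    else:
        summary = summarize(older)
    return [summary] + recent


turns = [f"turn {i}" for i in range(1, 7)]
print(compact_history(turns, keep_recent=2))
# prints ['[4 earlier turns omitted]', 'turn 5', 'turn 6']
```

The original `turns` list is left untouched, so the full history can still live in external memory while only the compacted version goes into the window.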
