Context engineering is the discipline of curating what goes into a model’s context window at each step of an agent’s run. Because a model can only act on what is in its context, this work decides which instructions, prior turns, retrieved documents, and tool results are present at any moment — and, just as importantly, what is left out — so the model has the information it needs without being overwhelmed by what it does not.
Why context engineering matters
A model has no memory of its own between calls, and each call has a finite context window. Everything the model can use to make a decision has to be placed in that window, and the window has a hard limit. In a multi-step agent the demand on this space grows quickly: the conversation lengthens, tools return large outputs, and retrieved documents pile up, all competing for space in a window the model reads in full on every call.
Two failure modes follow. The window can simply run out, forcing information to be dropped. More subtly, a window that is full but unfocused degrades quality even when it fits — a phenomenon often called context rot, where relevant detail is buried under stale or irrelevant content and the model attends to the wrong things. Context engineering exists to manage this scarce resource deliberately, treating the contents of the window as something to be assembled and pruned rather than allowed to accumulate.
How it works
Context engineering combines several techniques to keep the window focused:
- Selection — choose what to include for the current step rather than passing everything, for example retrieving only the documents relevant to the immediate question.
- Compression — summarize or truncate long histories and large tool outputs so their substance survives without their bulk.
- Externalization — keep durable facts and progress in memory or state outside the window, and bring back only what a given step needs.
- Isolation — give each agent or sub-task a narrow scope so its context holds only what that work requires, instead of one growing context for everything.
- Ordering — place the most important material where the model attends to it most reliably.
These techniques are applied continuously across the run, not once at the start, because what belongs in the window changes from step to step.
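As a rough illustration of how selection, compression, and ordering might combine under a fixed budget, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Item` type, the `relevance` score, the word-count token estimate, and the thresholds are hypothetical, and truncation stands in for real summarization.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str         # "instruction", "history", "document", or "tool_result"
    text: str
    relevance: float  # hypothetical score in [0, 1] for the current step


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def compress(text: str, max_tokens: int) -> str:
    # Compression by truncation; a real system might summarize instead.
    words = text.split()
    if len(words) <= max_tokens:
        return text
    # Keep max_tokens - 1 words plus a marker, so the result is max_tokens words.
    return " ".join(words[: max_tokens - 1] + ["[truncated]"])


def build_context(items, budget, min_relevance=0.5, per_item_cap=50):
    # Selection: always keep instructions, otherwise keep only relevant items.
    selected = [
        it for it in items
        if it.kind == "instruction" or it.relevance >= min_relevance
    ]
    # Compression: cap the size of every selected item.
    compressed = [
        Item(it.kind, compress(it.text, per_item_cap), it.relevance)
        for it in selected
    ]
    # Ordering: instructions first, then descending relevance.
    compressed.sort(key=lambda it: (it.kind != "instruction", -it.relevance))
    # Budget: add items in priority order, skipping any that would not fit.
    context, used = [], 0
    for it in compressed:
        cost = estimate_tokens(it.text)
        if used + cost > budget:
            continue
        context.append(it)
        used += cost
    return context


if __name__ == "__main__":
    items = [
        Item("history", "old chatter about lunch", 0.1),
        Item("document", " ".join(["word"] * 200), 0.9),
        Item("instruction", "Answer briefly.", 0.0),
        Item("tool_result", "status ok", 0.6),
    ]
    for it in build_context(items, budget=60):
        print(it.kind, estimate_tokens(it.text))
```

Because the function is cheap to call, it can be rerun at every step with fresh relevance scores, which matches the point that the window's contents change from step to step.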
Context engineering vs. prompt engineering
Prompt engineering is about crafting the wording of an individual instruction: the phrasing, format, and examples for a single call. Context engineering is the larger concern of managing the whole context window across an entire run: which pieces of history, retrieval, memory, and tool output are present at each step, and how that set is built and trimmed over time. Even a well-worded prompt produces degraded results inside a poorly managed context, which is why the two are complementary rather than interchangeable.
In practice
Managing context well depends on durable state to draw from and a record of what each step saw. A durable, observable runtime persists an agent’s memory and progress server-side, so context can be reassembled from a reliable source and the inputs to each step are inspectable in the trace. Context engineering draws on memory for durable facts, on retrieval-augmented generation to bring in relevant material on demand, and on state management to track what persists across the agent loop. For how an agent retains information, see the memory concepts.
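The idea of rebuilding context from durable state and keeping an inspectable record of each step's inputs can be sketched as follows. This is an in-memory illustration only, not any particular runtime's API: `AgentState`, `assemble_context`, and the field names are hypothetical, and a real runtime would persist this state server-side rather than in a Python object.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AgentState:
    # Hypothetical stand-in for state a runtime would persist outside the window.
    facts: dict = field(default_factory=dict)     # durable facts (memory)
    progress: list = field(default_factory=list)  # completed steps, oldest first
    trace: list = field(default_factory=list)     # what each step actually saw


def assemble_context(state, step_name, needed_facts, retrieved, recent=3):
    # Bring back only the facts this step asks for; skip any that are unknown.
    context = {
        "step": step_name,
        "facts": {k: state.facts[k] for k in needed_facts if k in state.facts},
        "progress": state.progress[-recent:],  # only the most recent steps
        "retrieved": list(retrieved),          # material fetched on demand
    }
    # Record an independent copy so later changes cannot rewrite the trace.
    state.trace.append(copy.deepcopy(context))
    return context


if __name__ == "__main__":
    state = AgentState(
        facts={"user": "Ada", "plan": "pro", "region": "eu"},
        progress=["parsed request", "looked up account", "checked quota", "drafted plan"],
    )
    ctx = assemble_context(state, "draft_reply", ["plan"], ["doc-1"])
    print(ctx)
    print(len(state.trace))
```

Storing a copy of each assembled context is what makes a step's inputs inspectable after the fact: the trace shows what the model was given, independent of how the state changed later.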