Guardrails are validation checks placed around an agent that enforce rules on what it receives and what it produces. They inspect inputs before the agent acts and outputs before they are returned, and when a check fails they intervene — blocking, correcting, or escalating — so the agent’s non-deterministic behavior stays within defined bounds.
Why guardrails matter
An agent chooses its own actions at runtime, which means it can do things its author did not intend: act on a malicious instruction hidden in an input, call a tool with unsafe arguments, return content that violates policy, or produce output in a shape the next system cannot consume. Because the model’s behavior is not fully predictable, you cannot rely on the prompt alone to keep every run in line.
Guardrails address this by making safety a property of the system rather than a hope about the model. They turn implicit expectations — this field must be valid, this category of request must be refused, this output must match a schema — into explicit checks that run every time. That matters most in production, where an agent acts on real systems and a single bad action can have real consequences. Guardrails bound the blast radius without requiring the model itself to be perfect.
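The three expectations named above can each be written as an explicit check that returns a pass or a failure with a reason. The sketch below is a minimal, framework-free illustration, assuming dict-shaped requests and outputs. The `CheckResult` type, the `BLOCKED_CATEGORIES` set, the `OUTPUT_SCHEMA` mapping, and the check names are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    reason: str = ""


# Hypothetical policy: request categories the agent must refuse.
BLOCKED_CATEGORIES = {"credential_theft", "malware"}

# Hypothetical output schema: field name -> required Python type.
OUTPUT_SCHEMA = {"ticket_id": str, "priority": int}


def field_is_valid(output: dict) -> CheckResult:
    # "This field must be valid": priority must be an int from 1 to 5 (bools excluded).
    priority = output.get("priority")
    ok = type(priority) is int and 1 <= priority <= 5
    return CheckResult("priority_valid", ok, "" if ok else f"bad priority: {priority!r}")


def request_is_allowed(request: dict) -> CheckResult:
    # "This category of request must be refused."
    category = request.get("category")
    ok = category not in BLOCKED_CATEGORIES
    return CheckResult("category_allowed", ok, "" if ok else f"refused category: {category}")


def matches_schema(output: dict) -> CheckResult:
    # "This output must match a schema": every field present with the expected type.
    for field, expected in OUTPUT_SCHEMA.items():
        if not isinstance(output.get(field), expected):
            return CheckResult("schema", False, f"field {field!r} missing or not {expected.__name__}")
    return CheckResult("schema", True)


print(request_is_allowed({"category": "malware"}).passed)  # False
print(matches_schema({"ticket_id": "T-1", "priority": 3}).passed)  # True
```

Because each expectation is now a function, it runs identically on every invocation instead of depending on whether the model followed the prompt.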
How it works
Guardrails wrap the agent at its edges and run as part of each step:
- Input validation runs before the agent acts, inspecting the prompt or a tool argument and rejecting or sanitizing anything that breaks a rule.
- Output validation runs after the agent produces a result, checking it against constraints such as format, content policy, or required fields.
- When a check passes, the run proceeds unchanged.
- When a check fails, a configured failure mode decides what happens next — commonly retry the step, raise an error, attempt to fix the result, or escalate to a human.
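The steps above can be sketched as a wrapper around a single agent step. This is a minimal illustration under stated assumptions, not any particular runtime's API: `run_guarded`, `FailureMode`, `GuardrailError`, and the validator convention (return `None` to pass, a reason string to fail) are hypothetical names, and the agent step is stubbed as a plain function. Every check result, pass or fail, is appended to a run log.

```python
from enum import Enum
from typing import Callable, Optional

# Hypothetical convention: a validator returns None on pass, or a reason string on failure.
Validator = Callable[[str], Optional[str]]


class FailureMode(Enum):
    RETRY = "retry"
    RAISE = "raise"
    FIX = "fix"
    ESCALATE = "escalate"


class GuardrailError(Exception):
    pass


def _run_checks(stage: str, checks: list[Validator], value: str, log: list) -> list[str]:
    # Run every check and record each pass (reason None) or failure in the run log.
    reasons = []
    for check in checks:
        reason = check(value)
        log.append((stage, check.__name__, reason))
        if reason is not None:
            reasons.append(reason)
    return reasons


def run_guarded(agent_step, prompt, input_checks, output_checks, mode, log,
                fixer=None, escalate=None, max_retries=2):
    # Input validation runs before the agent acts.
    if reasons := _run_checks("input", input_checks, prompt, log):
        raise GuardrailError("input rejected: " + "; ".join(reasons))

    for attempt in range(max_retries + 1):
        result = agent_step(prompt)
        # Output validation runs after the agent produces a result.
        reasons = _run_checks("output", output_checks, result, log)
        if not reasons:
            return result  # all checks passed: the run proceeds unchanged
        if mode is FailureMode.RETRY and attempt < max_retries:
            continue  # re-run the step
        if mode is FailureMode.FIX and fixer is not None:
            fixed = fixer(result)
            # The repaired result must pass the same checks before it is returned.
            if not _run_checks("output_fixed", output_checks, fixed, log):
                return fixed
        if mode is FailureMode.ESCALATE and escalate is not None:
            return escalate(result, reasons)  # hand off to a human reviewer
        raise GuardrailError("output rejected: " + "; ".join(reasons))
    raise GuardrailError("unreachable")


def no_injection(text: str) -> Optional[str]:
    return "possible prompt injection" if "ignore previous instructions" in text.lower() else None


def non_empty(text: str) -> Optional[str]:
    return None if text.strip() else "empty output"


log = []
answers = iter(["", "Paris"])  # stubbed agent: first attempt is empty, second succeeds
print(run_guarded(lambda p: next(answers), "Capital of France?",
                  [no_injection], [non_empty], FailureMode.RETRY, log))  # prints Paris
```

Appending every result to `log`, not just failures, is what lets a durable runtime show afterwards exactly which checks ran on each step and why a retry or escalation happened.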
Because the checks are explicit and run on every invocation, they apply uniformly regardless of which path the model chose, and each pass or failure is recorded as part of the run.
Guardrails vs. evaluation
Guardrails and evaluation both concern quality but operate at different times. A guardrail runs inline during a live run and can change its course in the moment, so it is an enforcement mechanism. Evaluation runs after the fact, usually across many runs, to measure how well the agent performs, so it is a measurement mechanism. The two are complementary: evaluation reveals where an agent tends to go wrong, and guardrails enforce limits on those behaviors in production.
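To make the timing difference concrete, the sketch below treats evaluation as an offline pass over check results that guardrails already recorded, computing a failure rate per check across many runs. It changes no run; it only measures. The record format `(stage, check_name, reason)` and the `failure_rates` function are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional

# Assumed record format for one run: (stage, check_name, reason), reason is None on pass.
RunLog = list[tuple[str, str, Optional[str]]]


def failure_rates(runs: list[RunLog]) -> dict[str, float]:
    # Count how often each check ran and how often it failed, across all runs.
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for run in runs:
        for _stage, name, reason in run:
            totals[name] += 1
            if reason is not None:
                failures[name] += 1
    return {name: failures[name] / totals[name] for name in totals}


runs = [
    [("input", "no_injection", None), ("output", "schema", "missing field 'priority'")],
    [("input", "no_injection", None), ("output", "schema", None)],
    [("input", "no_injection", "possible prompt injection")],
]
# no_injection failed in 1 of 3 runs; schema failed in 1 of the 2 runs that reached it.
print(failure_rates(runs))
```

A check with a persistently high failure rate is the kind of signal evaluation surfaces, which can then inform where a guardrail should be tightened or added.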
In practice
A durable, observable runtime runs guardrails as part of each step and records every pass and failure, so violations are visible rather than silent. When a check fails, the configured mode can retry, raise, fix, or hand off to a person, which connects guardrails to human-in-the-loop review and approval gates. They are especially important around tool use, where an unchecked argument becomes a real action. For details on input and output validation and the available failure modes, see the guardrails documentation.
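Around tool use, the same idea applies to a tool's arguments before the call executes. The sketch below guards a hypothetical `refund` tool: malformed arguments are rejected outright, and amounts above an assumed threshold are routed through an `approve` callback that stands in for a human approval gate. The tool, the `ORD-` ID format, the threshold, and the callback are all illustrative assumptions.

```python
from typing import Callable


class ToolCallRejected(Exception):
    pass


APPROVAL_THRESHOLD = 100.0  # hypothetical: refunds above this need a human sign-off


def refund(order_id: str, amount: float) -> str:
    # Stand-in for a tool with real side effects.
    return f"refunded {amount:.2f} on {order_id}"


def guarded_refund(args: dict, approve: Callable[[dict], bool]) -> str:
    order_id = args.get("order_id")
    amount = args.get("amount")
    # Reject malformed arguments before they can become a real action.
    if not isinstance(order_id, str) or not order_id.startswith("ORD-"):
        raise ToolCallRejected(f"invalid order_id: {order_id!r}")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        raise ToolCallRejected(f"invalid amount: {amount!r}")
    # Escalate large refunds to the approval gate instead of executing directly.
    if amount > APPROVAL_THRESHOLD and not approve(args):
        raise ToolCallRejected("refund denied by reviewer")
    return refund(order_id, float(amount))


# Small refund: below the threshold, so the approval callback is never consulted.
print(guarded_refund({"order_id": "ORD-42", "amount": 25}, approve=lambda a: False))
# prints "refunded 25.00 on ORD-42"
```

Placing the check between the model's proposed call and the tool itself means the agent can propose anything, but only validated or approved arguments ever reach the side-effecting function.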