What Guardrails Actually Stop — and What Only Insurance Can

Auly Editorial · Jun 11, 2026 · 4 min read

Teams deploying AI agents typically reach for the same short list of defenses: human-in-the-loop approval, least-privilege credentials, audit logs. These are all correct choices. They are also incomplete, and the gap between "we have guardrails" and "we have managed our risk" is where losses occur.

This post frames two distinct layers of control, explains what each one does and does not cover, and explains why the residual that survives both layers is what insurance exists to address.

Layer 1 — Frequency controls (human-in-the-loop, least privilege)

The goal of frequency controls is to reduce how often an agent takes an action that causes harm. The two most effective mechanisms are:

Human-in-the-loop approval requires a human to explicitly authorize high-impact operations before they execute. NIST's AI Risk Management Framework (AI RMF 1.0) identifies human oversight as a core governance practice, particularly for decisions that are consequential or difficult to reverse. The practical effect: an agent that must ask before acting cannot act autonomously on a bad inference.
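A minimal sketch of such a gate, assuming a hypothetical `Action` type and a hypothetical `HIGH_IMPACT` list (neither comes from NIST or from any particular agent framework). The point it illustrates is that the check lives in the code path that executes the action, outside the agent's own reasoning.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of operations treated as high-impact.
# A real list is specific to each deployment.
HIGH_IMPACT = {"delete_database", "drop_table", "rotate_credentials"}


class ApprovalDenied(Exception):
    """Raised when a high-impact action is not explicitly approved."""


@dataclass(frozen=True)
class Action:
    name: str
    target: str


def run_with_approval(
    action: Action,
    execute: Callable[[Action], str],
    ask_human: Callable[[Action], bool],
) -> str:
    # Low-impact actions run directly; high-impact ones need an explicit yes.
    if action.name in HIGH_IMPACT and not ask_human(action):
        raise ApprovalDenied(f"{action.name} on {action.target} was not approved")
    return execute(action)
```

Here `ask_human` stands in for whatever approval channel a team uses (a ticket, a chat prompt, a CLI confirmation). The gate only helps if the agent has no other route to `execute`.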

Least-privilege credentials constrain what an agent can reach even if it reasons incorrectly. OWASP's Top 10 for LLM Applications covers this under "Excessive Agency" (LLM06:2025), which names excessive permissions as one of its root causes: when a credential authorizes more operations than the task requires, a reasoning error in the agent can become a destructive action at the infrastructure layer. Narrowing scope at the credential level means a misfire that would have been catastrophic is instead a blocked API call.
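As an illustration, here is a deny-by-default scope check. The `"<resource>:<verb>"` scope format, the `Credential` type, and the `reporting_agent` example are all assumptions made for this sketch, not the API of any real identity system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    # Hypothetical scope format: "<resource>:<verb>" strings granted to the agent.
    scopes: frozenset


def is_allowed(cred: Credential, resource: str, verb: str) -> bool:
    """Deny by default: only an exact scope match authorizes the call."""
    return f"{resource}:{verb}" in cred.scopes


def call_api(cred: Credential, resource: str, verb: str) -> str:
    # A request outside the granted scopes never reaches the backend.
    if not is_allowed(cred, resource, verb):
        return "blocked"
    return "executed"


# An agent that only builds reports gets read access to orders and nothing more.
reporting_agent = Credential(scopes=frozenset({"orders:read", "reports:write"}))

print(call_api(reporting_agent, "orders", "read"))    # executed
print(call_api(reporting_agent, "orders", "delete"))  # blocked
```

If the agent mistakenly decides to delete the orders table, the mistake ends as a refused call, because `orders:delete` was never granted.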

Both controls reduce frequency, meaning how often a harmful outcome can occur. Neither eliminates it. A human can approve a command without reading it. A credential meant to be narrow can still authorize a delete if its scope was drawn too broadly. Frequency controls are necessary and imperfect.

Layer 2 — Severity controls (reversibility)

When a harmful action does occur, reversibility determines how bad the outcome is. A team that can roll back a database to a state from five minutes ago has a recoverable incident. A team that loses both production data and its only backup copy has a catastrophe.

Reversibility is a design property of the system the agent operates in, not a property of the agent itself. The questions are:

Are backups stored independently, in a location the agent's credentials cannot reach?
Does the platform support point-in-time recovery, or only periodic snapshots?
Is there a delay between a deletion command and permanent data removal?

When reversibility is high, a frequency-control failure is a near-miss. When reversibility is low or absent, a frequency-control failure is a loss event. Designing for reversibility is therefore the correct complement to frequency controls — it caps the severity of the failures that get through.
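The third question above, a delay between a deletion command and permanent removal, can be sketched as a soft-delete store. This is a toy in-memory model; `SoftDeleteStore` and the 24-hour `GRACE` window are hypothetical, and a real platform would enforce the delay in storage the agent's credentials cannot reach.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(hours=24)  # hypothetical retention window


class SoftDeleteStore:
    def __init__(self):
        self._live = {}
        self._trash = {}  # key -> (value, deleted_at)

    def put(self, key, value):
        self._live[key] = value

    def get(self, key):
        return self._live.get(key)

    def delete(self, key, now):
        # Move the record to the trash instead of destroying it.
        self._trash[key] = (self._live.pop(key), now)

    def restore(self, key, now):
        value, deleted_at = self._trash[key]
        if now - deleted_at > GRACE:
            raise KeyError(f"{key} is past the grace period")
        del self._trash[key]
        self._live[key] = value

    def purge(self, now):
        # Permanent removal happens only after the grace period has elapsed.
        expired = [k for k, (_, t) in self._trash.items() if now - t > GRACE]
        for k in expired:
            del self._trash[k]
        return expired


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
store = SoftDeleteStore()
store.put("orders", ["row-1", "row-2"])
store.delete("orders", now=t0)                        # the agent's mistaken delete
store.restore("orders", now=t0 + timedelta(hours=2))  # caught inside the window
```

In this model a mistaken delete that is noticed within the window is a near-miss. The same delete noticed after `purge` has run is a loss event.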

The residual: what survives both layers

Good frequency controls reduce incidents. Good reversibility controls reduce the cost of the incidents that occur. Together they make the risk smaller. They do not make it zero.

The residual risk is what remains after a team has implemented reasonable controls: the incident that still happens because a configuration was wrong, a credential was miscategorized, a backup rotation had an undetected gap, or a platform behaved in an undocumented way. This residual is stochastic — it follows from the statistical reality that no control set has perfect coverage across all failure modes.

This is the boundary between risk reduction and risk transfer. Controls are tools for reduction. Insurance is the mechanism for transfer. Confusing the two — treating a comprehensive guardrail set as a substitute for coverage — is the same error as assuming a well-maintained car cannot be totaled.

What this means in practice

Teams evaluating their AI-agent deployment should ask three separate questions, not one:

Frequency: For each high-impact operation the agent can perform, is there a human approval gate that the agent cannot bypass through its own reasoning or through prompt content alone?
Severity: For each data store or system the agent can modify, is there an independent, out-of-band recovery copy the agent's credentials cannot reach?
Residual: Given an honest assessment of the controls in place, what is the financial exposure if a loss occurs anyway?

The third question is the one most teams skip. It is also the most important, because the residual is the number that determines whether a loss event is a recoverable operational incident or an existential one.
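One rough way to put a number on the third question is a frequency-times-severity estimate. The model below is deliberately simplified, and every input is a made-up figure for illustration, not data from any real deployment.

```python
def expected_annual_loss(incidents_per_year, p_unrecoverable, loss_per_incident):
    """Frequency x severity: expected yearly cost of unrecoverable incidents."""
    return incidents_per_year * p_unrecoverable * loss_per_incident


def is_existential(loss_per_incident, absorbable_loss):
    """True when a single loss exceeds what the team could absorb on its own."""
    return loss_per_incident > absorbable_loss


# Hypothetical inputs for illustration only.
LOSS = 200_000  # cost of one unrecoverable incident

# No controls: 4 harmful actions a year, half of them unrecoverable.
baseline = expected_annual_loss(4.0, 0.5, LOSS)

# With approval gates (lower frequency) and independent backups (lower severity).
controlled = expected_annual_loss(0.5, 0.25, LOSS)

print(baseline)    # 400000.0
print(controlled)  # 25000.0
print(is_existential(LOSS, absorbable_loss=150_000))  # True
```

In this toy example the controls cut the expected annual loss from 400,000 to 25,000, yet a single event still costs more than the team could absorb. The average shrinks while the worst case stays the same size, and that worst case is the part left over for transfer.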

Guardrails answer questions one and two. Insurance answers question three. Both are required.
