AI agents are crossing a line. On one side, they advise. They draft, summarize, recommend, and analyze. If they get it wrong, a human catches it before anything happens. On the other side, they act. They call APIs, execute code, send messages, modify systems, and trigger workflows. If they get it wrong, the consequence lands before anyone reviews it.
Most production AI applications were built for the advisory side. The model does the thinking, a human does the doing. The architecture assumes a person in the middle. That assumption is disappearing.
The industry has made meaningful progress on the build side of this problem. Frameworks for scoped agent identity, documented capability manifests, governed tool connections, instrumented observability, and per-task credential management have established real standards for how agents should be constructed. That work matters. A well-built agent is the precondition for everything that follows.
What follows is the question those frameworks were not designed to answer: what governance needs to exist at runtime, between the agent's reasoning and its ability to affect the real world, after the build is complete and the agent is operating?
Build-time governance establishes how an agent is constructed: what it can access, what tools it connects to, what credentials it holds. Runtime governance addresses what happens while the agent operates: whether its actions stay within scope, whether its reasoning has drifted, and whether there is a second check between a bad decision and an irreversible consequence. These are different disciplines. Most deployments have invested in the first. The second is where the gap is.
The question is not whether models will get more capable and more reliable. They will. The question is what happens when they get it wrong anyway, for whatever reason: a novel attack, a hallucinated action, a misunderstood instruction, a gradual drift in behavior over a long interaction, or simply a bad judgment call on a close decision. The answer, for most production applications today, is: nothing. There is no second check.
Runtime governance for AI agents requires three distinct layers. Each one governs a different dimension of agent behavior. Each one addresses a different category of failure. None is sufficient alone.
The first layer is established before any interaction begins: the agent needs a defined identity, a clear scope of authority, and a classified action model. This is not a system prompt. It is a set of governance documents the agent reads on startup that establish who it is, who it serves, and where its boundaries are.
Every action the agent might take is pre-classified into tiers. Some actions are autonomous and require no approval. Some are pre-approved within a defined scope. Some require a human to review and approve before they execute. Some are prohibited entirely, with no escalation path. This classification exists before the first interaction and does not change based on what a user asks for.
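To make the idea concrete, here is a minimal sketch of such a classification table in Python. The action names and tier assignments are hypothetical and purely illustrative, not drawn from any particular framework.

```python
from enum import Enum

class Tier(Enum):
    AUTONOMOUS = "autonomous"          # executes with no approval
    PRE_APPROVED = "pre_approved"      # allowed within a defined scope
    HUMAN_APPROVAL = "human_approval"  # blocks until a human approves
    PROHIBITED = "prohibited"          # never executes, no escalation path

# Hypothetical classification, fixed before the first interaction and not
# renegotiable based on what a user asks for.
ACTION_TIERS = {
    "read_ticket":       Tier.AUTONOMOUS,
    "draft_reply":       Tier.AUTONOMOUS,
    "update_crm_record": Tier.PRE_APPROVED,
    "issue_refund":      Tier.HUMAN_APPROVAL,
    "delete_customer":   Tier.PROHIBITED,
}

def classify(action: str) -> Tier:
    # Anything unclassified is treated as out of scope, not as autonomous.
    return ACTION_TIERS.get(action, Tier.PROHIBITED)
```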
This layer also establishes the agent's operating context and authority model. The agent understands what organization it operates within, what policies and constraints apply, and who is authorized to direct it at what level of authority. In an enterprise environment, this means the agent knows that a security team lead can approve an elevated action but a general user cannot, or that requests involving regulated data are subject to a different policy than routine operations. Claims of authority from unknown or unauthorized sources run up against an established trust model instead of being taken at face value. Instructions that fall outside the agent's organizational mandate have something concrete to conflict with, rather than being evaluated on their own merits.
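Building on the tier table above, the authority side might look like the sketch below. The roles and the mapping are again hypothetical; the point is that approval authority is a lookup against a pre-established model, not a judgment the agent makes at request time.

```python
# Hypothetical authority model: which roles may direct or approve actions at which tier.
ROLE_AUTHORITY = {
    "general_user":       {Tier.AUTONOMOUS, Tier.PRE_APPROVED},
    "security_team_lead": {Tier.AUTONOMOUS, Tier.PRE_APPROVED, Tier.HUMAN_APPROVAL},
}

def can_direct(role: str, action: str) -> bool:
    tier = classify(action)
    if tier is Tier.PROHIBITED:
        return False  # no escalation path, regardless of who is asking
    # Unknown roles have no standing in the trust model at all.
    return tier in ROLE_AUTHORITY.get(role, set())
```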
Agent identity, values, and behavioral boundaries. The foundation the agent reasons from.
Tiered action classification. Autonomous, pre-approved, human-in-the-loop, and prohibited. Every action type is sorted before runtime.
Operating context and authority model. The agent understands what organization it serves, what policies apply, and who is authorized to direct it at what level of authority.
Situational context. The agent understands its environment, its mission, and the current state of the systems it operates in. Deep context makes agents harder to mislead or redirect.
Without this layer, the agent has no defined scope. It evaluates every instruction on its own merits with no baseline for what is in bounds and what is not. Scope creep, over-provisioned access, and unclear mandates go unchecked. Any instruction that sounds reasonable gets treated as legitimate, regardless of whether it falls within the agent's actual authority.
The second layer is a proxy between the agent and the AI model that handles routing, token management, and full observability of every reasoning interaction. Every prompt submitted to the model and every response returned is captured, traced, and stored. This is not just logging. It is a continuous record of the agent's decision-making process.
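A minimal sketch of the capture side of that proxy, assuming a generic call_model stub in place of whatever model client is actually in use, and a local JSONL file standing in for a real trace store:

```python
import json
import time
import uuid

def call_model(prompt: str) -> str:
    # Placeholder for the real model client; the proxy wraps whatever this is.
    raise NotImplementedError

def traced_completion(agent_id: str, prompt: str,
                      trace_path: str = "reasoning_trace.jsonl") -> str:
    record = {
        "trace_id": str(uuid.uuid4()),
        "agent_id": agent_id,
        "timestamp": time.time(),
        "prompt": prompt,
    }
    completion = call_model(prompt)
    record["completion"] = completion
    # Append-only record, written somewhere the agent itself cannot edit.
    with open(trace_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return completion
```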
Reasoning traces alone are probabilistic. The model makes judgment calls, and interpreting those judgments from a token stream is itself a judgment call. But the reasoning trace is only one input. The observer has access to the entire observability stack: reasoning traces show what the model decided, action gateway audit logs show what the agent tried to do and what the policy decided, governance event logs show the full lifecycle of enforcement decisions, and system metrics show resource usage, request patterns, and behavioral baselines at the infrastructure level. The governance configuration itself tells the observer what the agent is currently permitted to do.
The detection model is correlation across these independent data sources. The reasoning trace says the agent decided to read a file. The action log shows it attempted to read fifteen files in a directory it has never accessed before. The governance config says file reads are permitted, but the system metrics show a request pattern that deviates from the agent's historical baseline. No single data source is conclusive on its own. The pattern across all of them is where the detection happens. This is the same principle behind security information and event management: correlate firewall logs, endpoint events, authentication records, and application logs to surface patterns that no single source would reveal.
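A simplified sketch of that correlation, with illustrative thresholds and field names that are assumptions rather than part of any specific product:

```python
def correlate(reasoning_event: dict, action_events: list[dict],
              permitted_actions: set[str], baseline_rate: float) -> list[str]:
    """Cross-check independent signals; no single source is conclusive on its own."""
    findings = []

    # Reasoning trace vs. action log: did the agent attempt far more than it stated?
    stated = reasoning_event.get("intended_actions", [])
    attempted = [event["action"] for event in action_events]
    if len(attempted) > 3 * max(len(stated), 1):
        findings.append("action volume far exceeds stated intent")

    # Governance config vs. action log: anything outside the permitted set?
    outside = sorted({a for a in attempted if a not in permitted_actions})
    if outside:
        findings.append(f"attempted actions outside permitted scope: {outside}")

    # System metrics vs. historical baseline: sharp deviation in request volume.
    if baseline_rate and len(action_events) > 5 * baseline_rate:
        findings.append("request pattern deviates sharply from historical baseline")

    return findings
```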
The observer is a separate monitoring system, completely decoupled from the agent itself. The agent does not get to evaluate whether its own reasoning has been compromised. Something else does, using data sources the agent does not control and cannot tamper with. When the observer detects a problem, it can act: trigger an alert, revoke a session, freeze tool access, or force the agent back to a conservative operating posture. This is not forensic analysis after the fact. It is an active correlation engine that monitors multiple independent signals in near real time and can intervene before a bad decision becomes an irreversible action.
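The intervention side might look like the sketch below, where the Session class is a hypothetical stand-in for whatever handle the observer has on the running agent, and the escalation ladder is illustrative:

```python
class Session:
    """Hypothetical handle on a running agent session; the interface is assumed."""
    def alert(self, findings: list[str]) -> None: print("ALERT:", findings)
    def freeze_tool_access(self) -> None: print("tool access frozen")
    def revoke(self) -> None: print("session revoked")

def intervene(session: Session, findings: list[str]) -> None:
    # Illustrative escalation ladder: the more independent signals agree,
    # the harder the response.
    if not findings:
        return
    session.alert(findings)
    if len(findings) >= 3:
        session.revoke()              # terminate the session outright
    elif len(findings) == 2:
        session.freeze_tool_access()  # agent can still reason, but cannot act
```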
Model routing, selection, and fallback logic. The agent talks to the model through this layer, not around it.
Full prompt and completion tracing. Every interaction with the model is captured and searchable.
Action audit correlation. The observer sees not just what the model reasoned, but what actions were attempted and what the policy layer decided. Reasoning traces and enforcement logs are correlated to detect mismatches between stated intent and actual behavior.
Log aggregation pipeline. Governance events from across the stack are collected, pushed to a centralized aggregator, and stored outside the agent's own reach. The agent cannot tamper with its own audit trail.
Independent observer with maintained reference baselines. A workflow automation layer correlates reasoning traces, action logs, governance events, and system metrics to detect anomalies. The observer maintains its own reference library: a tools inventory, expected behavior profiles, and a learning window for legitimate governance changes. When the agent's scope evolves through proper channels, the observer's baselines evolve with it.
Metrics, dashboards, and alerting. Operational visibility into agent behavior over time. Pattern detection across interactions, not just individual events.
Without this layer, the model is a black box. You have no visibility into whether the agent's reasoning has drifted, been manipulated, or simply produced a bad judgment call. There is no independent check on the agent's decision-making. You find out after the consequence lands.
The third layer is a hard policy boundary between the agent's reasoning and your systems. Every tool call, every API interaction, every external action passes through a policy decision point before it executes. This layer is deterministic. An allowlist either permits the action or it doesn't. A policy engine either approves or denies. The model's internal state is irrelevant.
This is the layer that separates thinking wrong from doing wrong. An agent can reason itself into any conclusion for any number of reasons: manipulation, hallucination, misinterpretation, drift, or honest error. None of that matters here. The action gateway does not ask the agent whether it believes it should be allowed to call the billing API. It checks the policy. The default posture is ask. The fallback is deny.
For elevated actions, the gateway implements a human approval workflow. The agent stops, sends a request for approval through a verified channel, and waits. The action does not proceed until a human with the appropriate authority explicitly approves it. Every attempt is logged, whether approved or denied.
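Pulling the pieces together, here is a sketch of that gateway check, reusing the tier table from the identity layer sketch above. The approval call is a placeholder for a real verified channel, and the file-based audit log is illustrative.

```python
import json
import time

def request_human_approval(action: str, params: dict) -> bool:
    # Placeholder: in practice this posts to a verified approval channel,
    # separate from the agent's own communication, and blocks until a decision.
    raise NotImplementedError

def gateway_execute(action: str, params: dict, allowlist: set[str], execute) -> bool:
    """Deterministic policy check between the agent's reasoning and the tool call."""
    decision = "denied"
    try:
        if action not in allowlist:
            return False                        # not provisioned: fallback is deny
        tier = classify(action)                 # tier table from the identity layer sketch
        if tier is Tier.PROHIBITED:
            return False
        if tier is Tier.HUMAN_APPROVAL and not request_human_approval(action, params):
            return False
        execute(action, params)                 # only reached after the policy allows it
        decision = "approved"
        return True
    finally:
        # Every attempt is logged: approved, denied, or failed.
        with open("action_audit.jsonl", "a") as f:
            f.write(json.dumps({"ts": time.time(), "action": action,
                                "decision": decision}) + "\n")
```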
Action allowlists scoped to specific tools, paths, and operations. The agent can only reach what it has been explicitly provisioned to reach.
A default posture of "ask" with a fallback of "deny." If the action is not on the approved list, it does not happen. Ambiguity resolves to denial, not to trust.
Human approval workflows for elevated actions. The agent stops and waits. The approval channel is separate from the agent's own communication, preventing the agent from approving its own requests.
Audit logging of every action attempt. Approved, denied, timed out. The complete record of what the agent tried to do and what the policy decided.
Without this layer, every tool and API the agent was provisioned with at build time is available at runtime with no further check. Nobody re-evaluates intent. Nobody asks whether the reasoning that produced the action was sound. Even a well-built agent with scoped identity and documented capabilities can act outside its intended scope if there is no runtime enforcement between its reasoning and your systems.
Model hardening reduces the likelihood that the agent reasons incorrectly. Better training, better alignment, better instruction following. This is important and improving steadily.
Runtime governance controls what happens when the agent reasons incorrectly anyway. The action still has to pass through a policy gate. The reasoning is still observed by an independent system. The identity still defines what is in scope.
The causes of bad reasoning are infinite: manipulation, hallucination, misinterpretation, drift, honest error. The consequence is the same: the agent attempts an action it should not take. Model hardening addresses the causes. Runtime governance addresses the result. If you can only invest in one, invest in the layer that catches the action before it executes.
Consider a deployment governed only at build time. The agent is correctly constructed: scoped identity, documented capabilities, instrumented connections.
Those controls were evaluated at deployment. What happens at runtime is governed by the model's own judgment.
Every tool provisioned at build time is available at runtime with no further check.
There is no independent observer watching the agent's reasoning for drift or anomaly.
Intent is established at build time and assumed from there forward.
You know the agent was built correctly. You don't know whether it's operating correctly right now.
Now consider the same deployment with runtime governance in place. The agent is correctly constructed, and its runtime behavior is independently governed.
Every action is classified into tiers: autonomous, pre-approved, requires approval, or prohibited.
A policy decision sits between the agent and every tool call. Default is ask. Fallback is deny.
An independent observer correlates reasoning traces, action logs, and system metrics in near real time.
Intent is re-evaluated at every action boundary, not assumed from build time.
You know the agent was built correctly, and you can verify it's operating correctly right now.
Security practitioners will recognize this pattern. It is defense in depth applied to AI agent governance. The same principle that puts a firewall, an intrusion detection system, network segmentation, and endpoint protection between an attacker and your data. No single control is expected to hold on its own. The system is designed so that the failure of one layer is caught by the next.
The AI agent governance conversation has made real progress on the build side. What comes next is the runtime side: what to put between an agent's reasoning and your production systems while the agent is operating. Agent Runtime is one approach to filling that gap. This piece describes its starting point.
If your AI agent made a bad decision right now, what would stop it from acting on that decision with the tools it already has access to? If the answer is "the model would catch its own mistake," you have one layer. These are the other two.
This piece covers one dimension of the Agent Runtime framework: what needs to exist between an agent's reasoning and your production systems at runtime. The broader framework addresses organizational readiness, maturity assessment, action classification, governance principles, and the operational transition from human-operated to agent-operated environments. These three layers are a starting point, not a complete answer. Open design challenges remain, including baseline maintenance for the observer, scope boundaries for vendor-hosted agent environments, and the operational overhead of fine-grained runtime scoping. This is a direction we are actively building toward, not a destination we have reached.