The security assumption agentic AI just broke

I ran a red-team exercise against an internal IT-support agent wired across a stack any large enterprise would recognize: ServiceNow for tickets, SharePoint for policy and procedure docs, an internal directory for routing. The agent had legitimate read access to all three and could draft replies but not send them. Inside two hours, it had triaged a routine access-request ticket into a chain that reconstructed an in-progress reorganization no individual in the loop was cleared to discuss. No tool call was outside policy. No permission was misconfigured. Every step had a paper trail.

That’s the pattern I keep coming back to. The risk conversation has centered on model behavior (hallucinations, jailbreaks, unsafe outputs), but once AI systems are connected to tools, memory and internal workflows, the harder question is execution governance: What the system is permitted to do, how far its access extends and whether anyone can reconstruct the action chain afterward. That’s where most organizations are exposed.

What rarely gets acknowledged is that the enterprise controls we rely on were never designed to account for human friction, even though they depended on it. An analyst who hesitates before chaining together a dozen sensitive queries. Someone who, seven steps into a workflow, decides something doesn’t feel right. That latency was a byproduct of humans being the actors, not a design choice anyone made deliberately. It functioned as an accidental safety property embedded in every process we built.

Agentic AI removes all of that. Agents move through workflows without the friction, fatigue or unease that causes humans to slow down at the moments that matter. The controls we built weren’t designed to compensate for their absence.

The deployment data confirms this isn’t a future problem.
According to ETR data presented at RSAC 2026, 37% of organizations already have AI agents deployed or in active testing, while only 3% report having broad agent-specific security controls in place. Most organizations are running agents in environments that weren’t instrumented to govern them.

Why the controls you’re relying on weren’t built for this

When I see organizations respond to prompt injection risk, the instinct is almost always the same: Input filtering, which means classifying the bad instruction before it reaches the model. When the risk is agent access, the response has been tightening identity controls to reduce blast radius. Both are the right instincts applied at the wrong layer.

In March 2026, OpenAI published an assessment of real-world prompt injection attacks that made the filtering problem concrete. The most effective attacks, they found, increasingly resemble social engineering rather than simple prompt overrides, and identifying a sophisticated adversarial prompt is effectively the same problem as detecting a lie without access to the full context. An attack disguised as a routine HR email succeeded 50% of the time with all of OpenAI’s defenses active. Their conclusion was that defense cannot rely primarily on input filtering; the system has to be designed so the impact of manipulation stays constrained even when attacks get through.

The reason this matters goes back to the original assumption. Prompt-layer defenses were built expecting a human somewhere downstream who might review an output, notice something odd or decline to take the final step. When an agent takes that step autonomously, the filter carrying all of that weight has to catch everything, and there is no evidence that it can.

Identity-layer controls carry a parallel assumption. They evaluate who is accessing what, assessed per resource and per system, but weren’t built to evaluate what a system is doing across a chain of actions taken on behalf of an identity.
An agent with legitimate access to an employee directory, a project management system and a calendar can correlate all three to surface conclusions that no individual permission was meant to cover, and every access along the way was authorized. This is the mosaic effect: A concept from intelligence and privacy disciplines describing how aggregating individually permissible information can produce an outcome more sensitive than any single piece would suggest.

In February 2026, NIST published a concept paper proposing to adapt existing identity and authorization frameworks specifically for AI agents, explicitly because the existing frameworks weren’t designed for non-human principals that act autonomously, chain actions and require continuous rather than session-based authorization.

What the actual attack surface looks like

The vulnerabilities agents expose aren’t new. Overbroad permissions, overly generous retrieval, loosely scoped connectors, workflows designed with an implicit assumption that a human would pause before a consequential step: These have always been enterprise weaknesses. What’s changed is that agents exercise them continuously and at machine speed.

Research presented at Black Hat USA illustrates how quickly these conditions combine. An attacker sends an email to a support address connected to Zendesk, which automatically syncs into Jira. A developer’s AI coding agent reads the ticket as part of normal workflow, and the injected prompt coerces it into extracting repository secrets, including API keys and access tokens, with no action required from the victim beyond their ordinary use of the tool. The agent never exceeded its assigned permissions. The blast radius came entirely from the scope of what it was legitimately authorized to do.

The authorization problem runs deeper than any single access, though.
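To make the gap between per-resource and chain-level authorization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference design: the source names, the `SENSITIVE_COMBINATIONS` rule and the `AgentSession` class are all hypothetical. It shows how every individual read can pass an object-level check while a session-level check flags the combination.

```python
from dataclasses import dataclass, field

# Hypothetical rule: sets of sources whose aggregate is treated as more
# sensitive than any single member. The names are illustrative only.
SENSITIVE_COMBINATIONS = [
    frozenset({"directory", "project_tracker", "calendar"}),
]


@dataclass
class AgentSession:
    """Tracks which sources a single agent task has read so far."""
    allowed_sources: set
    sources_read: set = field(default_factory=set)

    def per_resource_check(self, source: str) -> bool:
        # Object-level authorization: is this one source permitted?
        return source in self.allowed_sources

    def aggregate_check(self, source: str) -> bool:
        # Chain-level check: would this read complete a combination that
        # has been marked as sensitive in aggregate?
        after = self.sources_read | {source}
        return not any(combo <= after for combo in SENSITIVE_COMBINATIONS)

    def request_read(self, source: str) -> str:
        if not self.per_resource_check(source):
            return "deny"
        if not self.aggregate_check(source):
            return "escalate"  # hand off to a human reviewer
        self.sources_read.add(source)
        return "allow"


session = AgentSession(
    allowed_sources={"directory", "project_tracker", "calendar"}
)
decisions = [
    session.request_read(s)
    for s in ("directory", "project_tracker", "calendar")
]
print(decisions)  # ['allow', 'allow', 'escalate']
```

The third read is individually authorized, so a per-resource audit would show nothing unusual; only the session-level view sees the mosaic forming. A static list of combinations is the simplest possible rule and would likely need to be replaced by something richer in practice.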
A December 2025 paper found that more than 90% of the privacy research literature addresses only single-step leakage, and that none of the agent-level evaluation frameworks currently in use model multi-tool inference chains, in which the agent assembles a picture from pieces each of which it had every right to see, faster than any review process can intervene. Object-level permission audits don’t catch this class of risk.

This is the dynamic I’ve been most focused on in my own research. In a pre-registered pilot on identity drift in self-modifying agents, the cleanest finding wasn’t dramatic. After a shallow revert of an agent’s self-description, the per-action audit was clean: Every step within policy, every change logged. But the behavioral trajectory, measured at the embedding level, hadn’t reverted with it. The pattern generalizes uncomfortably well to enterprise deployments: An agent that’s been rolled back after an incident (system prompt reset, instructions retightened) can carry residue of the prior state in its memory and continue acting on it. When the unit of governance is the action, the thing you actually wanted to govern can drift past you in plain sight.

What execution-layer governance actually requires

The through-line is a shift in where controls have to live. Prompt-layer and identity-layer controls carry implicit dependencies on human behavior that agents don’t satisfy. The missing layer is execution governance: Controlling what the system can actually do when it acts, which is a different problem from controlling what it can see or what instructions it receives.

OpenAI’s March 2026 framework offers a useful organizing principle: Design the system so that the consequences of a successful attack remain constrained even when manipulation gets through. An agent limited to reversible actions, required to pause for confirmation before consequential steps, keeps the blast radius manageable regardless of what it’s told.
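The containment principle above can be sketched as a gate that sits between the agent and its tools. This is a minimal Python illustration, not OpenAI's framework or any vendor's API: the action names, the `REVERSIBLE` and `CONSEQUENTIAL` sets and the `gate` function are assumptions made for the example. The point it shows is that the decision is made outside the model, so it holds regardless of what the agent has been told.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names; a real deployment would derive these from its
# own tool catalog.
REVERSIBLE = {"read_ticket", "draft_reply", "add_internal_note"}
CONSEQUENTIAL = {"send_email", "change_permissions", "delete_record"}


@dataclass
class Decision:
    action: str
    outcome: str  # "allowed", "blocked" or "unclassified"


def gate(action: str, confirm: Callable[[str], bool]) -> Decision:
    """Decide, outside the model, whether a requested action may run."""
    if action in REVERSIBLE:
        return Decision(action, "allowed")
    if action in CONSEQUENTIAL:
        # Pause for a human; the agent's own reasoning is not consulted.
        if confirm(action):
            return Decision(action, "allowed")
        return Decision(action, "blocked")
    # Default deny: anything that has not been classified never runs.
    return Decision(action, "unclassified")


def nobody_approves(action: str) -> bool:
    # Stand-in for a real approval step (ticket, chat prompt, and so on).
    return False


trace = [
    gate(a, nobody_approves)
    for a in ("read_ticket", "send_email", "export_all")
]
print([d.outcome for d in trace])  # ['allowed', 'blocked', 'unclassified']
```

Keeping every `Decision` in a list, as `trace` does here, also gives a crude starting point for reconstructing the action chain afterward.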
The relevant design question is outcome containment alongside attack prevention. In practice, most deployments haven’t built what this requires.

Separating read from act needs to be a hard architectural distinction; summarizing a document and transmitting data from it are different actions, and the system should enforce that difference rather than assume the agent will respect it. Memory and context need explicit bounds, because persistence is a security primitive with real blast-radius implications. A complete trace of request, context, tool calls and outputs needs to be designed in from the start rather than assembled after the fact when something goes wrong. And the red-teaming program needs to target the full workflow rather than the model in isolation.

Of those four, the read/act split is the one I see teams consistently underestimate. A sales-ops agent with read access to Salesforce and the ability to draft customer emails is one tool-call away from transmitting data to a third party, and most enforcement was never built to detect the difference between summarizing an account and sending a summary of it.

The failure that won’t look like a failure

A 2026 survey of 1,253 cybersecurity professionals found that 32% of organizations currently lack AI agent visibility. The report describes a scenario worth sitting with: A SOC analyst arrives Monday morning, traces an anomalous privilege change to a service account created by an agent 72 hours earlier and finds that the agent has been writing to production systems all weekend. Every action is logged. No alert fired because no detection rule existed for agent-initiated behavior.

What concerns me is that without agent-aware detection, the incident gets categorized as a service account control failure, remediated and closed as a known issue type, with the underlying AI governance problem unrecognized and the conditions that produced it unchanged.
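As a rough illustration of what agent-aware detection could mean for the scenario above, the Python sketch below links a service account back to the agent that created it and flags production writes from that account. The log schema, the field names, the account names and the `PRODUCTION_TARGETS` set are all hypothetical; real audit logs would need their own mapping.

```python
from datetime import datetime, timedelta

# Hypothetical set of systems treated as production.
PRODUCTION_TARGETS = {"prod-db"}

# Hypothetical audit events; every field name here is an assumption.
events = [
    {"time": datetime(2026, 1, 2, 18, 0), "actor": "agent-it-support",
     "actor_type": "agent", "action": "create_service_account",
     "target": "svc-sync"},
    {"time": datetime(2026, 1, 3, 9, 0), "actor": "svc-sync",
     "actor_type": "service_account", "action": "write",
     "target": "prod-db"},
    {"time": datetime(2026, 1, 3, 10, 0), "actor": "svc-backup",
     "actor_type": "service_account", "action": "write",
     "target": "prod-db"},
]


def agent_lineage_alerts(events, window=timedelta(hours=72)):
    """Flag production writes by accounts an agent created recently."""
    created_by_agent = {}  # account name -> (agent name, creation time)
    alerts = []
    for e in sorted(events, key=lambda ev: ev["time"]):
        if (e["actor_type"] == "agent"
                and e["action"] == "create_service_account"):
            created_by_agent[e["target"]] = (e["actor"], e["time"])
        elif (e["action"] == "write"
                and e["target"] in PRODUCTION_TARGETS
                and e["actor"] in created_by_agent):
            agent, created_at = created_by_agent[e["actor"]]
            if e["time"] - created_at <= window:
                alerts.append((e["actor"], agent, e["target"]))
    return alerts


alerts = agent_lineage_alerts(events)
print(alerts)  # [('svc-sync', 'agent-it-support', 'prod-db')]
```

The write from `svc-backup` is ignored because no agent created that account. The alert on `svc-sync` names the originating agent, which is the piece of context that keeps the incident from being filed as an ordinary service account problem.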
The question worth asking before that Monday morning arrives is whether your detection and response workflows would recognize an AI governance failure if they encountered one, or whether the logs would just show a busy service account.

This article is published as part of the Foundry Expert Contributor Network.

Source

https://www.cio.com/article/4176552/the-security-assumption-agentic-ai-just-broke.html