The agents your security team can see are not the ones taking down your repos.
## Why Shadow Agents Matter Now
Two years ago, “shadow IT” meant a sales rep paying twenty dollars a month for a Notion seat their CIO didn’t know about. Worst case, you had a data leak. Today, “shadow IT” means an autonomous agent running with full filesystem and shell access, executing destructive commands at machine speed, and reporting to a model your security team never reviewed. The blast radius is not comparable.
The deeper problem is structural. A rogue SaaS subscription costs you compliance points. A rogue agent can delete a production database between coffee breaks. We have already seen public versions of this story play out — the widely reported Replit incident in mid-2025, where a coding agent dropped a production database during a code freeze, was the most cited example, but it is far from the only one circulating in security Slacks. The pattern keeps repeating because the underlying privilege geometry has not changed.
| Metric | Figure | Source |
|---|---|---|
| Enterprises with at least one unsanctioned agent in production | 88% | Ospiri customer signal, 2025 |
| Detection window before an autonomous agent action completes | Sub-minute | Operational telemetry |
| Average breach cost when insider/credentialed access is the vector | $4.99M | IBM Cost of a Data Breach Report, 2024 |
| Productivity uplift cited as justification for agent rollout | +$670K per 100 seats | Ospiri research |
The math is uncomfortable. The same property that makes agents valuable — broad, ambient access to your data and systems — is the property that makes them dangerous. You cannot mark this risk to market without naming the two distinct populations of agents already inside your perimeter.
## Two Populations, Two Threat Models
The instinct is to lump all shadow agents into one bucket. That’s a mistake. The control surface for each is different, and so is the failure mode.
| Population | Where it lives | Privileges by default | Primary risk | What existing controls miss |
|---|---|---|---|---|
| Embedded SaaS agents (Microsoft 365 Copilot, Slack AI, Salesforce Einstein, Zoom AI Companion, Notion AI, Asana Intelligence) | Inside the vendor’s SaaS | Inherits user OAuth scope; reads tenant content | Cross-tenant data leakage, inferred PII surfacing in unexpected workflows | DLP and CASB see the SaaS API but not the agent’s reasoning chain |
| Standalone agents (Cursor, Claude Desktop, Goose, Aider, Continue, Cline, Operator, Manus) | On the employee’s laptop | Full filesystem, full shell, full network egress | Destructive local actions, supply-chain sabotage, exfiltration via legitimate-looking egress | EDR sees the binary but not its intent; prompt guardrails (Lakera, Protect AI) only see prompts that route through them |
The first population is a permission-and-policy problem. The second is a kernel problem. Treating them with the same playbook is how you end up with an incident review that opens with “the agent had legitimate credentials.”
## The Anatomy of an Agent-Driven Wipeout
When a coding agent deletes a codebase or a database, the post-mortem has a recognizable shape. Walk through it before you own one.
1. The agent receives an ambiguous instruction. A developer types “clean up the staging branch” or “reset the dev environment.” The agent’s planner expands this into a sequence of destructive operations.
2. The agent inherits a session token with write privileges. Because the developer needs those privileges to ship code, the agent gets them by default.
3. There is no kernel-level distinction between the developer typing `rm -rf` and the agent issuing `rm -rf`. The OS sees the same UID. The intent is invisible at the syscall layer; a sketch after this list shows the lineage check that could recover it.
4. The agent executes at machine speed. By the time a SIEM alert correlates, the work is done.
5. The recovery clock starts. Backups become the only control that mattered, and the irreversibility tax shows up in the next earnings call.
This anatomy is uniform whether the agent is a coding assistant, a desktop browsing agent, or a SaaS-embedded “do this for me” feature. The accelerant is uniform privilege; the brake — the thing that should have stopped step three — is missing in almost every environment we have audited.
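The missing brake in step three is a notion of process identity beneath the UID. Here is a minimal sketch of what that lineage check could look like, assuming Linux's /proc interface and a placeholder set of agent binary names; it shows the shape of the distinction the kernel does not make today, not a production control:

```python
# Minimal sketch: distinguish "developer typed it" from "agent issued it"
# by process lineage, since the UID is identical for both.
# Linux-only; AGENT_BINARIES is an illustrative placeholder set.
from pathlib import Path

AGENT_BINARIES = {"cursor", "claude", "goose", "aider"}  # placeholders

def spawned_by_agent(pid: int) -> bool:
    """Walk the PPid chain in /proc; True if any ancestor is a known agent."""
    while pid > 1:
        try:
            status = Path(f"/proc/{pid}/status").read_text()
        except FileNotFoundError:
            return False  # process exited while we walked the chain
        fields = dict(
            line.split(":\t", 1) for line in status.splitlines() if ":\t" in line
        )
        if fields.get("Name", "").strip() in AGENT_BINARIES:
            return True
        pid = int(fields.get("PPid", "0").strip() or 0)
    return False
```

An enforcement layer would consult this identity at syscall time, not after the fact in a log pipeline.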
## The Risk Score for an Agent Population
Agent Tail Risk = (Privilege Surface × Action Reversibility⁻¹) + (Population Size × Drift Coefficient)
Score each population on the four factors; a worked example follows the table. The high-score quadrant is where governance budget belongs first.
| Factor | Low-score example | High-score example |
|---|---|---|
| Privilege Surface | Copilot reading a single shared inbox | Cursor with sudo access on a developer’s macOS host |
| Action Reversibility | Drafting a Slack message a human approves | Executing migrations or `terraform apply` |
| Population Size | One pilot team of five engineers | Org-wide rollout of a desktop agent |
| Drift Coefficient | Agents pinned to a specific model version with logged prompts | Auto-updating agents pulling new weights weekly |
A low-privilege, reversible, small-population, low-drift agent is a Slack message draft. A high-privilege, irreversible, org-wide, fast-drifting agent is a Friday-afternoon outage waiting for its trigger sentence.
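To make the formula concrete, here is a minimal worked example. The factor values are illustrative ordinal scores on a 1-5 scale, not calibrated measurements; the point is the spread, not the absolute numbers:

```python
# Worked example of the tail-risk formula. All scores are illustrative
# 1-5 ordinals: higher privilege, population, and drift raise risk;
# higher reversibility lowers it (it enters as an inverse).
def agent_tail_risk(privilege: float, reversibility: float,
                    population: float, drift: float) -> float:
    return privilege * (1 / reversibility) + population * drift

# Copilot reading one shared inbox: low privilege, easily reversible.
copilot_inbox = agent_tail_risk(privilege=1, reversibility=5,
                                population=2, drift=1)   # -> 2.2
# Org-wide desktop coding agents with shell access, auto-updating weekly.
cursor_fleet = agent_tail_risk(privilege=5, reversibility=1,
                               population=5, drift=4)    # -> 25.0
```

An order-of-magnitude gap like this is what the heat map in step two of the quarterly plan below is meant to surface.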
## Why Observability Alone Falls Short — and What Replaces It
Let’s step back. The temptation, especially for teams with mature SIEM and EDR investments (Splunk, Datadog, CrowdStrike, SentinelOne, Defender), is to assume that better logging closes the gap. It doesn’t. Observability tells you what an agent did. By the time the log reaches your agent observability pipeline, the production table is already gone.
The architectural answer is segmentation enforced at a layer the agent cannot see around. Three control points matter, in this order.
| Control point | What it does | What it is not |
|---|---|---|
| Kernel-level allowlists per agent process | Blocks destructive syscalls by process identity, regardless of user UID | A list of “approved tools” maintained in a wiki |
| Copy-on-write filesystem boundaries | Lets the agent operate on a snapshot until a human approves the diff | A backup taken after the fact |
| Org-wide policy that travels with the agent, not the user | A finance-team agent cannot exfiltrate to a personal Drive even when invoked by a privileged user | OAuth scope at the SaaS layer |
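To make the first control point in the table concrete: on Linux, a per-process syscall filter is one way to get enforcement the agent cannot see around. A minimal sketch, assuming the python3-libseccomp bindings and treating the agent command as a placeholder; a real deployment would be a centrally managed default-deny allowlist, not a handful of deny rules:

```python
# Sketch: launch an agent with destructive file syscalls denied at the
# kernel, regardless of the UID it runs as. Assumes Linux and the
# python3-libseccomp bindings; "my-agent" is a placeholder command.
import errno
import subprocess
from seccomp import SyscallFilter, ALLOW, ERRNO

DESTRUCTIVE = ["unlink", "unlinkat", "rmdir", "rename", "renameat2", "truncate"]

def launch_agent(cmd: list[str]) -> subprocess.Popen:
    def deny_destructive() -> None:
        # Default-allow so the interpreter keeps running; each destructive
        # syscall instead fails with EPERM. The filter is applied before
        # exec and inherited by every child the agent spawns.
        flt = SyscallFilter(defaction=ALLOW)
        for name in DESTRUCTIVE:
            flt.add_rule(ERRNO(errno.EPERM), name)
        flt.load()  # irreversible for this process tree
    return subprocess.Popen(cmd, preexec_fn=deny_destructive)

# launch_agent(["my-agent", "--workdir", "/repo"])
```

The same mechanism extends to network egress (`connect`) and process spawning (`execve`), which is where exfiltration and supply-chain sabotage live. Note that this is block-on-deny semantics, whose limits come next.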
Taken together, these control points are what an agent firewall provides, and the crucial distinction among them is between block-on-deny and copy-on-write semantics. Block-on-deny says “this action is forbidden.” Copy-on-write says “this action runs in a sandbox and cannot affect ground truth until a human signs off.” For destructive operations, copy-on-write is the only safe default. Block-on-deny alone leaves you betting on the agent’s instruction-following — which is, by definition, the thing you cannot trust.
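Here is what copy-on-write semantics look like in miniature, with a plain directory copy standing in for a real filesystem snapshot (overlayfs, Btrfs, or ZFS in practice); the function and names are illustrative, not a product API:

```python
# Sketch: the agent mutates a snapshot; ground truth changes only after
# a human approves the diff. A directory copy stands in for a real
# copy-on-write snapshot for illustration.
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import Callable

def run_with_cow(workdir: Path, agent_task: Callable[[Path], None]) -> None:
    sandbox = Path(tempfile.mkdtemp(prefix="agent-cow-")) / workdir.name
    shutil.copytree(workdir, sandbox)   # agent never touches ground truth
    agent_task(sandbox)                 # destructive ops land here only

    # Top-level diff only, for brevity; a real review would recurse
    # and show content-level diffs.
    diff = filecmp.dircmp(workdir, sandbox)
    print(f"modified: {diff.diff_files}")
    print(f"deleted:  {diff.left_only}")
    print(f"created:  {diff.right_only}")

    if input("Apply agent changes? [y/N] ").strip().lower() == "y":
        shutil.rmtree(workdir)
        shutil.copytree(sandbox, workdir)  # promote the approved snapshot
    shutil.rmtree(sandbox.parent)          # discard the scratch space
```

The design point: approval happens on the diff, not on the instruction, so the agent's interpretation of an ambiguous prompt is inspectable before it becomes ground truth.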
## What CISOs Should Do This Quarter
| Step | Action | Output | Effort |
|---|---|---|---|
| 1 | Inventory both populations — SaaS-embedded and standalone — separately | Two lists, with privilege scope per agent | 2 weeks |
| 2 | Score each population on the tail-risk formula above | Heat map of where to invest first | 1 week |
| 3 | Pilot kernel-level segmentation on the highest-score population (usually standalone coding agents) | Block-on-deny baseline plus copy-on-write for destructive syscalls | 4–6 weeks |
| 4 | Extend the same policy plane to embedded SaaS agents via agent governance hooks | One policy, two enforcement points | Ongoing |
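Step four's “one policy, two enforcement points” is easiest to see as data. A minimal sketch with hypothetical names throughout; the point is that the policy keys on agent identity, so the same object can be evaluated by a kernel-side hook for standalone agents and a SaaS governance hook for embedded ones:

```python
# Sketch: a policy that travels with the agent, not the user. All names
# are hypothetical; the same object is evaluated at both enforcement
# points, so the decision is identical regardless of who invoked the agent.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPolicy:
    agent_id: str
    allowed_egress: frozenset[str]           # permitted network destinations
    destructive_requires_review: bool = True # copy-on-write by default

FINANCE_AGENT = AgentPolicy(
    agent_id="finance-close-bot",
    allowed_egress=frozenset({"erp.internal.example.com"}),
)

def allow_egress(policy: AgentPolicy, destination: str) -> bool:
    # Same check whether it runs in the kernel-side hook (standalone
    # agents) or the SaaS governance hook (embedded agents). A privileged
    # user invoking the agent does not widen the agent's scope.
    return destination in policy.allowed_egress

assert not allow_egress(FINANCE_AGENT, "drive.google.com")  # blocked exfil path
```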
## The Bottom Line
Shadow agents are not one problem. They are two problems wearing the same name, and they fail in different directions. SaaS-embedded agents leak; standalone agents destroy. Observability is necessary but insufficient — it is the rear-view mirror, not the steering wheel, and by the time the log lands the irreversible action has cleared. The only durable answer is segmentation at the kernel, applied uniformly across both populations, traveling with the agent rather than the user.
If your team is sizing this for the back half of the fiscal year, request a working session. We will walk through your environment, map both agent populations against the tail-risk formula, and scope a kernel-level segmentation pilot. Ninety minutes is enough to know whether your current stack catches the failure modes that matter.