Every trading desk in Manhattan knows its Value-at-Risk before lunch. Most CISOs cannot tell you the agent-equivalent at four o’clock on a Friday.

Why the Agent Risk Score Matters Now

Security teams keep getting asked the same question by their audit committees: “How exposed are we to AI agents, in dollars?” The honest answer today is usually a shrug followed by some narrative bullets. That is not a posture. That is a vibe.

The capital-markets industry solved this problem three decades ago for portfolios. You decompose exposure into a small set of measurable factors, weight them, sum them, and produce a single rollup number you can defend in front of a regulator. The same discipline is overdue for agents.

| Stat | Value | Source |
| --- | --- | --- |
| Enterprises with at least one unsanctioned agent on a managed endpoint | 88% | Ospiri signature pipeline, 2026 |
| Average annualized cost of a single uncontrolled agent incident | +$670K | Ospiri, against IBM Cost of a Data Breach baselines |
| Median window from first agent deployment to first material incident | 12-18 months | Ospiri field data |

The dashboard needs to exist before the incident, not after. This piece walks through the framework Ospiri uses with design partners.

What an Agent Risk Score Is — and Is Not

Most “AI risk” frameworks circulating today are qualitative posters: a 3×3 matrix with high/medium/low boxes and no formula underneath. A real risk score has to be quantifiable, comparable across endpoints, and decomposable into the factors a CISO can actually move.

| Framework | What it measures | Decomposable | Comparable across endpoints |
| --- | --- | --- | --- |
| NIST AI RMF | Process maturity | Partial | No |
| ISO 42001 controls map | Policy presence | No | No |
| Vendor “AI risk rating” | Marketing | No | No |
| Agent Risk Score (this framework) | Operational exposure | Yes (four factors) | Yes (endpoint, team, org) |

The Agent Risk Score is meant to behave like VaR for an agent fleet: a number you can mark daily, a methodology you can disclose, and a delta you can attribute to specific control changes.

The Formula

Agent Risk Score = (Permission Scope × Reversibility) + (Frequency × Drift)

Two halves, four factors. The first half captures static exposure: what the agent can touch, and how bad a single action would be. The second half captures behavioral exposure: how often the agent acts, and how fast its behavior is changing relative to its deployment baseline. Each factor is scored 0 to 10, so each half contributes 0 to 100 and the total score ranges from 0 to 200.

| Factor | Definition | Scoring rubric (0–10) |
| --- | --- | --- |
| Permission Scope | Breadth of resources the agent can access: filesystem, network, identity, secrets, code execution | 0 = read-only single directory; 10 = full kernel access with persistent credentials |
| Reversibility | How recoverable a single agent action is; a higher score means less reversible | 0 = sandboxed, copy-on-write, snapshot-restorable; 10 = irreversible writes to production systems |
| Frequency | Actions per hour against in-scope resources | 0 = idle; 10 = >100 actions/hour with no human-in-the-loop |
| Drift | Behavioral delta from the agent’s deployment baseline: new directories, new syscalls, new endpoints, new tools | 0 = no drift over 30 days; 10 = >50% new behavior signatures week-over-week |
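The rubric fixes only the two ends of each scale, so anything in between is a modeling choice. As a minimal sketch, the Drift factor could be derived from two weekly sets of behavior signatures by scaling the share of new signatures linearly up to the rubric's 50% ceiling. The function name, the set-of-strings representation, and the linear interpolation are all assumptions made for illustration, not part of the framework as described.

```python
from __future__ import annotations


def drift_score(last_week: set[str], this_week: set[str], cap: float = 0.5) -> float:
    """Map the share of this week's signatures not seen last week onto 0-10.

    Assumption: the score grows linearly and saturates at 10 once the share
    of new signatures reaches `cap` (the rubric's 50% week-over-week mark).
    """
    if not this_week:
        return 0.0  # an agent that did nothing this week shows no drift
    new_share = len(this_week - last_week) / len(this_week)
    return min(10.0, 10.0 * new_share / cap)


# Hypothetical signatures: one of four behaviors this week is new (25%).
baseline = {"read:src/", "exec:git", "net:github.com", "exec:pytest"}
current = {"read:src/", "exec:git", "net:github.com", "net:pastebin.com"}
print(drift_score(baseline, current))
```

A step function or a longer baseline window would be equally defensible; what matters for comparability is that the same mapping is applied to every endpoint.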

Worked example. A Cursor instance on an engineer’s laptop with full filesystem and git push permissions, irreversible commits to main, ~40 actions per hour, mild drift after six weeks: Permission Scope 8, Reversibility 7, Frequency 6, Drift 3. Score = (8 × 7) + (6 × 3) = 74.

By comparison, a sandboxed Aider instance restricted to a project subdirectory with copy-on-write isolation: Permission Scope 3, Reversibility 2, Frequency 5, Drift 2. Score = (3 × 2) + (5 × 2) = 16.

Same engineer, same nominal toolchain, and a rollup score more than four times higher (74 versus 16). That is the conversation you want to be having before the incident, not after.
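The two worked examples can be reproduced with a few lines of code. The sketch below encodes the formula exactly as stated; the class and attribute names are illustrative choices, and the range check simply enforces the 0 to 10 rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentFactors:
    permission_scope: int
    reversibility: int  # higher means LESS reversible, per the rubric
    frequency: int
    drift: int

    def __post_init__(self) -> None:
        # Every factor must sit on the 0-10 rubric scale.
        for name, value in vars(self).items():
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {value}")

    @property
    def static_exposure(self) -> int:
        return self.permission_scope * self.reversibility

    @property
    def behavioral_exposure(self) -> int:
        return self.frequency * self.drift

    @property
    def score(self) -> int:
        return self.static_exposure + self.behavioral_exposure


cursor = AgentFactors(permission_scope=8, reversibility=7, frequency=6, drift=3)
aider = AgentFactors(permission_scope=3, reversibility=2, frequency=5, drift=2)
print(cursor.score, aider.score)  # 74 16
```

Keeping the two halves as separate properties makes it possible to report which half drives a given score, which matters once scores are aggregated.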

How the Score Rolls Up

The same logic that lets a portfolio manager view exposure at the security, sector, and book level applies here.

| Rollup level | Aggregation | What it answers |
| --- | --- | --- |
| Per-endpoint | Sum of agent scores resident on that host | Which laptops are the firm’s hot spots? |
| Per-team | Weighted average across endpoints, weighted by data-sensitivity tier | Which teams concentrate exposure? |
| Per-org | Mark-to-market sum, decomposed by factor | What is the firm’s agent VaR, and which factor drives it? |

This is the view that puts a CISO in a defensible position when the audit committee asks for a number. “Our org-level Agent Risk Score is 4,200 today, up from 3,100 in March, with the increase concentrated in Frequency as engineering rolled out Claude Code.” That is auditable. That is also a sentence a regulator will accept.
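A rough sketch of the three rollup levels follows. The inventory, endpoint names, and sensitivity weights are hypothetical. Two interpretations are assumptions rather than stated methodology: the org-level figure is taken as the plain sum over all agents, and "decomposed by factor" is approximated by splitting the total into its static (Permission Scope × Reversibility) and behavioral (Frequency × Drift) terms, because a product of two factors does not split additively between them.

```python
from collections import defaultdict

# Hypothetical inventory: one row per agent, with both halves of its score.
agents = [
    {"team": "platform", "endpoint": "laptop-01", "static": 56, "behavioral": 18},
    {"team": "platform", "endpoint": "laptop-01", "static": 6, "behavioral": 10},
    {"team": "platform", "endpoint": "laptop-02", "static": 6, "behavioral": 10},
    {"team": "finance", "endpoint": "laptop-03", "static": 20, "behavioral": 4},
]
# Hypothetical data-sensitivity weights per endpoint (the tier mapping is assumed).
sensitivity_weight = {"laptop-01": 1.0, "laptop-02": 3.0, "laptop-03": 2.0}


def endpoint_scores(rows):
    """Per-endpoint: sum of the scores of every agent resident on the host."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["endpoint"]] += row["static"] + row["behavioral"]
    return dict(totals)


def team_scores(rows, weights):
    """Per-team: average of endpoint scores, weighted by data sensitivity."""
    endpoint_team = {row["endpoint"]: row["team"] for row in rows}
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for endpoint, score in endpoint_scores(rows).items():
        team = endpoint_team[endpoint]
        weighted_sum[team] += weights[endpoint] * score
        weight_total[team] += weights[endpoint]
    return {team: weighted_sum[team] / weight_total[team] for team in weighted_sum}


def org_score(rows):
    """Per-org: total score, split into its static and behavioral terms."""
    static = sum(row["static"] for row in rows)
    behavioral = sum(row["behavioral"] for row in rows)
    return {"total": static + behavioral, "static": static, "behavioral": behavioral}


print(endpoint_scores(agents))
print(team_scores(agents, sensitivity_weight))
print(org_score(agents))
```

Comparing two dated `org_score` snapshots term by term is one way to back a statement such as "the increase is concentrated in the behavioral half".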

Where the Score Plugs Into the Existing Stack

The whole point of expressing exposure as a number is so the existing security stack can act on it. The score is not a new dashboard you stare at — it is a feature that feeds the dashboards your team already pays for.

| Existing system | How the Agent Risk Score plugs in | Outcome |
| --- | --- | --- |
| UEBA (Splunk UBA, Defender for Identity) | Drift factor becomes a behavioral signal alongside user anomalies | Agents finally appear in the same anomaly view as humans |
| SIEM (Splunk, Datadog, Sentinel) | Per-endpoint and per-team scores ingested as a daily metric, threshold-alertable | Score crossing a threshold triggers a ticket, not a quarterly review |
| GRC (ServiceNow GRC, OneTrust, Archer) | Org-level score feeds the AI risk register with a defensible methodology | The auditor stops asking “show me your AI risk register” because the register now has numbers |
| EDR (CrowdStrike, SentinelOne, Defender) | Permission Scope and Reversibility derived from kernel-level telemetry the EDR already collects | Agent posture rides the same EDR sensor the firm already deploys |

The score is deliberately stack-agnostic. The factors are computable from telemetry that exists in any environment running EDR plus an agent firewall — which, on our 12-to-18-month forecast, will be most large enterprises by end of 2027.
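To make "threshold-alertable" concrete, here is a vendor-neutral sketch that serializes daily scores as JSON lines and flags endpoints above a threshold. The event field names, the metric name, and the threshold of 50 are all hypothetical; a real deployment would follow the ingestion schema and alerting rules of whichever SIEM is in use.

```python
import json
from datetime import date

ENDPOINT_THRESHOLD = 50  # hypothetical alerting threshold, not a recommended value


def to_metric_event(scope: str, name: str, score: float, day: date) -> str:
    """Serialize one daily score as a JSON line a log shipper could forward."""
    return json.dumps(
        {
            "metric": "agent_risk_score",  # assumed metric name
            "scope": scope,  # "endpoint", "team", or "org"
            "name": name,
            "score": score,
            "date": day.isoformat(),
        },
        sort_keys=True,
    )


def breaches(scores: dict, threshold: float) -> list:
    """Return the names whose score is strictly above the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score > threshold)


daily = {"laptop-01": 90, "laptop-02": 16, "laptop-03": 24}
for endpoint, score in daily.items():
    print(to_metric_event("endpoint", endpoint, score, date(2026, 1, 5)))
print("open tickets for:", breaches(daily, ENDPOINT_THRESHOLD))
```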

What CISOs Should Do This Quarter

This is not a six-quarter consulting engagement. The minimum viable score takes about a fortnight if the telemetry is already flowing.

| Step | Action | Output | Effort |
| --- | --- | --- | --- |
| 1 | Inventory agents on managed endpoints, sanctioned and shadow | Agent census | 3 days |
| 2 | Score each agent on the four factors using the rubric above | First Permission/Reversibility/Frequency/Drift snapshot | 2 days |
| 3 | Aggregate to per-endpoint, per-team, per-org rollups | First org-level Agent Risk Score | 1 day |
| 4 | Wire the rollup into SIEM/UEBA as a metric, set threshold alerting | Continuous score with drift detection | 1 week |

The output of week two is a single number you can put in a board deck, with an honest methodology behind it. The output of quarter two is a downward trend on that number, attributable to specific agent governance controls you put in place.

The Bottom Line

If your firm cannot mark its agent exposure daily, your firm cannot price what is on its balance sheet. The Agent Risk Score gives you a defensible, decomposable, threshold-alertable number — and a methodology that survives an external audit because every factor maps to telemetry the EDR and the agent firewall are already collecting. The whole point of borrowing Value-at-Risk discipline from trading is that the score becomes comparable across endpoints, across teams, and across time. The conversation with the audit committee then shifts from “we are working on it” to “we are at 4,200, here is the driver, and here is the playbook.”

If your team is sizing this for the Q3 board cycle, request a working session. We will walk through your environment, score a representative sample of endpoints, and produce a first org-level Agent Risk Score you can defend. 90 minutes.