Every trading desk in Manhattan knows its Value-at-Risk before lunch. Most CISOs cannot tell you the agent-equivalent at four o’clock on a Friday.
Why the Agent Risk Score Matters Now
Security teams keep getting asked the same question by their audit committees: “How exposed are we to AI agents, in dollars?” The honest answer today is usually a shrug followed by some narrative bullets. That is not a posture. That is a vibe.
The capital-markets industry solved this problem three decades ago for portfolios. You decompose exposure into a small set of measurable factors, weight them, sum them, and produce a single rollup number you can defend in front of a regulator. The same discipline is overdue for agents.
| Stat | Value | Source |
|---|---|---|
| Enterprises with at least one unsanctioned agent on a managed endpoint | 88% | Ospiri signature pipeline, 2026 |
| Average added annualized cost of a single uncontrolled agent incident | +$670K | Ospiri, relative to IBM Cost of a Data Breach baselines |
| Median window from first agent deployment to first material incident | 12–18 months | Ospiri field data |
The dashboard needs to exist before the incident, not after. This piece walks through the framework Ospiri uses with design partners.
What an Agent Risk Score Is — and Is Not
Most “AI risk” frameworks circulating today are qualitative posters: a 3×3 matrix with high/medium/low boxes and no formula underneath. A real risk score has to be quantifiable, comparable across endpoints, and decomposable into the factors a CISO can actually move.
| Framework | What it measures | Decomposable | Comparable across endpoints |
|---|---|---|---|
| NIST AI RMF | Process maturity | Partial | No |
| ISO 42001 controls map | Policy presence | No | No |
| Vendor “AI risk rating” | Marketing | No | No |
| Agent Risk Score (this framework) | Operational exposure | Yes — four factors | Yes — endpoint, team, org |
The Agent Risk Score is meant to behave like VaR for an agent fleet: a number you can mark daily, a methodology you can disclose, and a delta you can attribute to specific control changes.
The Formula
Agent Risk Score = (Permission Scope × Reversibility) + (Frequency × Drift)
Two halves, four factors. The first half captures static exposure: what the agent can touch, and how bad a single action would be. The second half captures behavioral exposure: how often the agent acts, and how fast its behavior is changing relative to its deployment baseline. Because each factor is scored 0–10, each half ranges from 0 to 100 and the full score from 0 to 200.
| Factor | Definition | Scoring rubric (0–10) |
|---|---|---|
| Permission Scope | Breadth of resources the agent can access — filesystem, network, identity, secrets, code execution | 0 = read-only single directory; 10 = full kernel access with persistent credentials |
| Reversibility | How recoverable a single agent action is — higher score means less reversible | 0 = sandboxed, copy-on-write, snapshot-restorable; 10 = irreversible writes to production systems |
| Frequency | Actions per hour against in-scope resources | 0 = idle; 10 = >100 actions/hour with no human-in-the-loop |
| Drift | Behavioral delta from the agent’s deployment baseline — new directories, new syscalls, new endpoints, new tools | 0 = no drift over 30 days; 10 = >50% new behavior signatures week-over-week |
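The rubric fixes only the endpoints of each scale, so any mapping from raw telemetry to a 0–10 value is a modeling choice. As a minimal sketch for the Drift factor, assume behavior signatures are plain strings and that the scale is linear up to the rubric's 50% anchor. Both are illustrative assumptions, as are the function name and the signature format.

```python
def drift_score(reference: set[str], current: set[str]) -> float:
    """Map the share of new behavior signatures onto the 0-10 Drift rubric.

    `reference` is the comparison set (deployment baseline or last week);
    `current` is the set observed in the window being scored.
    """
    if not current:
        return 0.0  # an idle agent shows no new behavior
    new_fraction = len(current - reference) / len(current)
    # Assumption: linear scale that saturates at the rubric's 50% anchor.
    return round(min(new_fraction / 0.5, 1.0) * 10, 1)


# Hypothetical signatures: one new directory touched this week.
baseline = {"fs:/repo/src", "net:api.github.com", "tool:git"}
this_week = baseline | {"fs:/home/user/.ssh"}
print(drift_score(baseline, this_week))  # 5.0
```

One of four signatures is new, so the new-behavior share is 25%, halfway to the 50% anchor. A team could just as reasonably pick a stepped or logarithmic scale, as long as it is applied consistently across endpoints.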
Worked example. A Cursor instance on an engineer’s laptop with full filesystem access and git push permissions, able to make hard-to-reverse commits to main, running ~40 actions per hour and showing mild drift after six weeks: Permission Scope 8, Reversibility 7, Frequency 6, Drift 3. Score = (8 × 7) + (6 × 3) = 74.
By comparison, a sandboxed Aider instance restricted to a project subdirectory with copy-on-write isolation: Permission Scope 3, Reversibility 2, Frequency 5, Drift 2. Score = (3 × 2) + (5 × 2) = 16.
Same engineer, same nominal toolchain, and a more than four-fold gap in exposure on the rollup (74 versus 16). That is the conversation you want to be having before the incident, not after.
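The arithmetic in the two worked examples can be reproduced in a few lines. This is a sketch of the formula as stated above, not a reference implementation; the class and property names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentFactors:
    """The four rubric factors, each scored 0-10."""
    permission_scope: int
    reversibility: int
    frequency: int
    drift: int

    def __post_init__(self):
        # Reject values outside the 0-10 rubric range.
        for name, value in vars(self).items():
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be in 0..10, got {value}")

    @property
    def static_exposure(self) -> int:
        return self.permission_scope * self.reversibility

    @property
    def behavioral_exposure(self) -> int:
        return self.frequency * self.drift

    @property
    def score(self) -> int:
        return self.static_exposure + self.behavioral_exposure


cursor = AgentFactors(permission_scope=8, reversibility=7, frequency=6, drift=3)
aider = AgentFactors(permission_scope=3, reversibility=2, frequency=5, drift=2)

print(cursor.score, aider.score)    # 74 16
print(cursor.score / aider.score)   # 4.625
```

Keeping the two halves as separate properties means the static and behavioral contributions stay visible after the sum, which matters once scores are aggregated.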
How the Score Rolls Up
The same logic that lets a portfolio manager view exposure at the security, sector, and book level applies here.
| Rollup level | Aggregation | What it answers |
|---|---|---|
| Per-endpoint | Sum of agent scores resident on that host | Which laptops are the firm’s hot spots? |
| Per-team | Average across endpoints, weighted by data-sensitivity tier | Which teams concentrate exposure? |
| Per-org | Mark-to-market sum, decomposed by factor | What is the firm’s agent VaR, and which factor drives it? |
This is the view that puts a CISO in a defensible position when the audit committee asks for a number. “Our org-level Agent Risk Score is 4,200 today, up from 3,100 in March, with the increase concentrated in Frequency as engineering rolled out Claude Code.” That is auditable. That is also a sentence a regulator will accept.
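The three rollup levels can be sketched as follows. The record layout, hostnames, and sensitivity weights are hypothetical. The org-level breakdown here splits the total into its static and behavioral halves, which is exact; attributing a change to a single factor inside a product term needs an extra convention that this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical agent records carrying the two halves of each score.
agents = [
    {"endpoint": "laptop-01", "team": "platform", "static": 56, "behavioral": 18},
    {"endpoint": "laptop-01", "team": "platform", "static": 6, "behavioral": 10},
    {"endpoint": "laptop-02", "team": "platform", "static": 6, "behavioral": 10},
]

# Hypothetical data-sensitivity tier weights per endpoint.
sensitivity_weight = {"laptop-01": 3.0, "laptop-02": 1.0}


def endpoint_scores(agents):
    """Per-endpoint: sum of the agent scores resident on each host."""
    totals = defaultdict(int)
    for a in agents:
        totals[a["endpoint"]] += a["static"] + a["behavioral"]
    return dict(totals)


def team_scores(agents, weights):
    """Per-team: average of endpoint scores, weighted by sensitivity tier."""
    per_endpoint = endpoint_scores(agents)
    endpoint_team = {a["endpoint"]: a["team"] for a in agents}
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for endpoint, score in per_endpoint.items():
        team = endpoint_team[endpoint]
        weighted_sum[team] += weights[endpoint] * score
        weight_total[team] += weights[endpoint]
    return {team: weighted_sum[team] / weight_total[team] for team in weighted_sum}


def org_score(agents):
    """Per-org: total score, split into static and behavioral halves."""
    static = sum(a["static"] for a in agents)
    behavioral = sum(a["behavioral"] for a in agents)
    return {"total": static + behavioral, "static": static, "behavioral": behavioral}


print(endpoint_scores(agents))                  # {'laptop-01': 90, 'laptop-02': 16}
print(team_scores(agents, sensitivity_weight))  # {'platform': 71.5}
print(org_score(agents))                        # {'total': 106, 'static': 68, 'behavioral': 38}
```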
Where the Score Plugs Into the Existing Stack
The whole point of expressing exposure as a number is so the existing security stack can act on it. The score is not a new dashboard you stare at — it is a feature that feeds the dashboards your team already pays for.
| Existing system | How the Agent Risk Score plugs in | Outcome |
|---|---|---|
| UEBA (Splunk UBA, Defender for Identity) | Drift factor becomes a behavioral signal alongside user anomalies | Agents finally appear in the same anomaly view as humans |
| SIEM (Splunk, Datadog, Sentinel) | Per-endpoint and per-team scores ingested as a daily metric, threshold-alertable | Score crossing a threshold triggers a ticket, not a quarterly review |
| GRC (ServiceNow GRC, OneTrust, Archer) | Org-level score feeds the AI risk register with a defensible methodology | The auditor stops asking “show me your AI risk register” because the register now has numbers |
| EDR (CrowdStrike, SentinelOne, Defender) | Permission Scope and Reversibility derived from kernel-level telemetry the EDR already collects | Agent posture rides the same EDR sensor the firm already deploys |
The score is deliberately stack-agnostic. The factors are computable from telemetry that exists in any environment running EDR plus an agent firewall — which, on our 12-to-18-month forecast, will be most large enterprises by end of 2027.
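As a rough illustration of the SIEM row above, a daily score can be emitted as a flat JSON event carrying its own threshold flag. The field names and the threshold value are assumptions for the sketch and do not follow any particular vendor's schema; the actual ingestion path depends on the SIEM in use.

```python
import json
from datetime import date


def build_metric_event(scope: str, name: str, score: float,
                       threshold: float, day: date) -> dict:
    """Build a daily metric record with a simple threshold-breach flag."""
    return {
        "metric": "agent_risk_score",   # hypothetical metric name
        "scope": scope,                 # "endpoint", "team", or "org"
        "name": name,
        "date": day.isoformat(),
        "score": score,
        "threshold": threshold,
        "breach": score > threshold,    # drives the ticket, not the dashboard
    }


# Hypothetical endpoint with a score of 90 against a threshold of 80.
event = build_metric_event("endpoint", "laptop-01", 90, 80, date(2026, 1, 5))
print(json.dumps(event))
```

Putting the breach flag in the event keeps the alert rule on the SIEM side trivial, though some teams will prefer to hold thresholds in the SIEM so they can be tuned without redeploying the collector.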
What CISOs Should Do This Quarter
This is not a six-quarter consulting engagement. The minimum viable score takes a fortnight if the telemetry is already flowing.
| Step | Action | Output | Effort |
|---|---|---|---|
| 1 | Inventory agents on managed endpoints — sanctioned and shadow | Agent census | 3 days |
| 2 | Score each agent on the four factors using the rubric above | First Permission/Reversibility/Frequency/Drift snapshot | 2 days |
| 3 | Aggregate to per-endpoint, per-team, per-org rollups | First org-level Agent Risk Score | 1 day |
| 4 | Wire the rollup into SIEM/UEBA as a metric, set threshold alerting | Continuous score with drift detection | 1 week |
The output of week two is a single number you can put in a board deck, with an honest methodology behind it. The output of quarter two is a downward trend on that number, attributable to specific agent governance controls you put in place.
The Bottom Line
If your firm cannot mark its agent exposure daily, your firm cannot price what is on its balance sheet. The Agent Risk Score gives you a defensible, decomposable, threshold-alertable number — and a methodology that survives an external audit because every factor maps to telemetry the EDR and the agent firewall are already collecting. The whole point of borrowing Value-at-Risk discipline from trading is that the score becomes comparable across endpoints, across teams, and across time. The conversation with the audit committee then shifts from “we are working on it” to “we are at 4,200, here is the driver, and here is the playbook.”
If your team is sizing this for the Q3 board cycle, request a working session. We will walk through your environment, score a representative sample of endpoints, and produce a first org-level Agent Risk Score you can defend. 90 minutes.