TL;DR

-db1-

A bug that looks harmless on paper can spiral out of control once an AI agent starts acting on it. That’s why OWASP built the AI Vulnerability Scoring System (AIVSS), which asks not just “how bad is this vulnerability?” but “how much worse does it get when agents think, act, and conspire?”

It layers three things on top of CVSS:

  • Agentic AI Risk Score (AARS): measures the amplification from autonomy, memory, and multi-agent chaos.
  • Threat Multiplier (ThM): dials the score up if exploits are live in the wild.
  • Familiar 0–10 range: but finally faithful to how agents really behave.

-db1-

Where CVSS falls short

CVSS has always had this split personality: it’s meant to measure severity, but somewhere along the way we started treating it like a full picture of risk. A score of 9.8 sets everyone’s hair on fire, while a 5.0 gets shoved into the backlog. The problem? Attackers don’t care about our numbers. They go after what’s easiest and most profitable.

That mismatch is why you hear so many people describe patching by CVSS alone as security theatre. It looks good on the dashboard, but saying “We fixed the criticals!” doesn’t always move the needle on actual risk reduction.

This is where frameworks like the Exploit Prediction Scoring System (EPSS) stepped in. EPSS tries to answer a question CVSS doesn’t even ask: what’s the likelihood this vulnerability will actually be exploited in the wild? By layering in real-world data such as exploit code availability, attacker activity, and historical patterns, it adds the context CVSS leaves out. Pairing EPSS with CISA’s Known Exploited Vulnerabilities (KEV) list gives security teams a much sharper picture of what really needs urgent attention.
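
To make that concrete, here is a minimal triage sketch showing how those three signals might be combined. The field names, thresholds, and labels are illustrative assumptions for this post, not official EPSS or KEV guidance:

-bc-
# Illustrative sketch: combine CVSS severity, EPSS likelihood, and KEV membership
# into a triage decision. Field names, thresholds, and labels are assumptions,
# not official EPSS or KEV guidance.

def triage(vuln: dict) -> str:
    if vuln["in_kev"]:           # CISA has confirmed exploitation in the wild
        return "patch now"
    if vuln["epss"] >= 0.5:      # high predicted probability of exploitation
        return "patch this cycle"
    if vuln["cvss"] >= 9.0:      # severe on paper, but no exploitation signal yet
        return "schedule"
    return "backlog"

findings = [
    {"id": "CVE-A", "cvss": 9.8, "epss": 0.02, "in_kev": False},
    {"id": "CVE-B", "cvss": 6.5, "epss": 0.83, "in_kev": True},
]
for f in findings:
    print(f["id"], "->", triage(f))
# CVE-B, a "medium" by CVSS alone, jumps ahead of the 9.8 once exploitation data is in view.
-bc-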

Even with EPSS and KEV filling some of the gaps, CVSS still struggles with the basics of chained attacks, where one weakness unlocks another, and with analyst scoring inconsistency depending on who’s behind the keyboard. Add in the lag between disclosure and scoring, and you’ve got a system that’s useful, but increasingly out of sync with how threats evolve today.

If CVSS looks incomplete for software weaknesses, just imagine how badly it misses the mark when the vulnerability isn’t just in code, but part of an agent that can think, act, and fail in unpredictable ways.

How AIVSS changes the game

The beauty of AIVSS is that it doesn’t throw CVSS away. Instead, it uses CVSS as the baseline and then layers on what’s been missing for agentic systems.

It introduces something called the Agentic AI Risk Score (AARS), which looks at amplification factors like autonomy, memory, multi-agent interactions, and non-determinism. In plain English: it asks the critical question, “How much worse does this thing get once an agent starts thinking and acting for itself?”

Then it adds a Threat Multiplier to account for what’s happening right now. If an exploit is circulating or active campaigns are underway, the score reflects that reality. The math is simple: it averages CVSS with AARS, then adjusts for the live threat level. The outcome is powerful: suddenly you’ve got a score that’s both familiar to security teams and faithful to the weird, unpredictable world of AI agents.

AIVSS in a nutshell:

CVSS stays in the loop. AIVSS doesn’t replace it. It takes the familiar CVSS Base score as an anchor.

AARS (Agentic AI Risk Score). This adds ten amplification factors, things like Autonomy, Multi-Agent Interactions, Non-Determinism, and Self-Modification, each scored in simple increments (0, 0.5, or 1).

ThM (Threat Multiplier). A live knob to reflect current exploitability, mapped to CVSS v4’s Exploit Maturity metric. If attacks are active in the wild, the score climbs.

The result:

-bc-AIVSS_Score = ((CVSS_Base + AARS) / 2) × ThM-bc-

Keeping the familiar 0–10 range, but now with the context of agents acting like agents.
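
Here is a minimal sketch of that arithmetic, purely for illustration. The factor scores, CVSS base value, and ThM value below are made-up inputs; the official OWASP calculator defines the full list of ten factors and the exact Exploit Maturity mapping:

-bc-
# Minimal sketch of the AIVSS arithmetic above; all inputs are illustrative, not real scores.

def aivss_score(cvss_base: float, aars: float, thm: float) -> float:
    """AIVSS_Score = ((CVSS_Base + AARS) / 2) * ThM, keeping the familiar 0-10 range."""
    return round(((cvss_base + aars) / 2) * thm, 1)

# AARS: ten amplification factors (Autonomy, Multi-Agent Interactions,
# Non-Determinism, Self-Modification, ...), each scored 0, 0.5, or 1 and summed.
aars = sum([1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5])   # = 7.5 out of 10

# ThM mirrors CVSS v4 Exploit Maturity; 1.0 here assumes attacks are active in the wild.
print(aivss_score(cvss_base=6.5, aars=aars, thm=1.0))   # 7.0: a "medium" bug climbs once agency is factored in
-bc-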

In short: CVSS tells you how bad it looks on paper. AIVSS tells you how bad it gets when you let the agents off the leash.

Lakera’s contributions on AIVSS

At Lakera, we’ve been hands-on with these problems from day one, so it was important for us to contribute to the framework. From Agentic AI Tool Misuse, when an agent uses an external tool in a harmful or unintended way, to Agent Cascading Failures, where a single mistake ripples across connected systems, to Agent Untraceability, where the breadcrumbs disappear entirely, our experience has helped shape these critical categories.

These aren’t theoretical risks; they’re the kinds of issues we’re already seeing in red team engagements and runtime defenses. Baking them into AIVSS ensures the scoring system doesn’t just sound good on paper but maps to how agents behave in production.

Why this matters now

Security teams are already juggling too much: endless CVEs, crowded dashboards, patch backlogs. If we throw AI into the mix without a way to measure its risks, we’re asking for a nervous breakdown.

AIVSS is a way forward. It keeps the continuity of CVSS so teams don’t have to learn a brand new language, but it adds the clarity needed for agentic systems. It reflects not just the theoretical severity of a bug but the messy, cascading, real-world risks of autonomous behavior.

And at Lakera, that’s exactly what we’re here for: bridging the gap between traditional application security and the new realities of AI.

Learn more about AIVSS here: https://aivss.owasp.org/

Get in touch for more details on how Lakera can help secure your AI applications and agents.