Measuring What Matters: How the Lakera AI Model Risk Index Redefines GenAI Security
The Lakera AI Model Risk Index is a security benchmark that quantifies how large language models perform under real-world adversarial attacks.

The Lakera AI Model Risk Index is a security benchmark that quantifies how large language models perform under real-world adversarial attacks.
You can’t secure what you don’t measure.
As GenAI systems move from experimentation to production, organizations face a critical question: Can we trust these models to behave as intended under pressure?
The reality is that most AI models in use today were never tested against realistic adversarial behavior. Traditional evaluations, which apply the same static prompt sets to every model and scan the outputs, miss what really happens when models are actively exploited.
That’s where the Lakera AI Model Risk Index comes in.
It’s the first benchmark built to simulate how real-world attackers exploit LLMs in real world scenarios, and to measure how effectively those models resist. From prompt injections in direct exploits, to indirect attacks in RAG systems and under tool usage, the Index evaluates how models respond when it matters most: in production deployments, under real pressure, and with adversarial objectives in play.
Whether you’re a CISO evaluating foundation model providers, or a security engineer hardening GenAI systems, this Index helps you move from assumptions to actionable risk insights.
-db1-
-db1-
The Lakera AI Model Risk Index is a runtime-focused security benchmark that evaluates how well large language models (LLMs) uphold their intended behavior when faced with adversarial pressure.
It reflects how models actually perform in applied settings—real enterprise applications where business logic, system prompts, and external data all interact and attackers don’t follow rules.
Most risk assessment approaches focus on surface-level issues: testing prompt responses in isolation and with context independent static prompt attacks that focus on quantity and not on context or quality. By contrast, the Index asks a more practical question for enterprises: how easily can this model be manipulated to break mission-specific rules and objectives and in which type of deployments?
The difference is critical.
To mirror production reality, each test scenario centres on a concrete use case (e.g., customer support, RAG search, code generation) with task-specific prompts and guardrails. This lets us measure resilience far beyond traditional alignment checks such as toxic-language filters.
Meaningful security means ensuring a model consistently does what it’s meant to do, even when prompted with adversarial input. The Lakera Index shows whether models can uphold that alignment under pressure, without drifting into unintended behavior.
Consider a financial customer support chatbot. Its system prompt is clear: only respond to queries about accounts, transactions, or financial advice. But what happens if an attacker convinces it to write a haiku about chemistry?
That might sound harmless, but it reveals a deeper flaw: the model can be coerced into ignoring its core instructions.
If it can write a haiku today, what could it be convinced to do tomorrow?
This is the kind of failure the Index detects: not just harmful outputs, but failures in enforcing behavioral boundaries.
The Lakera AI Model Risk Index goes beyond traditional testing in three fundamental ways:
This gives security and risk teams the clarity they need to evaluate model selection, guide deployment decisions, and support governance efforts.
Every model in the Lakera Index is tested using real-world attack techniques, under realistic conditions. We simulate what it’s like when someone tries to exploit the model, whether by directly prompting it to break the rules, or by embedding hidden instructions in content the model processes, like a document or a web page.
Rather than relying on a fixed list of bad inputs and outputs, we look at whether the model can maintain its role and follow its instructions when something tries to push it off course.
To make the results useful, we group adversarial behaviors into clear, recognizable categories based on attacker goals. These include:
Each model is tested across these categories, revealing its strengths and vulnerabilities.
After testing, we assign a risk score to each model in every category. The scoring is simple:
0 means the model successfully resisted every attempt. 100 means it failed every time.
These scores give a clear, quantifiable view of how each model holds up across different threat types—making it easy to compare models side-by-side or track how one model evolves over time.
-db1-
In one case, we tested a model that showed moderate-to-high vulnerability across multiple categories—including 84.0 in Direct Constraint Evasion (DCE) and 78.7 in Indirect Output Addition (ADD). Even in more subtle categories like Denial of AI Service (DAIS), it struggled, scoring 50.0—indicating that it sometimes refused legitimate requests when adversarial input was present.
Another model, tested under the same conditions, showed inverse behavior in several of those categories. It scored 67.0 in DCE and 100.0 in ADD, but its DAIS score was also 100.0, meaning its intended functionality was always degraded or disrupted by an indirect attack.
The contrast shows how different models can fail in fundamentally different ways—even in the same attack category. Without structured, adversarial testing across diverse vectors, these blind spots would go unnoticed.
-db1-
These patterns matter because they reflect how the model is likely to behave once deployed.
By exposing these failure modes early, the Index gives security teams a practical advantage: the ability to choose, configure, and monitor models with full visibility into how they handle pressure—not just in theory, but in real-world scenarios.
Security teams are under pressure to move fast, but also to stay in control. With new models launching constantly and GenAI use cases expanding across every industry, the risks aren’t theoretical anymore.
The Lakera AI Model Risk Index gives you a grounded, up to date and practical foundation for decision-making. Whether you’re choosing which model to deploy, hardening your system prompts, or preparing for a compliance audit, the Index helps you move from gut instinct to measurable assurance.
-db1-
Here are some of the ways teams can apply the Index to their GenAI strategy:
-db1-
In a space moving this fast, visibility is leverage. And the Lakera AI Model Risk Index offers exactly that: a way to see what’s working, what’s failing, and where you need to act next.
Ready to see how your models measure up?
Explore the full Lakera AI Model Risk Index and try the interactive benchmark.
Download this guide to delve into the most common LLM security risks and ways to mitigate them.
Get the first-of-its-kind report on how organizations are preparing for GenAI-specific threats.
Compare the EU AI Act and the White House’s AI Bill of Rights.
Get Lakera's AI Security Guide for an overview of threats and protection strategies.
Explore real-world LLM exploits, case studies, and mitigation strategies with Lakera.
Use our checklist to evaluate and select the best LLM security tools for your enterprise.
Discover risks and solutions with the Lakera LLM Security Playbook.
Discover risks and solutions with the Lakera LLM Security Playbook.
Subscribe to our newsletter to get the recent updates on Lakera product and other news in the AI LLM world. Be sure you’re on track!
Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.
Several people are typing about AI/ML security. Come join us and 1000+ others in a chat that’s thoroughly SFW.