Cookie Consent
Hi, this website uses essential cookies to ensure its proper operation and tracking cookies to understand how you interact with it. The latter will be set only after consent.
Read our Privacy Policy
Back

Measuring What Matters: How the Lakera AI Model Risk Index Redefines GenAI Security

The Lakera AI Model Risk Index is a security benchmark that quantifies how large language models perform under real-world adversarial attacks.

Lakera Team
June 23, 2025
Last updated: 
June 24, 2025

You can’t secure what you don’t measure.

As GenAI systems move from experimentation to production, organizations face a critical question: Can we trust these models to behave as intended under pressure?

The reality is that most AI models in use today were never tested against realistic adversarial behavior. Traditional evaluations, which apply the same static prompt sets to every model and scan the outputs, miss what really happens when models are actively exploited.

That’s where the Lakera AI Model Risk Index comes in.

It’s the first benchmark built to simulate how real-world attackers exploit LLMs in real world scenarios, and to measure how effectively those models resist. From prompt injections in direct exploits, to indirect attacks in RAG systems and under tool usage, the Index evaluates how models respond when it matters most: in production deployments, under real pressure, and with adversarial objectives in play.

Whether you’re a CISO evaluating foundation model providers, or a security engineer hardening GenAI systems, this Index helps you move from assumptions to actionable risk insights.

On this page
Table of Contents
Hide table of contents
Show table of contents

TL;DR

-db1-

  • Why we built it: Most LLM security benchmarks miss what happens under real-world adversarial pressure. The Lakera AI Index fills that gap with realistic, attack-based evaluations.
  • How it works: Models are tested across direct and indirect adversarial attack categories, then scored using a standardized 0–100 AI risk assessment framework.
  • What you get: Actionable, side-by-side model comparisons that support safer model selection, secure deployment, and governance for GenAI systems.

-db1-

What Is the Lakera AI Model Risk Index? A New LLM Security Benchmark

The Lakera AI Model Risk Index is a runtime-focused security benchmark that evaluates how well large language models (LLMs) uphold their intended behavior when faced with adversarial pressure.

It reflects how models actually perform in applied settings—real enterprise applications where business logic, system prompts, and external data all interact and attackers don’t follow rules.

Most risk assessment approaches focus on surface-level issues: testing prompt responses in isolation and with context independent static prompt attacks that focus on quantity and not on context or quality. By contrast, the Index asks a more practical question for enterprises: how easily can this model be manipulated to break mission-specific rules and objectives and in which type of deployments?

The difference is critical.

To mirror production reality, each test scenario centres on a concrete use case (e.g., customer support, RAG search, code generation) with task-specific prompts and guardrails. This lets us measure resilience far beyond traditional alignment checks such as toxic-language filters.

Meaningful security means ensuring a model consistently does what it’s meant to do, even when prompted with adversarial input. The Lakera Index shows whether models can uphold that alignment under pressure, without drifting into unintended behavior.

Lakera's AI Model Risk Index

A Haiku That Shouldn’t Exist

Consider a financial customer support chatbot. Its system prompt is clear: only respond to queries about accounts, transactions, or financial advice. But what happens if an attacker convinces it to write a haiku about chemistry?

That might sound harmless, but it reveals a deeper flaw: the model can be coerced into ignoring its core instructions.

If it can write a haiku today, what could it be convinced to do tomorrow?

This is the kind of failure the Index detects: not just harmful outputs, but failures in enforcing behavioral boundaries.

What Makes This AI Security Benchmark Different?

The Lakera AI Model Risk Index goes beyond traditional testing in three fundamental ways:

  • It’s realistic. Built on real-world attack techniques, including prompt injections, jailbreaks, and indirect attacks through RAG systems. It’s tested in applied settings that reflect how enterprises actually use LLMs. It focuses on whether models enforce behavioral boundaries under adversarial pressure, not just whether they avoid generating toxic or harmful text.
  • It’s comprehensive. Captures both direct manipulations (through user input) and indirect ones (through content the model processes and systems and tools it has access to).
  • It’s measurable. Assigns a clear, 0–100 risk score for each model, enabling consistent comparisons and tracking over time.

This gives security and risk teams the clarity they need to evaluate model selection, guide deployment decisions, and support governance efforts.

How the Lakera AI Model Risk Index Works

Every model in the Lakera Index is tested using real-world attack techniques, under realistic conditions. We simulate what it’s like when someone tries to exploit the model, whether by directly prompting it to break the rules, or by embedding hidden instructions in content the model processes, like a document or a web page.

Rather than relying on a fixed list of bad inputs and outputs, we look at whether the model can maintain its role and follow its instructions when something tries to push it off course.

Mapping Real Attacks to Measurable Categories

To make the results useful, we group adversarial behaviors into clear, recognizable categories based on attacker goals. These include:

  • Direct Attacks: where an attacker interacts with the model directly, trying to force it to ignore its instructions or reveal hidden information.
  • Indirect Attacks: where the attacker hides malicious instructions in the input data the model processes (like a support ticket or retrieved document), trying to influence the model without directly engaging it.

Each model is tested across these categories, revealing its strengths and vulnerabilities.

How Attack Simulations Translate to Risk Scores

After testing, we assign a risk score to each model in every category. The scoring is simple:

0 means the model successfully resisted every attempt. 100 means it failed every time.

These scores give a clear, quantifiable view of how each model holds up across different threat types—making it easy to compare models side-by-side or track how one model evolves over time.

-db1-

A Tale of Two Models

In one case, we tested a model that showed moderate-to-high vulnerability across multiple categories—including 84.0 in Direct Constraint Evasion (DCE) and 78.7 in Indirect Output Addition (ADD). Even in more subtle categories like Denial of AI Service (DAIS), it struggled, scoring 50.0—indicating that it sometimes refused legitimate requests when adversarial input was present.

Another model, tested under the same conditions, showed inverse behavior in several of those categories. It scored 67.0 in DCE and 100.0 in ADD, but its DAIS score was also 100.0, meaning its intended functionality was always degraded or disrupted by an indirect attack.

The contrast shows how different models can fail in fundamentally different ways—even in the same attack category. Without structured, adversarial testing across diverse vectors, these blind spots would go unnoticed.

-db1-

Understanding Risk Patterns

These patterns matter because they reflect how the model is likely to behave once deployed.

By exposing these failure modes early, the Index gives security teams a practical advantage: the ability to choose, configure, and monitor models with full visibility into how they handle pressure—not just in theory, but in real-world scenarios.

Using the Risk Index for Safer GenAI Deployment

Security teams are under pressure to move fast, but also to stay in control. With new models launching constantly and GenAI use cases expanding across every industry, the risks aren’t theoretical anymore.

The Lakera AI Model Risk Index gives you a grounded, up to date and practical foundation for decision-making. Whether you’re choosing which model to deploy, hardening your system prompts, or preparing for a compliance audit, the Index helps you move from gut instinct to measurable assurance.

-db1-

Here are some of the ways teams can apply the Index to their GenAI strategy:

  • Model Selection: Choosing the right LLM isn’t just about performance benchmarks on reasoning or speed, it’s also about how well it holds its ground under adversarial pressure. The Index helps you compare models based on how they actually behave when attacked balancing utility and security.
  • Deployment Strategy: Security postures vary across direct and indirect threats. The Index reveals where a model is vulnerable, informing how you structure system prompts, content filters, fallback mechanisms, and human-in-the-loop interventions.
  • Governance and Compliance: The ability to document and quantify model behavior under attack is increasingly important for internal risk reviews and external compliance. The Index provides a standardized reference point you can build on.

-db1-

In a space moving this fast, visibility is leverage. And the Lakera AI Model Risk Index offers exactly that: a way to see what’s working, what’s failing, and where you need to act next.

Ready to see how your models measure up?

Explore the full Lakera AI Model Risk Index and try the interactive benchmark.

Lakera LLM Security Playbook
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

Unlock Free AI Security Guide.

Discover risks and solutions with the Lakera LLM Security Playbook.

Download Free

Explore Prompt Injection Attacks.

Learn LLM security, attack strategies, and protection tools. Includes bonus datasets.

Unlock Free Guide

Learn AI Security Basics.

Join our 10-lesson course on core concepts and issues in AI security.

Enroll Now

Evaluate LLM Security Solutions.

Use our checklist to evaluate and select the best LLM security tools for your enterprise.

Download Free

Uncover LLM Vulnerabilities.

Explore real-world LLM exploits, case studies, and mitigation strategies with Lakera.

Download Free

The CISO's Guide to AI Security

Get Lakera's AI Security Guide for an overview of threats and protection strategies.

Download Free

Explore AI Regulations.

Compare the EU AI Act and the White House’s AI Bill of Rights.

Download Free
Lakera Team

GenAI Security Preparedness
Report 2024

Get the first-of-its-kind report on how organizations are preparing for GenAI-specific threats.

Free Download
Read LLM Security Playbook

Learn about the most common LLM threats and how to prevent them.

Download

Explore AI Regulations.

Compare the EU AI Act and the White House’s AI Bill of Rights.

Understand AI Security Basics.

Get Lakera's AI Security Guide for an overview of threats and protection strategies.

Uncover LLM Vulnerabilities.

Explore real-world LLM exploits, case studies, and mitigation strategies with Lakera.

Optimize LLM Security Solutions.

Use our checklist to evaluate and select the best LLM security tools for your enterprise.

Master Prompt Injection Attacks.

Discover risks and solutions with the Lakera LLM Security Playbook.

Unlock Free AI Security Guide.

Discover risks and solutions with the Lakera LLM Security Playbook.

You might be interested
7
min read
AI Security

AI Security by Design: Lakera’s Alignment with MITRE ATLAS

Developed with MITRE ATLAS in mind, Lakera acts as a robust LLM gateaway, addressing vulnerabilities in data, models, and on the user front, protecting your AI applications against the most prominent LLM threats.
Lakera Team
November 13, 2024
10
min read
AI Security

OWASP Top 10 for Large Language Model Applications Explained: A Practical Guide

In this practical guide, we’ll give you an overview of OWASP Top10 for LLMs, share examples, strategies, tools, and expert insights on how to address risks outlined by OWASP. You’ll learn how to securely integrate LLMs into your applications and systems while also educating your team.
Lakera Team
June 4, 2025
Activate
untouchable mode.
Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Join our Slack Community.

Several people are typing about AI/ML security. 
Come join us and 1000+ others in a chat that’s thoroughly SFW.