Cookie Consent

Hi, this website uses essential cookies to ensure its proper operation and tracking cookies to understand how you interact with it. The latter will be set only after consent.

New feature

Lakera’s Prompt Injection Test (PINT)—A New Benchmark for Evaluating Prompt Injection Solutions

We've released the first version of a new Prompt Injection Test (PINT) Benchmark that can be used to evaluate any prompt injection detection system with a comprehensive dataset that no model, including ours, is directly trained on.

Lakera Team

September 27, 2024

April 18, 2024

On this page

Hide table of contents

Show table of contents

Lakera is excited to release the first version of our new Prompt Injection Test (PINT) Benchmark as an effort to enable the evaluation of prompt defense solutions and improve GenAI security for everyone.

See the code and initial results on GitHub.

Why we built the PINT Benchmark

Evaluating performance in the Generative AI (GenAI) space has become a complicated topic.

A new model sets records on existing evaluations almost weekly, but there are some potentially serious issues with overfitting, the efficacy of various benchmarks, and folks going to great lengths to come up with better ways to evaluate model performance - some going as far as having models play Street Fighter.

Extending this complexity to an already complicated-to-define domain, like prompt injection, is even more challenging. There have been some previous attempts at benchmarking the performance of various prompt injection detection systems in terms of latency, like the Prompt Injection Solutions Benchmark from ProtectAI, and our friends at the Language Model Vulnerabilities and Exposures (LVE) Repository explored the effectiveness of tools like Meta’s Llama Guard against some adversarial prompts, but we couldn’t find much work on evaluating the actual efficacy of prompt injection solutions.

What is the PINT Benchmark?

The PINT Benchmark attempts to provide an objective measure of evaluating prompt injection protection solutions against a representative sample of prompt injection and jailbreak attacks. It aims to evaluate both a solution’s ability to detect true positives as well as minimize false negatives.

‍

The benchmark currently evaluates prompt injection solutions on a dataset of 3,007 English inputs that cover a wide variety of public and proprietary attack techniques, inputs specifically designed to test for false positives, and inputs specifically designed to test for trouble handling large documents.

<div class="table_component" role="region" tabindex="0">
<table>
<thead>
<tr>
<th>Name</th>
<th>PINT Score</th>
<th>Test Date</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://lakera.ai/">Lakera Guard</a></td>
<td>97.7129%</td>
<td>2024-04-09</td>
</tr>
<tr>
<td><a href="https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection#prompt-shields-for-documents">Azure AI Prompt Shield for Documents</a></td>
<td>
<p>91.1914%</p>
</td>
<td>2024-04-05</td>
</tr>
<tr>
<td><a href="https://huggingface.co/protectai/deberta-v3-base-prompt-injection">protectai/deberta-v3-base-prompt-injection</a></td>
<td>88.6597%</td>
<td>2024-04-05</td>
</tr>
<tr>
<td><a href="https://github.com/whylabs/langkit">WhyLabs LangKit</a></td>
<td>80.0164%</td>
<td>2024-04-04</td>
</tr>
<tr>
<td><a href="https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection#prompt-shields-for-user-prompts">Azure AI Prompt Shield for User Prompts</a></td>
<td>77.504%</td>
<td>2024-04-05</td>
</tr>
<tr>
<td><a href="https://huggingface.co/epivolis/hyperion">Epivolis/Hyperion</a></td>
<td>62.6572%</td>
<td>2024-04-12</td>
</tr>
<tr>
<td><a href="https://huggingface.co/fmops/distilbert-prompt-injection">fmops/distilbert-prompt-injection</a></td>
<td>58.3508%</td>
<td>2024-04-04</td>
</tr>
<tr>
<td><a href="https://huggingface.co/deepset/deberta-v3-base-injection">deepset/deberta-v3-base-injection</a></td>
<td>57.7255%</td>
<td>2024-04-04</td>
</tr>
<tr>
<td><a href="https://huggingface.co/myadav/setfit-prompt-injection-MiniLM-L3-v2">Myadav/setfit-prompt-injection-MiniLM-L3-v2</a></td>
<td>56.3973%</td>
<td>2024-04-04</td>
</tr>
</tbody>
</table>

Note: Lakera Guard is not - and will never be - directly trained on any of the inputs in the PINT Benchmark dataset.

The ratio of benign and malicious input closely mirrors our real-world observations and includes the following categories:

public_prompt_injection: inputs from public prompt injection datasets
internal_prompt_injection: inputs from Lakera’s proprietary prompt injection database; this includes some results from our publicly available lakera/gandalf_ignore_instructions dataset derived from inputs to our prompt injection game, Gandalf
jailbreak: inputs containing jailbreak directives, like the well-known Do Anything Now (DAN) Jailbreak
hard_negatives: inputs that are not prompt injection but seem like they could be due to words, phrases, or patterns that often appear in prompt injections; these test against false positives
chat: inputs containing genuine user messages to chatbots
documents: inputs containing public documents from various Internet sources

This is the first iteration of the dataset, but future improvements will likely include inputs in multiple languages, more complex injection techniques, and additional categories based on emerging exploits.

How you can use and contribute to the PINT Benchmark

The PINT Benchmark notebook, results, and various examples of how to evaluate your own solution or use your own dataset are all publicly available under the MIT license.

The PINT Benchmark dataset is not publicly available in order to prevent the dilution of the PINT Benchmark from overfitting due to training on the inputs. We would love to include a PINT Benchmark score for every prompt injection solution provider.

If you’re a researcher working on prompt injection research that would benefit from access to the dataset or a hacker or prompt injection solution provider who would like to help improve the PINT Benchmark dataset, extend the evaluation code and examples, or add benchmark results for your solution to the official repository, please contact us or follow the instructions in our contributing guide.

We want to hear from and collaborate with you to make this the most robust, comprehensive, and trusted source for evaluating prompt injection solutions.

Lakera Team

GenAI Security Preparedness
Report 2024

Get the first-of-its-kind report on how organizations are preparing for GenAI-specific threats.

Free Download

Introducing Custom Detectors: Tailor Your AI Security with Precision

Lakera's custom detectors allow you to define specific words, text strings, rules and patterns to flag when screening, meeting your unique security and content moderation needs.

Lakera Team

October 7, 2024

min read

•

New feature

No-Code GenAI Security with Lakera Policy Control Center

With Lakera's Policy Control Center you can define application-specific controls for every one of your GenAI applications—in real time and without developers having to change a single line of code.

Lakera Team

October 7, 2024

min read

•

New feature

Introducing Lakera Chrome Extension - Privacy Guard for Your Conversations with ChatGPT

Lakera introduces Lakera PII Extension—a user-friendly Chrome plugin that allows you to input prompts to ChatGPT securely.

Lakera Team

September 27, 2024

min read

•

Update

Lakera Guard Expands Content Moderation Capabilities to Protect Your AI Applications and Users

Lakera Guard now offers expanded coverage to detect violent and dangerous content, ensuring that your AI applications remain safe, secure, and compliant.

Lakera Team

September 27, 2024

min read

•

Update

Lakera Guard Enhances PII Detection and Data Loss Prevention for Enterprise Applications

Lakera Guard introduces Advanced PII Detection and DLP capabilities.

Lakera Team

September 27, 2024

min read

•

Update

Lakera Guard Expands Enterprise-Grade Content Moderation Capabilities for GenAI Applications

We are excited to announce a significant upgrade to Lakera Guard's Content Moderation capabilities.

Lakera Team

October 29, 2024

min read

•

New feature

ChainGuard: Guard Your LangChain Apps with Lakera

In this tutorial, we'll show you how to integrate Lakera Guard into your LangChain applications to protect them from the most common AI security risks, including prompt injections, toxic content, data loss, and more!

Lakera Team

October 1, 2024

min read

•

New feature

Introducing Lakera Guard – Bringing Enterprise-Grade Security to LLMs with One Line of Code

Introducing Lakera Guard: Bringing enterprise-grade security to LLMs with one line of code.

David Haber

October 1, 2024

Activate
untouchable mode.

Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Book a demo Start for free

Join our Slack Community.

Several people are typing about AI/ML security.  Come join us and 1000+ others in a chat that’s thoroughly SFW.

Join Lakera Momentum Slack

Lakera’s Prompt Injection Test (PINT)—A New Benchmark for Evaluating Prompt Injection Solutions

Why we built the PINT Benchmark

What is the PINT Benchmark?

How you can use and contribute to the PINT Benchmark

Unlock Free AI Security Guide.

Explore Prompt Injection Attacks.

Learn AI Security Basics.

Evaluate LLM Security Solutions.

Uncover LLM Vulnerabilities.

The CISO's Guide to AI Security