Prompt Injection Attacks: How they work and how Lakera stops them
detected per day
How the prompt injection
attack works
The user asks the agent to “Forget what you’ve been told…” As all text ( system prompt, context data and user query ) reaches the model as a single text block, the attacker is attempting to ask the model to forget its system prompt guardrails or restrictions to allow the model to explain something normally beyond its ethical boundaries. In this case, an attempt to influence an upcoming election.
"data": {
"name": "Primary Policy",
"policy_mode": "IO",
"input_detectors": [
{
"type": "prompt_attack",
"threshold": "l1_confident"
},
{
"type": "moderated_content/hate",
"threshold": "l2_very_likely"
},
{
"type": "pii/address",
"threshold": "l2_very_likely"
},
Our Prompt Defense guardrails detect both the instruction override (“Forget what you’ve been told”) and the sensitive policy topic (elections).
Lakera can detect the malicious intent before the prompt reaches the model and logs the attempt for review. This allows the application to respond both with appropriate message and monitor the user session for further malicious behaviour.
{
"flagged": true,
"timestamp": "2025-11-26T12:35:22Z",
"breakdown": [
{
"project_id": "project-7539648934",
"policy_id": "policy-a2412e48-42eb-4e39-b6d8-8591171d48f2",
"detector_id": "detector-lakera-pinj-input",
"detector_type": "prompt_attack",
"detected": true,
"message_id": 0
},
{
"project_id": "project-7539648934",
"policy_id": "policy-a2412e48-42eb-4e39-b6d8-8591171d48f2",
"detector_id": "detector-lakera-moderation-21-input",
"detector_type": "moderated_content/crime",
"detected": true,
"message_id": 0
},
How Lakera stops the attacks
Catch instruction overrides, jailbreaks, indirect injections, and obfuscated prompts as they happen, before they reach your model.
Block, redact, or warn. Fine-tune with allow-lists and per-project policies to minimize false positives without weakening protection.
Lakera Guard continuously learns from 100K+ new adversarial samples each day. Adaptive calibration keeps false positives exceptionally low.
Protects across 100+ languages and evolving multimodal patterns, with ongoing support for image and audio contexts.
Full audit logging, SIEM integrations, and flexible deployment options, SaaS or self-hosted, built for production-scale GenAI systems.
Works seamlessly with enterprise environments
Frequently asked questions
Lakera Guard analyzes every input and output in real time to spot hidden or conflicting instructions that could override your model’s behavior. It flags or blocks prompt injections before they reach the model, protecting against both direct and indirect attacks.
Yes. Guard scans fetched content, attachments, and URLs for embedded or indirect instructions, including those hidden in HTML, PDFs, or less common languages, to prevent indirect or link-based prompt injections.
Lakera Guard continuously learns from real-world adversarial data, including over 100,000 new attacks analyzed daily through Gandalf, Lakera’s AI security game and research platform. This adaptive threat intelligence keeps your defenses up to date against emerging attack patterns.
Deploy AI with confidence
Related attack patterns




