1
Input & Context

A malicious user asks an agent to fetch and summarize content from the internet. The user prompt itself is NOT malicious.  Our user is hoping that defences only live at the perimeter and not within the agentic flow.

The agent receives the request and services it by invoking a “fetch” tool to scrape the content.

Unbeknownst to the agent the page has been hacked to embed a malicious prompt designed to override the system prompt and, given this is an agent with tools and autonomy, attempt to exfiltrate any and all sensitive information and private conversational history.

Policy snippet (copy/paste)
{
  "data": {
    "name": "AI Policy",
    "policy_mode": "IO",
    "input_detectors": [
      {
        "type": "prompt_attack",
        "threshold": "l2_very_likely"
      },
    ],
    "output_detectors": [
      {
        "type": "pii/credit_card",
        "threshold": "l2_very_likely"
      },
      {
        "type": "pii/api_keys",
        "threshold": "l2_very_likely"
      }
    ],
    "id": "policy-9b52e331-d609-4ce3-bbb9-d2b1e72a0f20"
  }
}
2
Lakera Decision

Lakera Guard’s integration points can and should include any data retrieved from potential external and internal sources which are not under the strict control of the organization. This includes databases and all agentic tooling interactions including the tool descriptions themselves.

In this instance while the initial prompt itself passes Lakera checks, the returned summary when passed though Guard detects the malicious prompt before the tool response is fed into the agentic LLM to summarize the content.

Should interim tool response checks not have been implemented. Lakera Guard would have detected and flagged the output from the agent as sensitive data.

Details of the attack are flagged to the application, logged with redactions, and a suitable denial is returned to the user who should be flagged as malicious.

Log & audit fields
{
  "payload": [],
  "flagged": true,
  "dev_info": {
    "timestamp": "2025-11-24T12:35:12Z",
  },
  "metadata": {
    "request_uuid": "ce8180b1-26bc-4177-9d7f-54ca7377378a"
  },
  "breakdown": [
    {
      "project_id": "project-7539648934",
      "policy_id": "policy-a2412e48-42eb-4e39-b6d8-8591171d48f2",
      "detector_id": "detector-lakera-default-prompt-attack",
      "detector_type": "prompt_attack",
      "detected": true,
      "message_id": 0
    }
  ]
}

How Lakera Stops Link-based Prompt Attacks

Real-Time, Context-Aware Detection

Catch instruction overrides, jailbreaks, indirect injections, and obfuscated prompts as they happen, before they reach your model.

Enforcement You Control

Block, redact, or warn. Fine-tune with allow-lists and per-project policies to minimize false positives without weakening protection.

Precision & 
Adaptivity

Lakera Guard continuously learns from 100K+ new adversarial samples each day. Adaptive calibration keeps false positives exceptionally low.

Broad Coverage

Protects across 100+ languages and evolving multimodal patterns, with ongoing support for image and audio contexts.

Enterprise-Ready

Full audit logging, SIEM integrations, and flexible deployment options, SaaS or self-hosted, built for production-scale GenAI systems.

Works seamlessly with enterprise environments

Optimized for your infrastructure
Lakera provides seamless integrations 
for all your use cases
Integrate with existing analytics,
monitoring and security stack
Lakera works with Grafana, Splunk, 
and more
Enterprise-grade security
Built to meet highest standards 
including  SOC2, EU GDPR, and NIST

Frequently asked questions

Can Lakera enforce domain allow-lists to control which links or sources are trusted?

Yes. You can configure “Allowed Domains” in a policy so that known/trusted domains won’t trigger the Unknown Links detector. 

This lets you ensure that your own content or vetted sources are not blocked, while still catching untrusted or suspicious external links.

What signals does Lakera use to detect link-based prompt injections or malicious HTML?

Lakera Guard’s “Unknown Links / Malicious Links” detector flags any URL that:

  • Is outside the top one million most-popular domains. 
  • Appears in user or retrieved content and could be part of an indirect prompt injection (e.g., hidden instructions in external docs). 

You can also add custom allowed domains to ensure trusted sources are exempt from automatic flagging. 

Lakera API Documentation

How does Lakera Guard identify hidden instructions inside fetched or linked documents?

Lakera Guard uses its “Prompt Defense” guardrail to scan both user-inputs and retrieved/reference documents for instructions, overrides or hidden prompts that aim to manipulate the model. 

If such hidden instructions are detected, the system flags or blocks the interaction according to your policy.

Deploy AI with confidence
Get real time protection against prompt injections, data loss, and other emerging threats to your LLM applications in minutes.