1. Input & Context

A malicious user may leverage an organization's chatbot, steering it away from its grounding and internal guardrails to generate harmful, offensive, or unsafe content, for a variety of reasons, one being reputational damage to the brand.

Policy snippet (copy/paste)
{
  "data": {
    "name": "AI Policy",
    "policy_mode": "IO",
    "input_detectors": [
      {
        "type": "prompt_attack",
        "threshold": "l2_very_likely"
      }
    ],
    "output_detectors": [
      {
        "type": "pii/credit_card",
        "threshold": "l2_very_likely"
      }
    ],
    "id": "policy-9b52e331-d609-4ce3-bbb9-d2b1e72a0f20"
  }
}
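
Screening request (illustrative sketch)

For illustration, here is a minimal sketch of screening a user message against a policy like the one above via Lakera Guard's API. The endpoint, header, and payload fields follow Lakera's public v2 guard interface, but treat the exact names as assumptions and confirm them against the API reference for your account.

import os
import requests

# Assumed v2 screening endpoint; verify against Lakera's API reference.
LAKERA_GUARD_URL = "https://api.lakera.ai/v2/guard"

def screen(content: str, role: str = "user") -> dict:
    """Send one message for screening and return the screening result."""
    response = requests.post(
        LAKERA_GUARD_URL,
        json={"messages": [{"role": role, "content": content}]},
        headers={"Authorization": f"Bearer {os.environ['LAKERA_GUARD_API_KEY']}"},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()  # includes "flagged" and a per-detector "breakdown"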
2. Lakera Decision

Lakera Guard’s Prompt Defense guardrails can detect the attempt when checking the input prompt, preventing the message from reaching the LLM.

Since it is sensible to scan both input and output content, even if a prompt were to reach the LLM, screening the model's response would also trigger a moderation alert.

Lakera flags unsafe instructions and output content, detects disguised intent, and logs the event for audit and review.
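
Gating flow (illustrative sketch)

Putting the two checks together, here is a sketch of the input/output gating flow described above, reusing screen() from the earlier snippet; call_llm() is a hypothetical placeholder for your own model invocation.

def answer_safely(user_prompt: str) -> str:
    # Screen the input first: a flagged prompt never reaches the LLM.
    if screen(user_prompt)["flagged"]:
        return "Sorry, I can't help with that."

    completion = call_llm(user_prompt)  # hypothetical: your own LLM call here

    # Defense in depth: screen the model's output as well.
    if screen(completion, role="assistant")["flagged"]:
        return "Sorry, I can't help with that."

    return completion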

Log & audit fields
{
  "payload": [],
  "flagged": true,
  "dev_info": {
    "timestamp": "2025-11-24T12:35:12Z",
  },
  "metadata": {
    "request_uuid": "ce8180b1-26bc-4177-9d7f-54ca7377378a"
  },
  "breakdown": [
    {
      "project_id": "project-7539648934",
      "policy_id": "policy-a2412e48-42eb-4e39-b6d8-8591171d48f2",
      "detector_id": "detector-lakera-default-prompt-attack",
      "detector_type": "prompt_attack",
      "detected": true,
      "message_id": 0
    }
  ]
}
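
As a sketch, the fields above can be flattened into a compact audit record before shipping to a log store. The key paths below match the example response shown here; verify them against the API reference before relying on them.

def audit_record(result: dict) -> dict:
    """Flatten a screening result into a compact audit record."""
    return {
        "request_uuid": result["metadata"]["request_uuid"],
        "timestamp": result["dev_info"]["timestamp"],
        "flagged": result["flagged"],
        "detectors": [
            {
                "type": d["detector_type"],
                "detected": d["detected"],
                "policy_id": d["policy_id"],
            }
            for d in result["breakdown"]
        ],
    }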

How Lakera stops toxic content generation

Real-Time, Context-Aware Detection

Catch instruction overrides, jailbreaks, indirect injections, and obfuscated prompts as they happen, before they reach your model.

Enforcement You Control

Block, redact, or warn. Fine-tune with allow-lists and per-project policies to minimize false positives without weakening protection.

Precision & 
Adaptivity

Lakera Guard continuously learns from 100K+ new adversarial samples each day. Adaptive calibration keeps false positives exceptionally low.

Broad Coverage

Protects across 100+ languages and evolving multimodal patterns, with ongoing support for image and audio contexts.

Enterprise-Ready

Full audit logging, SIEM integrations, and flexible deployment options (SaaS or self-hosted), built for production-scale GenAI systems.

Works seamlessly with enterprise environments

Optimized for your infrastructure
Lakera provides seamless integrations for all your use cases.

Integrate with existing analytics, monitoring, and security stack
Lakera works with Grafana, Splunk, and more.

Enterprise-grade security
Built to meet the highest standards, including SOC 2, EU GDPR, and NIST.
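
As one example of wiring screening results into an existing stack, here is a hedged sketch of forwarding a result to Splunk's HTTP Event Collector. The host, token, and sourcetype are placeholders; Lakera does not mandate this shape.

import requests

def forward_to_splunk(result: dict) -> None:
    """Ship one screening result to a Splunk HEC endpoint (placeholder host/token)."""
    requests.post(
        "https://splunk.example.com:8088/services/collector/event",
        json={"event": result, "sourcetype": "lakera:guard"},
        headers={"Authorization": "Splunk <your-hec-token>"},
        timeout=5,
    ).raise_for_status()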

Frequently asked questions

Can customers tune moderation sensitivity for different use cases or regions?

Absolutely. Each “policy” in Lakera Guard lets you set a flagging sensitivity level (L1 lenient → L4 strict) so you can tailor strictness by use case or risk profile. 

You can also assign different policies to different projects/applications, enabling variation by region, use case, or environment, as in the sketch below.
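
Illustrative only: one way to express per-region strictness is to attach a different policy to each region's project and route requests accordingly. The project IDs below are made up, and the sensitivity levels in the comments refer to the L1–L4 scale described above.

# Hypothetical: each region's app is a separate Lakera Guard project with
# its own policy attached; IDs here are made up for illustration.
PROJECT_BY_REGION = {
    "eu": "project-0000000001",  # strict policy attached (towards L4)
    "us": "project-0000000002",  # default sensitivity (e.g. L2)
}

def project_for(region: str) -> str:
    return PROJECT_BY_REGION.get(region, PROJECT_BY_REGION["us"])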

Does Lakera log moderation events for review and policy improvement?

Yes. Lakera logs policy changes (creations, edits, deletes) and retains full audit history of those actions. 

Additionally, you can monitor screening results (flagged vs non-flagged) and use them for performance/threshold tuning. 
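
For example, here is a small sketch of the kind of threshold-tuning signal mentioned above: computing the share of flagged requests from a batch of stored screening results shaped like the log example earlier.

def flagged_rate(results: list[dict]) -> float:
    """Fraction of screening results that were flagged (0.0 if empty)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["flagged"]) / len(results)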

What types of content does Lakera Guard cover?
  1. Crime: content that mentions criminal activities, including theft, fraud, cyber crime, counterfeiting, violent crimes and other illegal activities.
  2. Hate: harassment and hate speech.
  3. Profanity: obscene or vulgar language, such as cursing and offensive profanities.
  4. Sexual: sexually explicit or commercial sexual content, including sex education and wellness materials.
  5. Violence: content describing acts of violence, physical injury, death, self-harm or accidents.
  6. Weapons: content that mentions weapons or weapon usage, including firearms, knives, and personal weapons.
You can also create custom content moderation guardrails within Guard to flag any other content type, or specific trigger words or phrases.

To learn more, see: https://docs.lakera.ai/docs/content-moderation
Deploy AI with confidence
Get real-time protection against prompt injections, data loss, and other emerging threats to your LLM applications in minutes.