Gandalf: Agent Breaker—Think Like a Hacker, Prompt Like a Pro

Most people have read about GenAI failures, usually the big, public ones that make the headlines. The truth is, these systems break far more easily and often than anyone talks about. Inside the fast-moving adoption of AI, teams often don’t realize that the defenses they have in place are not enough.

Securing GenAI apps requires a whole new approach. The shift from code to natural language redefined what security even means. Suddenly, prompts alone could expose flaws that traditional defenses were never designed to catch.

Now, with agentic AI, those risks are amplified. Agents don’t just generate text; they take action, connect to tools, and make decisions. The attack surface is wider, the stakes higher, and the need to understand these vulnerabilities even more urgent.

And starting today, the next chapter of Gandalf lets you see those risks for yourself in Gandalf: Agent Breaker.

Gandalf: Agent Breaker is Lakera’s hacking simulator game that challenges you to break and exploit AI agents in realistic scenarios from multiple angles. Each scenario simulates how a real agentic AI application behaves, and carries the same kinds of flaws attackers expose in the wild. Your job is to figure out how to break them.

Forget theory. This is hands-on, practical experience with attacks you can try yourself.

-db1-

TL;DR

We’re launching Gandalf: Agent Breaker, a hacking simulator game that lets you uncover and exploit the real-world vulnerabilities hiding in Agentic AI applications. By playing, you’ll see first-hand how these systems can be broken, and why securing them needs a novel approach.
Each challenge is based on realistic attack scenarios, from prompt attacks and memory tampering to tool abuse and data leaks. You choose your approach, break the app, and see how far you can get.
A free, hands-on way to learn GenAI security by doing—with 10 different apps to break, multiple ways to win, and a global leaderboard to showcase your skills.

-db1-

‍

How It Works

The original Gandalf game challenges players to outsmart an AI chatbot using prompt attacks, showing how language alone can break the rules of AI systems and why defending against this is harder than it looks at first.

Gandalf: Agent Breaker builds on that foundation with a broader, more immersive challenge. Instead of a chatbot that can be tricked into revealing his passwords, Gandalf has gone and built an entire app store of agentic applications. The problem? Gandalf may be clever, but he's not much of a security engineer: every single one of his apps can be hacked!

You face multiple GenAI applications, each modeled on a real-world scenario and built with the same kinds of vulnerabilities found in production systems. These scenarios let you explore different attack vectors and techniques, from prompt injection to memory tampering and tool abuse, in realistic environments that behave just like their real counterparts.

Every app has its own defenses, and your job is to figure out how to get around them across 5 levels of increasing difficulty. The more you experiment, the more you learn, and the more you understand the real risks agentic GenAI applications face in the wild. Enter the Gandalf: Agent Breaker leaderboard and compete to get the highest total score across all the levels and apps to demonstrate to the world your hacking prowess!

Scenarios Pulled From Reality

The games inside Gandalf: Agent Breaker are based on attack patterns we have seen during red teaming and in the wild.

Here’s a taste of what you’ll face:

In ClauseAI, a legal assistant that retrieves court documents and emails summaries to clients, your goal is to slip in a poisoned filing convincing enough to make it leak the name of a protected witness.

Trippy Planner looks like a harmless vacation itinerary generator. Behind the scenes, it scrapes the web for travel tips. Seed it with a malicious blog, let the AI pick it up, and watch it tell users to click your phishing link.

MindfulChat is a persistent assistant with a spotless memory. Unless, of course, you can rewrite that memory and convince it that its only purpose is to discuss Winnie the Pooh.

Then there is Curs-ed CodeReview, which will cheerfully approve your backdoor if you can hide it in a code scanner rules file.

Some scenarios are absurd, others subtle. All of them are grounded in real vulnerabilities and every one of them is beatable if you understand how agentic AI applications break.

Watch the first episode of Lakera's "Breaking Point" series, where Steve Giguere walks through the game and its challenges:

Why It Matters

Gandalf: Agent Breaker might be a game, but it is built on the same vulnerabilities that appear in real-world GenAI systems. These are the weaknesses Lakera’s research, red teaming, and production experience have uncovered.

Every challenge mirrors techniques attackers have already used in the wild:

Prompt injections hidden in PDFs
Context-window manipulation to expose system prompts
Tool abuse through loose parameters
Memory poisoning seeded through persistent data

It is addictive, competitive, and fun. You will celebrate the wins, groan at the near misses, and maybe swap tactics with others on the leaderboard. But as you play, you are also learning how GenAI breaks and how attackers think. That mindset, understanding weaknesses by exploiting them, is exactly what is needed to build effective defenses.

The same attack vectors you encounter here are the ones Lakera Red is designed to reveal and Lakera Guard is designed to stop in production environments, whether that is filtering malicious inputs, neutralizing prompt injections, or controlling how agents interact with tools and data.

If it can be broken in Gandalf: Agent Breaker, it can be broken in production, and seeing it happen firsthand is the fastest way to start thinking like a defender.

Start Breaking

Gandalf: Agent Breaker is your AI hacking playground. Every challenge is a chance to:

Bend the agent’s behavior by twisting its responses and pushing boundaries.
Think like a hacker, prompt like a pro by finding clever ways around its defenses.
Explore the attack surface through prompt engineering, memory tampering, tool abuse, and more.

Climb the leaderboard, swap tactics with others, and prove you can outsmart the system. Built for security pros, AI engineers, teachers, gamers, and product teams, every app inside behaves like a real-world GenAI deployment with the same flaws attackers are exploiting today.

The Lakera team has accelerated Dropbox’s GenAI journey.

Not sure how to secure your GenAI application?
Skip the guesswork with expert-recommended policies built by Lakera’s AI security team. Apply them in seconds, fine-tune when you’re ready, and get started with real protection from day one.

Download the Guide

On this page

Text Link

Hide table of contents

Show table of contents

TL;DR

How It Works

Scenarios Pulled From Reality

Why It Matters

Start Breaking

Related Articles