The promise of Generative AI lies in its fluidity. We want models that can reason, adapt, and handle the messy nuances of human language. But that same fluidity is exactly what makes them a security nightmare.
Traditional security is built on the predictable: standard tools are designed for deterministic code where if input A goes into function B, the result is always C. If it isn't, there’s a bug.
In the world of AI and autonomous agents, that predictability vanishes, replaced entirely by probabilistic reasoning. A system can be secure at 9:00 AM and vulnerable by noon, simply because of a "silent" model update or a subtle shift in how an attacker phrases a prompt.
This is why red teaming has evolved from a point-in-time checkbox exercise into a continuous necessity. It’s less about merely finding a hole in the fence and more about pressure-testing a system that is constantly shifting under its own weight.
-db1-
TL;DR
- AI breaks traditional security assumptions. AI systems are not deterministic. Behavior shifts with phrasing, context, and model updates, so a system can be safe one moment and vulnerable the next.
- The real attack surface is language. Prompts act like code. Attackers manipulate meaning and intent to control the system without touching the underlying infrastructure.
- Static testing is no longer enough. One-off tests and fixed benchmarks miss the most critical risks. Failures emerge through interaction, not predefined scenarios.
- Red teaming must be continuous and application-specific. The focus shifts from breaking a model to exploiting how a specific system uses AI, including its tools, data, and permissions.
- Agentic AI turns failures into real impact. When systems can take actions, prompt attacks move beyond bad outputs to real-world consequences. Security becomes about controlling behavior, not just responses.
-db1-
The Non-Deterministic Challenge
When we talk about "non-deterministic" systems, we are pointing to a fundamental shift in the attack surface, rather than simply employing technical jargon.
In traditional software, you secure the perimeter and the code. In AI, the prompt is the code. As Matt Fiedler, Product Manager at Lakera by Check Point, puts it: “Every prompt, in a sense, is committing code to the application.” Attackers don’t need to breach backend systems to take control, they can manipulate the system through natural language alone. Because these models respond to the intent and semantics of language rather than rigid syntax, organizations face an era of semantic ambiguity. Attackers can disguise malicious intent within valid natural language, bypassing rigid keyword filters and traditional WAFs.
Static assurance and standard evaluation sets (the kind used in traditional AppSec) fail here. They are "point-in-time" tests for the attacks we knew about yesterday. In practice, this means static tests and one-off benchmarks are insufficient for today’s AI systems: they cannot surface emergent, context-dependent behaviors or catch attacks that only appear through dynamic interaction. The most dangerous failures in AI don’t show up in fixed test suites; they emerge from the model’s own internal logic, often in ways that developers never anticipated.
Moving Beyond Generic Probes
For a long time, AI red teaming was synonymous with "jailbreaking"—seeing if you could get a chatbot to say something offensive or out of character. While that establishes an important baseline, it’s no longer the front line of AI security. Generic test suites offer limited, application-specific insights.
Modern red teaming must be deeply context-aware.
If you are building a financial advisor agent, a generic prompt injection attack is far less relevant than an indirect attack that subtly manipulates the agent into authorizing a fraudulent transaction.
-db1-Effective red teaming requires understanding the specific architecture of the system:
- What external tools or APIs can this agent call?
- What sensitive data does it have access to?
- What are the business-critical "guardrails" that simply cannot fail?
-db1-
The goal is to move from "Can I break the model?" to "Can I turn this specific application against its own logic?"
“Red teaming these AI applications is like searching an infinite landscape of natural language to find effective attacks.”
— Matt Fiedler
The Intelligence Loop: Why We Play Games
One of the most significant advantages in this space is real-world threat intelligence. Work with platforms like Gandalf, which has now processed millions of creative, adversarial interactions, teaches us that attackers don't follow a script. They iterate. They try a thousand subtle variations of a theme until they find the exact semantic bypass that works.
This crowd-sourced adversarial creativity is what fuels modern red teaming. By observing how attackers think across hundreds of languages and millions of attack patterns, we can build testing engines that don't just replay old attacks, but actually think like an adversary.
As David Haber, VP of AI Security at Check Point Software and Co-Founder of Lakera, notes: “Our threat intelligence database gives us a lens into how people are creatively exploiting AI systems through natural language. When a novel type of prompt attack emerges, it takes only minutes before someone tests it within our system.”
Continuous Evaluation vs. One-Off Audits
The biggest mistake a team can make is treating a single red teaming report as a permanent "clean bill of health."
AI systems drift. When a foundation model provider updates their weights, the behavioral boundaries of your application change. When you add a new "tool" to an agent, you’ve just opened a new door for an attacker.
-db1-True resilience comes from continuous adversarial evaluation across the entire lifecycle:
- During Design: Catching flaws by evaluating models and system prompts before code is even written.
- Regression Testing (Pre-Release): Catching when updates introduce unexpected risks. What passed last month may not pass today. Comprehensive campaigns must validate readiness before pushing to production.
- Drift Monitoring (Post-Deployment): Scheduling recurring, automated testing to monitor behavioral drift and maintain strict alignment with safety and compliance standards.-db1-
The Future: Agents and Autonomy
We are moving quickly from standalone chatbots to "agentic" systems—AI that can actually execute tasks autonomously. These agents have permissions to write to databases, send emails, and execute code, introducing a profound level of agentic unpredictability.
The stakes for red teaming have never been higher. A successful prompt injection in a chatbot is an embarrassment; a successful prompt injection in an autonomous agent is a critical breach.
David Haber emphasizes the shift: “It’s not just about accessing the system anymore, it’s about what you can get the system to do for you.”
Red teaming these systems requires a multi-layered approach that looks at the foundational model, the system prompt, the tool-calling logic, and the final output simultaneously.
Closing the Loop
Red teaming is an art as much as it is a science. It’s about cultivating a mindset of creative subversion. By constantly asking, "How can I use this agent's inherent helpfulness against it?", we build systems that are both safe and highly resilient.
Security in the age of AI is more than building an impenetrable, rigid wall. It’s about creating a system that can take a punch, learn from it, and get stronger every single day.
“You’re not just defending against attackers, you’re ensuring the system still works well for users. That means measuring the impact of defenses on both fronts.”
— David Haber
**To see how Lakera and Check Point are operationalizing these concepts through automated adversarial testing and expert-led research, read our latest work on securing the entire AI application lifecycle.**




