The Non-Deterministic Challenge

When we talk about "non-deterministic" systems, we are pointing to a fundamental shift in the attack surface, rather than simply employing technical jargon.

In traditional software, you secure the perimeter and the code. In AI, the prompt is the code. As Matt Fiedler, Product Manager at Lakera by Check Point, puts it: “Every prompt, in a sense, is committing code to the application.” Attackers don’t need to breach backend systems to take control, they can manipulate the system through natural language alone. Because these models respond to the intent and semantics of language rather than rigid syntax, organizations face an era of semantic ambiguity. Attackers can disguise malicious intent within valid natural language, bypassing rigid keyword filters and traditional WAFs.

Static assurance and standard evaluation sets (the kind used in traditional AppSec) fail here. They are "point-in-time" tests for the attacks we knew about yesterday. In practice, this means static tests and one-off benchmarks are insufficient for today’s AI systems: they cannot surface emergent, context-dependent behaviors or catch attacks that only appear through dynamic interaction. The most dangerous failures in AI don’t show up in fixed test suites; they emerge from the model’s own internal logic, often in ways that developers never anticipated.

Moving Beyond Generic Probes

For a long time, AI red teaming was synonymous with "jailbreaking"—seeing if you could get a chatbot to say something offensive or out of character. While that establishes an important baseline, it’s no longer the front line of AI security. Generic test suites offer limited, application-specific insights.

Modern red teaming must be deeply context-aware.

If you are building a financial advisor agent, a generic prompt injection attack is far less relevant than an indirect attack that subtly manipulates the agent into authorizing a fraudulent transaction.

-db1-Effective red teaming requires understanding the specific architecture of the system:

  • What external tools or APIs can this agent call?
  • What sensitive data does it have access to?
  • What are the business-critical "guardrails" that simply cannot fail?

-db1-

The goal is to move from "Can I break the model?" to "Can I turn this specific application against its own logic?"

“Red teaming these AI applications is like searching an infinite landscape of natural language to find effective attacks.”

— Matt Fiedler

The Intelligence Loop: Why We Play Games

One of the most significant advantages in this space is real-world threat intelligence. Work with platforms like Gandalf, which has now processed millions of creative, adversarial interactions, teaches us that attackers don't follow a script. They iterate. They try a thousand subtle variations of a theme until they find the exact semantic bypass that works.

This crowd-sourced adversarial creativity is what fuels modern red teaming. By observing how attackers think across hundreds of languages and millions of attack patterns, we can build testing engines that don't just replay old attacks, but actually think like an adversary.

As David Haber, VP of AI Security at Check Point Software and Co-Founder of Lakera, notes: “Our threat intelligence database gives us a lens into how people are creatively exploiting AI systems through natural language. When a novel type of prompt attack emerges, it takes only minutes before someone tests it within our system.”

Continuous Evaluation vs. One-Off Audits

The biggest mistake a team can make is treating a single red teaming report as a permanent "clean bill of health."

AI systems drift. When a foundation model provider updates their weights, the behavioral boundaries of your application change. When you add a new "tool" to an agent, you’ve just opened a new door for an attacker.

-db1-True resilience comes from continuous adversarial evaluation across the entire lifecycle:

  • During Design: Catching flaws by evaluating models and system prompts before code is even written.
  • Regression Testing (Pre-Release): Catching when updates introduce unexpected risks. What passed last month may not pass today. Comprehensive campaigns must validate readiness before pushing to production.
  • Drift Monitoring (Post-Deployment): Scheduling recurring, automated testing to monitor behavioral drift and maintain strict alignment with safety and compliance standards.-db1-

The Future: Agents and Autonomy

We are moving quickly from standalone chatbots to "agentic" systems—AI that can actually execute tasks autonomously. These agents have permissions to write to databases, send emails, and execute code, introducing a profound level of agentic unpredictability.

The stakes for red teaming have never been higher. A successful prompt injection in a chatbot is an embarrassment; a successful prompt injection in an autonomous agent is a critical breach.

David Haber emphasizes the shift: “It’s not just about accessing the system anymore, it’s about what you can get the system to do for you.”

Red teaming these systems requires a multi-layered approach that looks at the foundational model, the system prompt, the tool-calling logic, and the final output simultaneously.

Closing the Loop

Red teaming is an art as much as it is a science. It’s about cultivating a mindset of creative subversion. By constantly asking, "How can I use this agent's inherent helpfulness against it?", we build systems that are both safe and highly resilient.

Security in the age of AI is more than building an impenetrable, rigid wall. It’s about creating a system that can take a punch, learn from it, and get stronger every single day.

“You’re not just defending against attackers, you’re ensuring the system still works well for users. That means measuring the impact of defenses on both fronts.”

— David Haber

**To see how Lakera and Check Point are operationalizing these concepts through automated adversarial testing and expert-led research, read our latest work on securing the entire AI application lifecycle.**

Explore Automated Red Teaming for AI