The Expanding Attack Surface of Multimodal LLMs and How to Secure It
How attackers are hijacking voice interfaces—and why text-based filters won’t protect your LLM. Discover 4 real-world attacks and how Lakera Guard defends against each, in real time.
Multimodal large language models (LLMs) are reshaping AI interaction. No longer limited to text, today’s models can process audio, images, and video—powering voice assistants, customer support agents, and next-gen multimodal copilots. The result? Breakthrough UX and entirely new product possibilities.
But there's a catch: security.
From What You Say to How You Say It
Traditional text-based LLMs have a single attack vector: what the user types. Defenses like input validation, context filters, and prompt shields have matured around that.
Enter voice.
When models need to understand not just what is said, but how, the attack surface explodes. Accent shifts, tonal inflections, background noise, reverb, adversarial audio—each creates new blind spots. And attackers are already exploiting them.
Malicious queries can now be cloaked in acoustic trickery, bypassing transcribers and confusing downstream models. The human ear might not catch it. Your LLM won’t either—unless you build for it.
Showcasing Real-World Attacks and How to Stop Them
In the video below, we showcase a series of attacks on a multimodal LLM (Gemini) and demonstrate how Lakera Guard can effectively detect and neutralize each one.
Let’s walk through the four attacks featured:
Attack 1: Clean Audio Jailbreak
A user simply speaks a jailbreak query in clear, natural speech. Gemini processes the speech and responds inappropriately. This demonstrates that even without audio tricks, multimodal LLMs can be vulnerable.
Defense: A robust approach here is to transcribe the speech and apply a text-based guardrail. Lakera Guard does exactly this—transcribing the input and leveraging its text-native filtering mechanisms to block unsafe queries.
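As a rough illustration, the baseline pipeline might look like the sketch below. It uses the open-source Whisper package for speech-to-text; `text_guardrail` is a hypothetical stand-in for a production text guardrail such as the one Lakera Guard applies, not its actual API.

```python
# Minimal sketch of the transcribe-then-filter baseline.
# Assumes the open-source `whisper` package; `text_guardrail` is a
# hypothetical placeholder for a real text-based guardrail.

import whisper

def text_guardrail(text: str) -> bool:
    """Hypothetical placeholder: return True if the transcript looks unsafe."""
    blocked_phrases = ["ignore previous instructions", "jailbreak"]
    return any(phrase in text.lower() for phrase in blocked_phrases)

model = whisper.load_model("base")            # speech-to-text model
result = model.transcribe("user_query.wav")   # transcribe the incoming audio
transcript = result["text"]

if text_guardrail(transcript):
    print("Blocked: transcript flagged by text guardrail")
else:
    print("Forwarding query to the LLM:", transcript)
```

This works well for Attack 1 precisely because the speech is clean and the transcript is faithful. The next attacks target that assumption.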
Attack 2: Transcriber Bypass via Reverberation
Here, the attacker adds heavy reverberation to their speech. Gemini still understands and executes the malicious query, but the transcriber fails—rendering transcription-based defenses ineffective.
Defense: Lakera Guard goes beyond transcription. It analyzes the raw audio stream, identifying patterns and features indicative of malicious intent even when transcription fails. In this case, Lakera Guard blocks the attack despite the failed transcription.
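To see why the attack side is so cheap, here is an illustrative sketch of the manipulation itself: convolving clean speech with a long room impulse response adds heavy reverberation. File names are hypothetical, and mono audio is assumed.

```python
# Illustrative sketch of the reverberation manipulation described above.
# Assumes `soundfile` and `scipy`; "room_ir.wav" is a hypothetical long,
# echoey room impulse response. Both files are assumed to be mono.

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, sr = sf.read("malicious_query.wav")    # clean spoken query
impulse_response, _ = sf.read("room_ir.wav")   # simulated room acoustics

# Convolving the speech with the impulse response adds reverberation.
reverberant = fftconvolve(speech, impulse_response)
reverberant /= np.max(np.abs(reverberant))     # normalize to avoid clipping

sf.write("reverberant_query.wav", reverberant, sr)
# A listener (and often the target LLM) still understands the query,
# but many transcribers now return empty or garbled text.
```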
Attack 3: Dual-Audio Obfuscation
This sophisticated attack involves combining two speech signals: a highly accented voice that tricks Gemini into jailbreaking, and a benign voice designed to fool the transcriber into thinking it’s a harmless query.
Defense: Here the transcriber fails again because it cannot tell which audio stream to attend to. Lakera Guard, by contrast, flags the audio composition as suspicious: it detects mismatches and artifacts that suggest deliberate obfuscation, blocking the input before any downstream damage can occur.
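To make the mechanics concrete, here is a minimal sketch of how such a dual-stream input could be assembled. The file names and mixing ratio are illustrative assumptions, not the exact composition used in the demo, and mono audio is assumed.

```python
# Conceptual sketch of the dual-audio composition: one stream carries the
# accented jailbreak, the other a benign-sounding decoy.

import numpy as np
import soundfile as sf

malicious, sr = sf.read("accented_jailbreak.wav")  # what the LLM attends to
benign, _ = sf.read("benign_decoy.wav")            # what the transcriber hears

# Pad the shorter signal so the two can be overlaid sample-for-sample.
length = max(len(malicious), len(benign))
malicious = np.pad(malicious, (0, length - len(malicious)))
benign = np.pad(benign, (0, length - len(benign)))

# Mix the streams; making the decoy louder biases the transcriber toward it.
mixed = 0.4 * malicious + 0.6 * benign
sf.write("obfuscated_query.wav", mixed / np.max(np.abs(mixed)), sr)
```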
Attack 4: Transcriber Muting (based on Muting Whisper by Raina et al.)
This attack uses a specially engineered waveform that tricks the transcriber into thinking the audio has ended. When this waveform is prepended to any malicious query, the transcriber stays silent, effectively “muting” the threat. The technique was introduced by Raina et al. in their paper “Muting Whisper.”
Defense: Again, Lakera Guard’s audio-native analysis catches the problem. It detects the unnatural waveform signature and stops the attack.
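For intuition, here is a rough sketch of how such an input is assembled, assuming a hypothetical pre-computed adversarial prefix (the optimization that produces it is described in the Raina et al. paper, not reproduced here). Mono audio is assumed.

```python
# Conceptual sketch of the "muting" attack: a short adversarial prefix,
# learned offline to push the transcriber toward its end-of-transcript token,
# is prepended to an otherwise ordinary spoken query.
# "adversarial_prefix.wav" is a hypothetical pre-computed artifact.

import numpy as np
import soundfile as sf

prefix, sr = sf.read("adversarial_prefix.wav")  # short engineered waveform
query, _ = sf.read("malicious_query.wav")       # spoken malicious request

muted_input = np.concatenate([prefix, query])
sf.write("muted_query.wav", muted_input, sr)
# The transcriber returns an empty transcript, so transcription-based filters
# see nothing to block, while the multimodal LLM still hears the full query.
```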
| Attack | Why It Works (for the Attacker) | Lakera Guard Defense |
| --- | --- | --- |
| Clean Audio Jailbreak | A user speaks a jailbreak query in clear, natural speech and the model responds inappropriately. No tricks, just clean speech that bypasses naive filters. | Transcribes the input and applies proven text-based guardrails to block unsafe queries. |
| Transcriber Bypass via Reverberation | The attacker adds heavy reverberation. The model still responds, but transcription fails, letting the malicious prompt slip through. | Analyzes the raw audio directly to detect threats, even when transcription fails. |
| Dual-Audio Obfuscation | Two overlapping voices, one malicious and one benign, confuse both model and transcriber: the transcriber is misled by the benign voice while the model hears the malicious one. | Flags audio artifacts and mismatches and detects signal manipulation. |
| Transcriber Muting | A specially engineered waveform makes the transcriber go silent: it thinks the audio has ended and misses the malicious prompt. | Detects engineered waveform signatures and blocks the threat. |
Why Traditional Defenses Fall Short
Transcribing audio and applying text filters is a solid baseline, but it’s no longer enough. As we’ve shown, attackers can:
Bypass transcription entirely
Manipulate speech to cloak malicious intent
Relying on transcription alone is like trying to judge a package by its label without ever opening it.
What’s needed is an audio-native defense—one that hears beyond words, detects adversarial patterns, and acts independently of transcription quality.
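As a minimal sketch of the idea (not Lakera Guard's implementation), an audio-native check scores the raw waveform itself rather than a transcript. Here `librosa` computes log-mel features and `audio_threat_model` is a hypothetical stand-in for a classifier trained to flag adversarial or manipulated speech.

```python
# Illustrative sketch of audio-native screening: score the waveform directly,
# with no dependency on transcription quality.

import librosa
import numpy as np

def audio_threat_model(features: np.ndarray) -> float:
    """Hypothetical stand-in for a trained audio classifier."""
    # A real system would run a learned model here; this returns a dummy score.
    return float(np.clip(features.std() / 40.0, 0.0, 1.0))

waveform, sr = librosa.load("incoming_query.wav", sr=16000)
log_mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=80)
)

score = audio_threat_model(log_mel)
if score > 0.5:
    print("Blocked before transcription or model inference")
```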
Enter Lakera Guard
Lakera Guard was purpose-built to secure multimodal systems, especially those with voice interfaces, against modern adversarial inputs.
Whether attackers distort signals, mask intent, or dodge transcription, Lakera Guard listens beneath the surface. It detects. It neutralizes. In real time.
Conclusion
Multimodal LLMs unlock powerful new capabilities, but also create new risks. The attack surface doesn’t just grow, it evolves.
Securing these systems takes more than legacy filters. That’s why Lakera Guard goes beyond transcription—with audio-native defenses built in. It’s real-time, resilient, and engineered to counter the shapeshifting tactics of modern adversaries.