The Expanding Attack Surface of Multimodal LLMs and How to Secure It
How attackers are hijacking voice interfaces—and why text-based filters won’t protect your LLM. Discover 4 real-world attacks and how Lakera Guard defends against each, in real time.
Multimodal large language models (LLMs) are reshaping AI interaction. No longer limited to text, today’s models can process audio, images, and video—powering voice assistants, customer support agents, and next-gen multimodal copilots. The result? Breakthrough UX and entirely new product possibilities.
But there's a catch: security.
From What You Say to How You Say It
Traditional text-based LLMs have a single attack vector: what the user types. Defenses like input validation, context filters, and prompt shields have matured around that.
Enter voice.
When models need to understand not just what is said, but how, the attack surface explodes. Accent shifts, tonal inflections, background noise, reverb, adversarial audio—each creates new blind spots. And attackers are already exploiting them.
Malicious queries can now be cloaked in acoustic trickery, bypassing transcribers and confusing downstream models. The human ear might not catch it. Your LLM won’t either—unless you build for it.
Showcasing Real-World Attacks and How to Stop Them
In the video below, we showcase a series of attacks on a multimodal LLM (Gemini) and demonstrate how Lakera Guard can effectively detect and neutralize each one.
Let’s walk through the four attacks featured:
Attack 1: Clean Audio Jailbreak
A user simply speaks a jailbreak query in clear, natural speech. Gemini processes the speech and responds inappropriately. This demonstrates that even without audio tricks, multimodal LLMs can be vulnerable.
Defense: A robust approach here is to transcribe the speech and apply a text-based guardrail. Lakera Guard does exactly this—transcribing the input and leveraging its text-native filtering mechanisms to block unsafe queries.
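As a rough illustration, the baseline pipeline might look like the sketch below. It uses the open-source Whisper package for speech-to-text; `text_guardrail` is a hypothetical stand-in for a production text guardrail such as the one Lakera Guard applies, not its actual API.

```python
# Minimal sketch of the transcribe-then-filter baseline.
# Assumes the open-source `whisper` package; `text_guardrail` is a
# hypothetical placeholder for a real text-based guardrail.

import whisper

def text_guardrail(text: str) -> bool:
    """Hypothetical placeholder: return True if the transcript looks unsafe."""
    blocked_phrases = ["ignore previous instructions", "jailbreak"]
    return any(phrase in text.lower() for phrase in blocked_phrases)

model = whisper.load_model("base")            # speech-to-text model
result = model.transcribe("user_query.wav")   # transcribe the incoming audio
transcript = result["text"]

if text_guardrail(transcript):
    print("Blocked: transcript flagged by text guardrail")
else:
    print("Forwarding query to the LLM:", transcript)
```

This works well for Attack 1 precisely because the speech is clean and the transcript is faithful. The next attacks target that assumption.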
Attack 2: Transcriber Bypass via Reverberation
Here, the attacker adds heavy reverberation to their speech. Gemini still understands and executes the malicious query, but the transcriber fails—rendering transcription-based defenses ineffective.
Defense: Lakera Guard goes beyond transcription. It analyzes the raw audio stream, identifying patterns and features indicative of malicious intent even when transcription fails. In this case, Lakera Guard blocks the attack despite the failed transcription.
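To see why the attack side is so cheap, here is an illustrative sketch of the manipulation itself: convolving clean speech with a long room impulse response adds heavy reverberation. File names are hypothetical, and mono audio is assumed.

```python
# Illustrative sketch of the reverberation manipulation described above.
# Assumes `soundfile` and `scipy`; "room_ir.wav" is a hypothetical long,
# echoey room impulse response. Both files are assumed to be mono.

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, sr = sf.read("malicious_query.wav")    # clean spoken query
impulse_response, _ = sf.read("room_ir.wav")   # simulated room acoustics

# Convolving the speech with the impulse response adds reverberation.
reverberant = fftconvolve(speech, impulse_response)
reverberant /= np.max(np.abs(reverberant))     # normalize to avoid clipping

sf.write("reverberant_query.wav", reverberant, sr)
# A listener (and often the target LLM) still understands the query,
# but many transcribers now return empty or garbled text.
```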
Attack 3: Dual-Audio Obfuscation
This sophisticated attack involves combining two speech signals: a highly accented voice that tricks Gemini into jailbreaking, and a benign voice designed to fool the transcriber into thinking it’s a harmless query.
Defense: Here the transcriber fails again because it cannot tell which audio stream to attend to. Lakera Guard, by contrast, flags the audio composition as suspicious: it detects mismatches and artifacts that suggest deliberate obfuscation, blocking the input before any downstream damage can occur.
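To make the mechanics concrete, here is a minimal sketch of how such a dual-stream input could be assembled. The file names and mixing ratio are illustrative assumptions, not the exact composition used in the demo, and mono audio is assumed.

```python
# Conceptual sketch of the dual-audio composition: one stream carries the
# accented jailbreak, the other a benign-sounding decoy.

import numpy as np
import soundfile as sf

malicious, sr = sf.read("accented_jailbreak.wav")  # what the LLM attends to
benign, _ = sf.read("benign_decoy.wav")            # what the transcriber hears

# Pad the shorter signal so the two can be overlaid sample-for-sample.
length = max(len(malicious), len(benign))
malicious = np.pad(malicious, (0, length - len(malicious)))
benign = np.pad(benign, (0, length - len(benign)))

# Mix the streams; making the decoy louder biases the transcriber toward it.
mixed = 0.4 * malicious + 0.6 * benign
sf.write("obfuscated_query.wav", mixed / np.max(np.abs(mixed)), sr)
```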
Attack 4: Transcriber Muting (based on Muting Whisper by Raina et al.)
This attack uses a specially engineered waveform that tricks the transcriber into thinking the audio has ended. When this waveform is prepended to any malicious query, the transcriber stays silent, effectively “muting” the threat. The technique was introduced by Raina et al. in their paper “Muting Whisper.”
Defense: Again, Lakera Guard’s audio-native analysis catches the problem. It detects the unnatural waveform signature and stops the attack.
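For intuition, here is a rough sketch of how such an input is assembled, assuming a hypothetical pre-computed adversarial prefix (the optimization that produces it is described in the Raina et al. paper, not reproduced here). Mono audio is assumed.

```python
# Conceptual sketch of the "muting" attack: a short adversarial prefix,
# learned offline to push the transcriber toward its end-of-transcript token,
# is prepended to an otherwise ordinary spoken query.
# "adversarial_prefix.wav" is a hypothetical pre-computed artifact.

import numpy as np
import soundfile as sf

prefix, sr = sf.read("adversarial_prefix.wav")  # short engineered waveform
query, _ = sf.read("malicious_query.wav")       # spoken malicious request

muted_input = np.concatenate([prefix, query])
sf.write("muted_query.wav", muted_input, sr)
# The transcriber returns an empty transcript, so transcription-based filters
# see nothing to block, while the multimodal LLM still hears the full query.
```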
| Attack | Why It Works (for the Attacker) | Lakera Guard Defense |
| --- | --- | --- |
| Clean Audio Jailbreak | A user speaks a jailbreak query in clear, natural speech and the model responds inappropriately. No tricks, just clean speech that bypasses naive filters. | Transcribes the input and applies proven text-based guardrails to block unsafe queries. |
| Transcriber Bypass via Reverberation | The attacker adds heavy reverberation. The model still responds, but transcription fails, letting the malicious prompt slip through. | Analyzes the raw audio directly to detect threats, even when transcription fails. |
| Dual-Audio Obfuscation | Two overlapping voices, one malicious and one benign, confuse both model and transcriber: the transcriber is misled by the benign voice while the model hears the malicious one. | Flags audio artifacts and mismatches and detects signal manipulation. |
| Transcriber Muting | A specially engineered waveform makes the transcriber go silent: it thinks the audio has ended and misses the malicious prompt. | Detects engineered waveform signatures and blocks the threat. |
Why Traditional Defenses Fall Short
Transcribing audio and applying text filters is a solid baseline, but it’s no longer enough. As we’ve shown, attackers can:
Bypass transcription entirely
Manipulate speech to cloak malicious intent
Relying on transcription alone is like trying to judge a package by its label without ever opening it.
What’s needed is an audio-native defense—one that hears beyond words, detects adversarial patterns, and acts independently of transcription quality.
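As a minimal sketch of the idea (not Lakera Guard's implementation), an audio-native check scores the raw waveform itself rather than a transcript. Here `librosa` computes log-mel features and `audio_threat_model` is a hypothetical stand-in for a classifier trained to flag adversarial or manipulated speech.

```python
# Illustrative sketch of audio-native screening: score the waveform directly,
# with no dependency on transcription quality.

import librosa
import numpy as np

def audio_threat_model(features: np.ndarray) -> float:
    """Hypothetical stand-in for a trained audio classifier."""
    # A real system would run a learned model here; this returns a dummy score.
    return float(np.clip(features.std() / 40.0, 0.0, 1.0))

waveform, sr = librosa.load("incoming_query.wav", sr=16000)
log_mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=80)
)

score = audio_threat_model(log_mel)
if score > 0.5:
    print("Blocked before transcription or model inference")
```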
Enter Lakera Guard
Lakera Guard was purpose-built to secure multimodal systems, especially those with voice interfaces, against modern adversarial inputs.
Whether attackers distort signals, mask intent, or dodge transcription, Lakera Guard listens beneath the surface. It detects. It neutralizes. In real time.
Conclusion
Multimodal LLMs unlock powerful new capabilities, but also create new risks. The attack surface doesn’t just grow, it evolves.
Securing these systems takes more than legacy filters. That’s why Lakera Guard goes beyond transcription—with audio-native defenses built in. It’s real-time, resilient, and engineered to counter the shapeshifting tactics of modern adversaries.