Risk Evolution

From Model Manipulation to Autonomous Breach

The Agentic Top 10 does not replace the LLM Top 10. It extends it. The vulnerability remains, but the blast radius expands once autonomy is introduced.

LLM Risk Agentic Evolution
Prompt Injection Goal Hijack / Memory Poisoning
Excessive Agency Tool Misuse / Identity & Privilege Abuse
Improper Output Handling Unexpected Code Execution
Data & Model Poisoning Persistent Memory Corruption
Misinformation Human–Agent Trust Exploitation

The same manipulation techniques still apply. The difference is that the agent can now act on them.

Phase 1: Compromise the Mind

Every agentic breach begins the same way. The attacker does not start by breaking authentication or exploiting an API. They start by changing what the agent believes.

In traditional LLM systems, this shows up as prompt injection. The model is tricked into producing an unsafe output. That is serious, but it is usually bound to a single interaction.

In agentic systems, the same technique becomes far more consequential.

Agents ingest more than user prompts. They read documents, retrieve RAG content, parse tool outputs, process emails, and consume messages from other agents. All of that flows into the same natural-language layer that shapes intent and planning. There is no reliable separation between “data” and “instruction” at that layer.

A sentence buried in a PDF can influence the same reasoning process as a system prompt.

This is where OWASP places Agent Goal Hijack. The attacker does not need to override the agent’s core instructions. They only need to redirect how the agent interprets its objective. A poisoned document can subtly reweight priorities. A malicious calendar invite can alter task selection. A tool response can smuggle in new constraints. The agent still appears aligned. Its internal objective has shifted.

That shift becomes more durable when memory is involved.

Agents summarize conversations. They store embeddings. They retrieve prior outputs as context for future decisions. When attacker-controlled content enters that memory layer, it does not disappear after one run. It persists. It gets re-summarized. It influences new plans. OWASP calls this Memory and Context Poisoning, and it frequently becomes the mechanism that locks a hijacked goal in place.

At this stage, nothing looks compromised. The agent still follows its system prompt. It still uses approved tools. Logs show normal API calls. The only thing that changed is the agent’s internal model of what it is supposed to optimize for.

That is the first and most important transition in the Agentic Top 10.

Manipulate the language layer in an LLM, and you influence a response. Manipulate the belief layer in an agent, and you influence behavior.

The mind has been compromised. The system just does not know it yet.

**In a recent internal Lakera hackathon, we stress-tested real attack scenarios inside an OpenClaw-style agent ecosystem. The goal: explore how agentic threats actually manifest, beyond theory.

These three deep dives document what we found:

Phase 2: Convert Autonomy into Power

Compromising intent is only the first step. The real risk appears when that compromised intent is given execution rights.

This is the structural difference between LLM systems and agentic systems. An LLM generates text. An agent generates actions.

Once autonomy is in play, the manipulated belief from Phase One begins to drive real operations. Agents call APIs. They trigger workflows. They execute scripts. They move money. They provision infrastructure. They modify configurations. They do this under valid credentials and within approved integrations.

This is where two OWASP categories become central: Tool Misuse and Exploitation and Identity and Privilege Abuse.

In the LLM Top 10, “Excessive Agency” describes what happens when a model is allowed to act too freely. In agentic systems, that freedom becomes leverage. A hijacked goal now has access to systems that matter.

-db1-The shift is subtle but critical:

  • A poisoned instruction is no longer just unsafe text. It becomes an API call.
  • A skewed memory entry is no longer just bias. It becomes a workflow decision.
  • A hallucinated output is no longer just misinformation. It can become executed code.

In practice, this looks disturbingly normal:

  • A finance agent prepares a transfer through an approved payments API. The credentials are valid. The endpoint is correct. The destination account was “validated” earlier in memory.
  • A coding agent pulls a dependency and runs a build step as part of routine maintenance. The package resolves successfully. The backdoor executes inside the build pipeline.
  • A security automation agent aggregates logs across systems. The tool chain is legitimate. The destination endpoint was quietly influenced upstream.

-db1-

Nothing in these flows breaks authentication. Nothing bypasses a firewall. The agent is operating exactly as designed. The only variable that changed is the objective driving those actions.

OWASP classifies this as Identity and Privilege Abuse because the agent often operates as a delegated principal. It inherits access from users, service accounts, or other agents. When the intent layer is compromised, that inherited privilege becomes an amplification mechanism.

At this point, the breach has crossed a threshold. It is no longer about language manipulation. It is about operational authority.

The system has moved from compromised reasoning to compromised execution.

And once execution is automated, containment becomes harder with every additional workflow that trusts the output.

Phase 3: Allow the System to Propagate

Up to this point, the compromise can still be misunderstood as a single-agent failure. A poisoned memory. A misused tool. A delegated credential gone wrong.

Agentic systems rarely operate in isolation.

Modern deployments rely on networks of planners, executors, retrievers, reviewers, and domain-specific helpers. Agents pass tasks to one another. They exchange context. They register capabilities through discovery services. They trust responses from peers inside the system boundary.

That architecture is what enables scale. It is also what enables spread.

OWASP captures this under Insecure Inter-Agent Communication and Agentic Supply Chain Vulnerabilities.

When agents communicate, they often treat internal messages as trustworthy by default. A planning agent issues instructions. An execution agent carries them out. A helper agent advertises a capability and gets routed traffic. If identity, intent, and message integrity are not strongly bound together, a compromised agent can influence others without ever breaching them directly.

-db1-Propagation can take several forms:

  • A low-privilege agent relays a request that inherits higher privileges downstream.
  • A malicious tool descriptor or MCP endpoint advertises capabilities that cause multiple agents to route data through it.
  • A compromised update in a shared registry spreads across agents that dynamically load tools at runtime.
  • A poisoned memory entry becomes shared context across multiple workflows.

The original manipulation now has distribution.-db1-

In traditional systems, compromise often requires lateral movement through explicit exploitation. In agentic systems, lateral movement can happen through normal coordination. Agents are designed to pass work along. They are designed to trust structured outputs from peers. They are designed to reuse context.

That design goal becomes the propagation channel.

This is also where supply chain risk changes shape. In static software, a compromised dependency spreads when deployed. In agentic ecosystems, tools, prompts, and capabilities can be loaded dynamically at runtime. A poisoned component does not need a full redeploy to spread. It can be discovered and trusted on demand.

The breach is no longer contained to a single objective. It now influences multiple agents, multiple workflows, and potentially multiple domains.

The system is teaching itself the attacker’s assumptions.

And once that happens, containment becomes exponentially harder.

Framework Overview

OWASP Top 10 for Agentic Applications (2026)

A concise summary of the highest-impact threats facing autonomous AI systems, tools, and multi-agent coordination.

Threat Description Quick Example
ASI01: Goal Hijack Manipulation of instructions to redirect an agent's objectives or decision pathways. Malicious emails triggering silent data exfiltration via Copilot.
ASI02: Tool Misuse Agents applying legitimate tools in unauthorized, unsafe, or unintended ways. An email summarizer being tricked into deleting production records.
ASI03: Identity Abuse Exploitation of dynamic trust and delegation to escalate access and bypass controls. Low-privilege agents inheriting excessive rights from high-privilege managers.
ASI04: Supply Chain Malicious artifacts, third-party agents, or tools dynamically loaded into the execution chain. A backdoored MCP server secretly BCC'ing organization emails to attackers.
ASI05: Unexpected RCE Conversion of natural language into adversarial code execution or container escapes. "Vibe coding" agents executing unreviewed shell commands that wipe data.
ASI06: Memory Poisoning Corruption of persistent context or retrievable knowledge to bias future reasoning. Attacking RAG sources to implant false refund policies in a finance agent.
ASI07: Insecure Inter-Agent Weak controls on exchanges between coordinating agents allowing message manipulation. MITM attacks on unauthenticated message buses hijacking task coordination.
ASI08: Cascading Failures Propagation of a single fault across autonomous agents causing system-wide harm. Financial trading agents network-wide acting on a single poisoned risk limit.
ASI09: Trust Exploitation Manipulation of humans through fluency, perceived authority, or "fake explainability." An agent fabricating audit rationales to trick analysts into deleting a database.
ASI10: Rogue Agents Malicious or compromised agents that deviate from their scope to sabotage operations. A hijacked agent autonomously spawning replicas to consume cloud resources.

Phase 4: Lose Containment

Once compromised intent has been operationalized and allowed to propagate, the system enters its most dangerous phase.

Loss of containment does not happen in a single moment. It emerges when small, automated decisions begin to reinforce each other.

OWASP calls this Cascading Failures.

Cascading failures are not the initial vulnerability. They are what happens when compromised agents continue to plan, execute, delegate, and learn without interruption. One altered decision becomes many. One automated action triggers a chain of dependent workflows. One poisoned assumption spreads across domains.

At this stage, the system is no longer responding to an attacker. It is responding to its own corrupted state.

-db1-The warning signs are operational, not linguistic:

  • Identical actions fanning out across multiple services in seconds.
  • Agents repeating each other’s outputs as trusted input.
  • Privileged workflows executing at scale under valid credentials.
  • Cross-domain effects where a decision in one subsystem reshapes behavior in another.

-db1-

Each action appears justified in isolation. Together, they form a feedback loop.

A planning agent adjusts parameters based on skewed data. Execution agents follow the updated plan. Oversight agents see policy compliance and allow it through. Memory persists the outcome. Future plans treat the corrupted result as ground truth.

Nothing in this sequence requires malware. Nothing requires broken authentication. The system remains internally consistent. It is simply optimizing around the wrong objective.

This is the real insight of the Agentic Top 10.

The risks are not independent categories. They describe a progression. Compromise intent. Convert autonomy into power. Enable propagation. Lose containment.

By the time cascading failures appear, the original prompt injection or poisoned document is often irrelevant. The system is now driving its own amplification.

That is the shift from model manipulation to autonomous breach.

** How far can agentic workflows be pushed? In this research, we demonstrate a zero-click remote code execution chain that turns normal MCP integrations and AI coding assistants into a scalable attack vector, no user interaction required.

Zero-Click Remote Code Execution: Exploiting MCP & Agentic IDEs A silent Google Doc share triggers prompt injection, automatic payload retrieval, credential theft, and persistent reverse shell access, all through intended agentic IDE functionality.**

Why This Model Matters

The OWASP Agentic Top 10 is not simply a taxonomy of risks. It is a model for how autonomy changes the shape of failure.

In traditional LLM systems, manipulation often ends at output. A model produces something unsafe. A guardrail blocks it. A human reviews it. The blast radius is limited.

In agentic systems, the same manipulation becomes the first stage of execution. A poisoned instruction becomes a goal. A skewed memory entry becomes a planning input. A misaligned output becomes a workflow trigger. Each step builds on the previous one.

That progression is the insight.

The Agentic Top 10 matters because it forces teams to think in phases rather than categories. Not “Do we block prompt injection?” but “If intent is compromised, how far can it travel?” Not “Are our tools secured?” but “What happens when a compromised objective uses them?” Not “Do agents authenticate to each other?” but “How quickly can a bad decision replicate across the system?”

Seen through that lens, the Top 10 stops being a checklist and becomes a containment framework.

  1. Compromise intent.
  2. Convert autonomy into power.
  3. Enable propagation.
  4. Lose containment.

Once autonomy is introduced, security is no longer about filtering inputs. It is about limiting amplification.

That is the real shift the Agentic Top 10 is trying to capture.