Memory Poisoning & Instruction Drift: From Discord Chat to Reverse Shell (OpenClaw Hackathon Findings)

**OpenClaw and its surrounding ecosystem have recently become a focal point in AI security discussions. What began as an open, extensible agent platform has quickly exposed how autonomy, tool execution, and collaborative workflows can intersect in ways that introduce real operational risk.

If you’re looking for a broader overview of the OpenClaw ecosystem, its skills model, and why agentic systems are raising concerns for CISOs, see Steve Giguere’s in-depth analysis of the situation.

The article below explores one of two technical tracks from our internal research hackathon, focused specifically on how persistent memory and instruction drift affect agent behavior.**

At a recent internal OpenClaw hackathon, we explored the security boundaries of agentic systems through two technical tracks.

The first examined the agent core itself: how persistent memory and tool execution interact over time. The second focused on the emerging skill marketplace and analyzed it at scale.

This article documents the first track.

Under controlled lab conditions, we investigated how an AI agent with long-lived memory and shell execution capabilities behaves across repeated interactions. What emerged was not a simple prompt injection scenario, but a gradual shift in internal state that ultimately led to reverse shell execution on a test machine, triggered through Discord messages alone.

If you are interested in marketplace-level risk—including confirmed malware delivery through OpenClaw skills—see our companion analysis:

👉 The Agent Skill Ecosystem: When AI Extensions Become a Malware Delivery Channel

What follows is a focused examination of memory poisoning and instruction drift inside the agent itself.

-db1-

TL;DR

In a controlled lab setup, we conditioned an AI agent with persistent memory to execute a malicious binary via Discord messages alone.
No prompt injection bypass. No API misconfiguration. No privilege escalation.
Over multiple interactions, durable memory entries shifted the agent’s internal trust hierarchy.
Once trust assumptions changed, a “system update” request triggered reverse shell execution.
Persistent memory becomes policy. If long-lived state is mutable, execution behavior can drift.
Agents with tool execution must run in restricted, least-privilege environments by default.-db1-

Experimental Setup

The experiment was conducted under controlled conditions.

All testing took place on clean lab machines. No production systems were involved, and no corporate data was connected. Beyond the required Discord bot token, no external service credentials were configured.

OpenClaw was intentionally deployed in a minimal configuration. The model used was GPT-4.1-o-mini. No additional skills were enabled, and no external service tokens were present. The only active integration was Discord.

The Discord integration required a bot token to function. No additional credentials were present in the environment.

The agent process itself was running with administrative privileges on the test machine, meaning it had full access to the local system. This mirrors how many users run local agent setups by default.

Execution was ultimately triggered by a third-party user in the Discord channel who did not have administrative privileges.

This was not an unsecured enterprise deployment. It was a constrained lab environment designed to observe agent behavior under common local configurations.

Memory Architecture

OpenClaw maintains persistent state in a local workspace directory:

-bc-.openclaw/workspace/-bc-

Within this directory, a file named MEMORY.md stores long-lived context that persists across sessions.

OpenClaw also maintains a memory/ subdirectory for more granular, time-scoped entries, e.g. daily memory files such as:

-bc-.openclaw/workspace/memory/20260210.md-bc-

-bc-.openclaw/workspace/memory/20260211.md-bc-

These local files capture shorter-term context. However, our experiment specifically targeted the global MEMORY.md file, which influences durable behavior across sessions.

MEMORY.md shapes how the agent interprets:

Trust relationships
Behavioral preferences
Task continuity
Instruction priority

Unlike transient conversation context, entries in MEMORY.md become part of the agent’s durable state. They survive restarts and influence subsequent decision-making.

During the experiment, the team observed that this persistent memory was not immutable. It could be influenced indirectly through interaction patterns.

Over multiple staged interactions, memory entries began to reflect reinforced trust assumptions about a specific Discord user.

This was not a single-prompt exploit. It required gradual conditioning.

Instruction Drift

The key mechanism was instruction drift.

Through repeated interactions, the agent’s internal prioritization shifted. A non-admin Discord user was gradually treated as a trusted authority.

Several important clarifications:

The triggering user had no elevated Discord privileges.
The agent’s base configuration did not grant special execution authority.
Direct execution attempts without prior conditioning did not succeed.

Earlier attempts to trigger execution without memory reinforcement failed. The model resisted direct malicious instruction when no trust context had been established.

Only after memory entries had accumulated reinforcing patterns did execution become viable.

The agent began elevating the conditioned user’s requests above its original instruction hierarchy.

This was not as much a prompt injection as a state evolution.

Reverse Shell Execution

Once the agent’s trust assumptions had shifted, execution followed a predictable path.

The binary was presented as a system update consistent with prior interactions. Earlier memory conditioning had established the requesting user as trusted. The update request was structured to align with previously approved behaviors.

Because tool execution was already available in the environment, the agent had the capability to run shell commands. The architecture permitted it.

The request triggered execution of a local binary. In testing, this included a reverse shell payload, resulting in remote access to the machine.

No privilege escalation was required. The agent executed within the permissions already granted to its process.

Failed Execution Attempt (No Memory Conditioning)

Reverse Shell Execution After Memory Conditioning

In the demonstration video, the executable launches Chrome and opens a Lakera YouTube video. For illustration purposes, the browser launch serves as a visible indicator of successful binary execution. Because the executable is non-human-readable, the model cannot determine its contents without additional tooling and treated it as a legitimate system update, as reflected in its response. The same execution flow was verified internally using a reverse shell payload under controlled lab conditions.

What This Demonstrates

This experiment highlights several architectural realities.

Persistent memory influences execution behavior. Long-lived state can shape how future instructions are interpreted.
Instruction hierarchy is mutable. Authority relationships are not necessarily static.
Tool execution inherits trust assumptions. When trust boundaries shift, execution pathways follow.
Memory integrity becomes a security boundary. If persistent state can be influenced without validation, execution decisions can drift over time.

These findings reinforce the need to run agent systems in restricted environments with explicit execution controls. Local agents should not operate with broad privileges by default.

Practical Implications for Agent Architectures

Agent systems that combine memory and execution capabilities must treat persistent state as security-critical.

-db1-Key architectural considerations include:

Memory validation mechanisms that prevent untrusted authority escalation
Immutable system-level instructions separated from user-modifiable memory
Clear boundaries between user input and policy-level state
Execution sandboxing to limit blast radius
Least-privilege runtime environments for tool access
Explicit trust models rather than emergent authority assumptions-db1-

As agents become more autonomous and persistent, the attack surface shifts. Security considerations extend beyond prompt injection to include long-lived state evolution.

The Lakera team has accelerated Dropbox’s GenAI journey.

Not sure how to secure your GenAI application?
Skip the guesswork with expert-recommended policies built by Lakera’s AI security team. Apply them in seconds, fine-tune when you’re ready, and get started with real protection from day one.

Download the Guide

On this page

Text Link

Hide table of contents

Show table of contents