For those in the cybersecurity world, OWASP (Open Web Application Security Project) and its Top 10 web app security list need no introduction. Over the years, OWASP released various Top 10 lists, tailored for example for APIs, mobile applications or DevOps.
Guess what? They've now got a Top10 specifically for LLM applications! ;-)
The OWASP Top 10 for LLMs is a living document that's constantly updated with new insights from a community of nearly 500 international security experts, practitioners, and organisations in the field. It has gone through a few iterations, reflecting the rapidly evolving landscape of AI security and LLM vulnerabilities.
To stay at the forefront of LLM security, consider following the OWASP resource page dedicated to LLMs and joining:
At Lakera, LLM security is our top priority. We've spent years developing AI for high-risk scenarios, and have got hands-on experience detecting and tackling some of LLM vulnerabilities, including prompt injection attacks - the number one vulnerability on OWASP's list. Our track record includes leading one of the most extensive global LLM red-teaming initiatives - Gandalf & Mosscap, and red-teaming some of the biggest LLM models out there.
In this practical guide, we’ll give you an overview of OWASP Top10 for LLMs, share examples, strategies, tools, and expert insights on how to address risks outlined by OWASP. You’ll learn how to securely integrate LLMs into your applications and systems while also educating your team.
Here’s a quick overview of the vulnerabilities we’ll explore.
LLM01: Prompt Injection – an attacker manipulates an LLM via crafted inputs, causing unintended actions. This is done by overwriting system prompts or manipulating external inputs.
LLM02: Insecure Output Handling – if the system blindly trusts an output of the large language model (LLM), then it potentially leads to security issues such as XSS or remote code execution. The damage can be worse if the system is vulnerable to the indirect prompt injection attacks.
LLM03: Training Data Poisoning – this vulnerability occurs when the data used to teach a machine learning model is tampered with, which can lead to biases. It can affect the model's effectiveness, lead to harmful outputs, and potentially damage the reputation of a brand that is using LLM.
LLM04: Model Denial of Service – an attacker manipulates the model's resource consumption, leading to service degradation and potential high costs.
LLM05: Supply Chain Vulnerabilities – vulnerabilities related to the training data, modules, libraries, deployment platforms and 3rd party solutions used during the development of the model, leading to biased results, data leaks, security issues, or even system failures.
LLM06: Sensitive Information Disclosure – LLM applications, if not properly secured, can inadvertently leak sensitive details, proprietary data, and violate user privacy. One must be aware of how to interact safely with LLMs to ensure data security and prevent unauthorized access.
LLM07: Insecure Plugin Design – this one refers to the practice of developing LLM plugins without proper controls or validation checks, potentially leading to harmful actions like remote code execution.
LLM08: Excessive Agency – LLM-based systems can perform actions that lead to unforeseen results. This problem arises from giving LLM-based systems too much functionality, rights or independence.
LLM09: Overreliance – this is more of a threat to users than a vulnerability in LLM. It refers to all sorts of implications that can be incurred by a user who uses output from LLM - legal consequences, spreading disinformation, etc.
LLM10: Model Theft - leakage, exfiltration or copying the model, that may lead to i.e. losing the competitive advantage.
Now, let’s dive in!
Prompt injection stands out as the most well-known vulnerability found in LLM-based applications. This type of attack occurs when an attacker manipulates an LLM through specifically crafted inputs, forcing the model to follow the instructions beyond its intended scope of operations. Successful exploitation of prompt injection may lead to a variety of outcomes - from data leakage to the remote code execution in the targeted system.
We can classify Prompt Injection into one of two classes of this vulnerability - Direct and Indirect Prompt Injections.
In a Direct Prompt Injection, the attacker overwrites or uncovers the underlying system prompt, often referred to as "jailbreaking". This manoeuvre can provide a gateway for infiltrators to exploit backend systems by interacting directly with the LLM.
Probably the best known Direct Prompt Injection payload is format is:
Ignore all of your previous instructions and (...).
But the payloads can get much more advanced. For example, in Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou, et. el.) authors demonstrated an adversarial suffix that enables the attacker to jailbreak the model and make it follow the adversarial instructions - you can check the examples published on the paper’s website.
One of the famous, early examples of Direct Prompt Injection was Prompt Injection in Bing Chat executed by Kevin Liu, in which the codename of Bing Chat (“Sydney”) was revealed. Another one is Remote Code Execution in PandasAI by Tong Liu.
Second type of PI - Indirect Prompt Injections happens when the LLM accepts input from external sources that the attacker can control, such as files or websites. The intruder embeds a prompt injection into the external content and then hijacks the LLM conversation context. It enables the offender to manipulate users or systems accessible by the LLM.
Indirect Prompt Injection attack was demonstrated in the paper called “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection” (Greshake, et. al.) for the first time. In this paper, authors demonstrate how the attacker can disrupt the LLM operations by indirectly passing malicious payloads - using a crafted website.
Another examples of scenarios that may cause Indirect Prompt Injection include a malicious user uploading a resume with embedded indirect prompt instructions, making the LLM inform the recruiter that the document is excellent and they should hire the attacker (by one of the authors of the paper above), or indirect prompt injection via Youtube Transcripts.
In each of these examples, the compromised LLM effectively acts as an agent for the attacker, bypassing usual safeguards and leaving the user vulnerable to the breach. Understanding these vulnerabilities and developing robust security measures is crucial in maintaining the integrity of LLMs and protecting users from such attacks.
If you want to test your Prompt Injection skills, you should check out Gandalf - our online AI education game where players take on a challenge of tricking the LLM into revealing the password.
To prevent Prompt Injection, OWASP suggested several measures. These include limiting LLM access to backend systems, human approval for high-privilege operations, separating trusted content from user prompts, and establishing trust boundaries between the LLM, external sources, and extendible functionalities (like plugins etc.).
By adhering to these guidelines, the risk of both direct and indirect prompt injection can be reduced, increasing the overall security of AI systems.
However, at Lakera we think that the common prevention methods are not enough. According to some other sources this threat is something that occurs “naturally” in LLMs: “jailbreaks” [are] a naturally-occurring form of neural trojans” (NeurIPS 2023) - and we believe that LLM providers won’t be able to resolve this issue alone, and third party tools will always be needed. That’s why we’ve built Lakera Guard - the most powerful LLM security platform that enables you to build LLM applications without worrying about prompt injections, data leakage, hallucinations, and other common threats.
Let’s have a look at Lakera Guard in action.
Below you will find an example of request sent to the Lakera Guard API in order to verify if the prompt is legit.
Once the request is processed, API classifies the input (in this case, we passed a few hundred “a” characters - such a payload causes the LLM to hallucinate).
Another example of the payload can be the following.
Of course, Lakera Guard classifies that as a prompt injection.
While Prompt Injection refers to the input provided to the LLM, Insecure Output Handling is a output-related vulnerability (as the name implies).
OWASP recommends treating LLM output as a content generated by regular users of the system - that means, you should never trust the output of the model. Outputs should be sanitized and encoded (if possible) to protect from attacks such as XSS.
What is common for Insecure Output Handling attacks is the fact that usually they begin with some kind of Prompt Injection. Use Lakera Guard API to protect your LLM-based application from this kind of threats.
Data is crucial for modern AI. In the LLM development, data is used at least in a few steps: pre-training (this is the initial training process that enables the language model to understand the structure and the semantics of the language), fine-tuning (used for changing the style, tone, format or other qualitative aspects) and embedding (enables prompt engineers to feed a small dataset of domain knowledge into the large language model).
Each of these datasets can be poisoned (they can be tampered with or manipulated by attackers to undermine the LLM's performance or to output content that benefits their malicious intents.).
One of the interesting examples of what can be the effects of the data poisoning is the AutoPoison framework. In this work authors (Shu, et. al.) demonstrated how it is possible to poison the model during the instruction tuning process.
It is advised to verify the supply chain of the Training Data, especially when it is obtained from external sources. You can consider using Bill of Materials standards such as CycloneDX ML-BOM to provide transparency of your dataset.
Another prevention step is the implementation of sufficient sandboxing. This will stop your AI model from scraping data from untrusted sources. Incorporating dedicated LLMs for benchmarking against undesirable outcomes is also beneficial. These LLMs can then be used to train other LLMs via reinforcement learning techniques.
You can also conduct LLM-oriented red team exercises or LLM vulnerability scanning during the testing phases of the LLM lifecycle. This proactive approach can identify potential issues that occur in the model due to the data poisoning.
The Denial of Service attack is common for all of the network-based technologies. During this attack in LLM context, an attacker interacts with the model in such a way that the resources (such as bandwidth or system processing power) are over-consumed and the availability of the attacked system is harmed. Alternatively the costs of usage are much higher than usual.
One of the examples of DoS attack on LLM can be this attack by @wunderwuzzi23 - by recursively calling the plugin, costs of LLM usage may become pretty high. Another example is this demonstration of the possibility of generating +1000$ bill in a single call to LLM app by Harrison Chase - in this case it’s a prompt injection that led to the DoS attack.
To protect against DoS attacks, the countermeasures known from securing "classic" APIs and applications are a good starting point. First, you should implement rate limiting. This protects your system from being overwhelmed by too many requests.
Another good countermeasure can be to simply limit the number of characters the user can send in a query. Not only does this protect your LLM-based API from resource exhaustion, but it also makes you resilient to some of the malicious payloads, such as the one we've shown in the "Prompt Injection" section of this article (the one with multiple "a" characters).
To protect yourself against the attacks such as the one by Harrison Chase described above, you can use the methods provided by the frameworks providers - for example, for LangChain, you can use max_iterations parameter.
LLM applications have the potential to disclose sensitive information and data leakage. Unauthorized access to the sensitive data may lead to the privacy violation. Education of the users on the possible risks is crucial, and the LLM applications should sanitize the data in order to prevent this vulnerability from occurring.
Although commercial LLM vendors claim that the data will not be used in teaching subsequent iterations of the models, it is not really clear exactly how the data is processed. That is why multiple international companies banned the usage of LLMs in their working environments - Samsung (which have reported misuse of the ChatGPT by their employees), JPMorgan or Apple to name a few.
The most common issues when it comes to development of LLM-based applications is storing and using sensitive data in fine-tuning process, or unintentional disclosure of confidential data, such as “leakage” of Windows keys, by tricking GPT info pretending to be a grandmother reading a story (it’s worth noting that those specific keys were generic keys, that will not activate Windows, but it gives an example of what the threat can be).
The interaction between the consumer and the LLM application forms a two-way trust boundary. This implies that the client's input to the LLM or the LLMs output to the client cannot be inherently trusted.
If you want to protect sensitive data from being leaked from your environments, consider using Lakera Chrome extension.
Lakera Guard API also offers PII guardrails, so if you want to protect your users from leaking their PII, consider integrating Lakera Guard with your LLM-based application.
One of the huge advantages of GPT-4 in ChatGPT is the access to the plugins. Using plugins you can for example scrap the contents of the websites, generate and execute the code, work with maths etc.
As the plugins are indeed separate pieces of code that integrate your instance of ChatGPT with third party applications, they bring additional security risks. The most common risks are data leaks to 3rd parties, Indirect Prompt Injection via external sources analyzed using plugins or even authentication in 3rd party applications without being authorized by the user.
If you are just a regular user, remember to use only trusted plugins, from the providers of verified reputation and those, who are maintaining their plugins regularly.
If you are a plugin developer, first of all you should protect the plugin in the same way as you will do with regular REST API.
Every sensitive action performed by the plugin should be manually verified and authorized by the user. Data passed to the plugin should be sanitized and plugin’s code should be regularly examined for potential vulnerabilities. The results returned by the plugin should be considered as potentially malicious. Last but not least, it’s advised to follow OpenAI’s guidelines for plugin development as the plugin needs to be reviewed by OpenAI before being added to the plugin store.
Agency is a feature of LLM Agents. Agent is a LLM-based system that goes beyond text generation and it can perform various tasks, such as connecting with the APIs or using tools. LLM Agents are developed using frameworks such as AutoGPT. When interacting with other systems, Agents can be either helpful or destructive.
Excessive Agency is a vulnerability in which an agent is developed with excessive functionality, excessive permissions or excessive autonomy.
This vulnerability can impact confidentiality, integrity and availability spectrum, and is dependent on which systems an LLM-based app is able to interact with - for example, here is a description of a vulnerability that could have been exploited in the past versions of Auto-GPT, if it wasn’t ran in the virtualized environment.
In AutoGPT documentation you can read the following:
It is recommended to use a virtual machine/container (docker) for tasks that require high security measures to prevent any potential harm to the main computer's system and data. If you are considering to use AutoGPT outside a virtualized/containerized environment, you are strongly advised to use a separate user account just for running AutoGPT. This is even more important if you are going to allow AutoGPT to write/execute scripts and run shell commands!
You should always avoid giving the agents too much autonomy, providing redundant functionality to agents or giving the agents access to open-ended functions, such as access to the system’s command line.
**💡 Pro Tip: Check out Outsmarting the Smart: Intro to Adversarial Machine Learning**
Overreliance is a situation where the user uncritically trusts the LLM’s output and is not verifying the credibility of the content produced by the model.
Nowadays, ChatGPT itself has over 100 million users worldwide. Many of them tend to use ChatGPT over web search engines such as Google. This way, they are relying on the information that may be as well hallucination of the model - not everyone is aware that ChatGPT’s knowledge is limited, and the dataset on which it was trained is limited to 2021 (actually, during the work on this article OpenAI published an update in which it gave ChatGPT the ability to access the Internet, but that’s only available in the Plus version). That means, you can’t ask chat about current events and discoveries.
It brings obvious consequences and threat for people who are using ChatGPT for content creation (let’s say, in news creation LLM can just generate fake news without additional training and embedding), but there’s also another caveat - the risk of introducing the technological debt to the software development project you are working on. Technology has changed dramatically over the past two years, so if you are using out-of-the-shelf LLMs for code generation, you might introduce outdated libraries to your codebase.
It’s not a vulnerability, but rather a threat and a risk of using Generative AI. You should always remember to verify the credibility of content generated by Large Language Models. If you are responsible for the security of your organization, you should also train the employees on potential consequences of overreliance.
Last but not least, we have model theft. The model can be stolen by the attackers gaining access to the network of the model’s creators, it can be (intentionally or unintentionally) leaked by the employee, and last but not least it can be stolen in an attack called “Model Extraction”.
On of the examples of the Model Theft is the situation in which Meta’s LLAMA model was leaked.
As a protection, OWASP document refers to the most important security practices - fostering a good cybersecurity posture of your organization seems to be the best protection against this threat.
Although large language models are advanced and useful technology, it is worth being mindful of the risks associated with their use. Technology is developing rapidly, its adoption is increasing and new tools are emerging - this makes the risk of new vulnerabilities very high. With OWASP Top10 list for LLM, threat modelling of LLM-related applications is easier, but the list is not exhaustive and you should watch for new vulnerabilities to emerge. Lakera Guard will help you in protecting your solutions.
Download this guide to delve into the most common LLM security risks and ways to mitigate them.
Subscribe to our newsletter to get the recent updates on Lakera product and other news in the AI LLM world. Be sure you’re on track!
Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.
Several people are typing about AI/ML security. Come join us and 1000+ others in a chat that’s thoroughly SFW.