Data Loss Prevention in the Age of Generative AI (with Lakera's Insights)

Learn about data loss prevention in the context of generative AI. Explore some best practices to ensure error-free DLP implementation.

Haziqa Sajid
February 1, 2024

Since the launch of ChatGPT in 2022, generative AI (GenAI) has taken the world by storm. The GenAI market is growing rapidly as IT companies jump on the AI bandwagon. A Bloomberg report estimates that the GenAI market will grow to $1.3 trillion over the next 10 years. As exciting as this may be, many concerns remain. Are we going too fast? Is the world ready for this evolution? How do we regulate this influx of artificial intelligence?

Perhaps the biggest concern is the secure use of these applications.

As AI primarily relies on data for training, data loss prevention (DLP) solutions have become a necessity. DLP implementations help protect sensitive data by enforcing strict policies regarding its usage and mobility. In the context of GenAI, they protect against vulnerabilities like prompt injections or data poisoning, and help evaluate GenAI applications against safety protocols.

This article discusses the importance of data loss prevention in the GenAI ecosystem, explains how modern solutions address GenAI problems, and outlines best practices for implementing a DLP infrastructure.



Introduction to Data Loss Prevention

Data Loss Prevention is a set of tools and practices that ensure the secure storage, distribution, and usage of sensitive data. These tools and practices define the organization's overall security strategy and prevent unauthorized data transfers and use of information by cyber attackers.

Types of DLP Systems (Source)

The entire DLP strategy begins with classifying sensitive information, which is then closely monitored for illicit activities. The monitoring ensures that all protocols are followed, e.g., data is only accessed by authorized personnel, it remains on the authorized device/server, and there are no unauthorized modifications. DLP issues alerts to the concerned authorities in case of a policy breach, followed by protective measures, including data encryption, access restriction, and visual markings.
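
The monitoring step described above can be pictured as a check of each data access event against a policy. The following is a minimal sketch under stated assumptions: the `Policy` and `Event` classes, their field names, and the sample values are hypothetical and chosen only for illustration; they do not reflect any particular DLP product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    # Hypothetical policy record for one classified resource.
    resource: str
    allowed_users: frozenset
    allowed_devices: frozenset
    allow_modify: bool = False

@dataclass(frozen=True)
class Event:
    # Hypothetical access event observed by the monitoring layer.
    user: str
    device: str
    resource: str
    action: str  # "read" or "modify"

def check_event(event, policy):
    """Return a list of violations; an empty list means the event is compliant.

    Assumes the caller has already selected the policy for event.resource.
    """
    problems = []
    if event.user not in policy.allowed_users:
        problems.append(f"unauthorized user: {event.user}")
    if event.device not in policy.allowed_devices:
        problems.append(f"unauthorized device: {event.device}")
    if event.action == "modify" and not policy.allow_modify:
        problems.append("unauthorized modification")
    return problems

policy = Policy("customer_db", frozenset({"alice"}), frozenset({"srv-01"}))
event = Event("bob", "laptop-7", "customer_db", "modify")

# Each violation becomes an alert; a real system would also trigger
# protective measures such as encryption or access restriction.
for problem in check_event(event, policy):
    print(f"ALERT [{event.resource}]: {problem}")
```

Returning a list of violations rather than a single yes/no result lets the alert name every rule that was broken.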

DLP also monitors data for compliance with regulations and standards such as GDPR, HIPAA, GLBA, and PCI DSS. Its monitoring policies identify weak areas and ensure that any changes to the data do not violate these requirements.

The DLP strategy aims to protect an organization's assets and its customers' trust and help uphold its brand image. The traditional DLP implementation has the following use cases:

  • Personal Information Protection: Organizations often collect customers' personally identifiable information (PII), such as health records, credit card details, and addresses. These companies are bound by laws and regulations, such as the General Data Protection Regulation (GDPR), to safeguard all such sensitive assets. DLP identifies this data, flags it, and monitors all activity around it.
  • IP Protection: Intellectual property (IP) includes sensitive information such as state and trade secrets. Leakage of this sort of data can damage a business’s financial stability and reputation and lead to losses worth millions of dollars.
  • Data Visibility: DLP solutions give organizations complete visibility into the storage, usage, and movement of their data assets. They monitor API endpoints and cloud repositories for data-related activity. This allows the business to identify and address potential risks, enforce data security policies, and improve overall data governance.
  • Minimize Data Exposure: DLP solutions can detect sensitive data, flag information, and apply safeguards like restricting data movement and redacting confidential data.
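
To make the detect, flag, and redact steps from the list above concrete, here is a minimal sketch that uses regular expressions. The two patterns and the placeholder format are assumptions made for illustration; real DLP rule sets are far broader and usually combine pattern matching with contextual checks.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_sensitive(text):
    """Return (label, matched_text) pairs for every pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits

def redact(text):
    """Replace each hit with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(find_sensitive(sample))
print(redact(sample))  # Contact [EMAIL], SSN [US_SSN].
```

Keeping detection (`find_sensitive`) separate from redaction (`redact`) means the same patterns can drive alerts, reports, or blocking decisions.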

Trends Driving DLP Adoption

Modern organizations are heavily invested in improving data security and minimizing losses caused by data exfiltration. Many are pursuing data loss prevention solutions to safeguard company data and create a secure environment. Let’s explore the factors fueling this growth.

Advancing Cyberattacks

Data breach techniques and cyberattacks are evolving quickly as attackers use advanced tools to tamper with an organization's data. DLP tools are likewise updated regularly to protect data from new threats, which makes DLP a common choice for organizations seeking a comprehensive data protection solution.

Evolving Compliance and Auditing Requirements

Data protection requirements constantly change, and organizations must conform to the new policies. DLP can enforce policies that ensure compliance with the latest global regulatory laws, protecting organizations from legal repercussions.

Dynamic Data Collection and Storage

Growing businesses store data in many locations and on many platforms, such as cloud servers, online data stores, and on-premise systems. DLP keeps track of data stored across all these destinations. It monitors all activity around the data, such as access privileges and movement, and saves organizations the trouble of checking manually.

Rise of GenAI

The unprecedented adoption of GenAI brings critical threats to users' personal information. These AI models are trained on extensive datasets that contain information from all domains. Recent data regulations require that data fed to GenAI models not include sensitive information such as personal health records or financial data. DLP controls the flow of data and filters data streams to avoid leaks through GenAI training.

DLP in the Context of Generative AI

Applications like ChatGPT and Bard have found various innovative use cases, including writing professional emails, proofreading written content, and reviewing code.

However, most users don’t realize that data fed to the model as prompts may be retained and used for further training and improvement.

Imagine an employee at a multinational organization pasting source code into ChatGPT to resolve errors. That code may now be retained by the provider, used in training, and become susceptible to leakage with the right prompts. This is not speculation: it reflects a real incident at Samsung, where some employees used ChatGPT for code review.

Moreover, GenAI applications pose some unique challenges to existing DLP solutions. These include:

  • Multi-Platform GenAI Applications: GenAI applications are being developed in various forms, including web applications, desktop integrations, and browser plug-ins. This proliferation opens up multiple points for data breaches and leakages, and DLP solutions must evolve to secure all such use cases.
  • Unintended Data Sharing: In most cases, users sharing sensitive information with GenAI applications are unaware of the harm it can cause. There is little that conventional DLP solutions can do about these unintentional data leaks other than blocking access to the entire application.
  • Model Contamination: Most DLP solutions are intended to apply policies and restrictions within an organization. However, they have no control over public GenAI models, so any vulnerabilities in such models will continue to be a threat to their users.
  • New Applications Every Day: New GenAI applications are coming out daily. This challenges DLP vendors, who must track and include these new releases in their security infrastructure. Every new application brings a new threat which will require a unique solution.

However, despite the challenges, GenAI adoption is inevitable. Many industries, including the financial sector and healthcare, are quickly integrating GenAI for various uses.

According to a Netskope report, 19% of the financial sector and 21% of the healthcare industry use data loss prevention. Moreover, 26% of IT companies are using DLP to reduce the risk of GenAI.

Lakera and DLP

The growing concerns surrounding GenAI's data-related vulnerabilities have security organizations on their toes. Authorities are constantly pushing security teams to develop policies and security measures to mitigate the data risks posed by these models.

Lakera has always placed its users' security as a top priority and has actively worked to develop a secure environment for all clients. Our security solution has evolved to cover the risks and vulnerabilities highlighted in the Open Web Application Security Project (OWASP) Top 10 for Large Language Models (LLMs). Here’s how Lakera deals with some of the most prominent LLM vulnerabilities.

Prompt Injection

Lakera specializes in addressing prompt injections and jailbreaks in text and visual formats. Drawing on a growing database of over 30 million attacks, our API provides immediate threat assessments for conversational AI applications. Our Red Team actively tests models and products and explores publicly available jailbreaks to stay updated on potential threats.

Lakera understands that prompt injection opens avenues for other vulnerabilities and can interconnect with security design flaws, such as insecure plugins. For this reason, our Red Team constantly tests our systems to ensure they are robust against evolving attacks.

Conflicting Response Using Prompt Injection (Source)

Training Data Poisoning

Training Data Poisoning is the act of manipulating an LLM's training data so that it negatively influences the model's responses. This could lead to the model generating responses biased against racial groups or sects, or following through on commands that violate company policies.

Lakera Red uses a two-fold approach to protect LLMs from poisoning. The first part is pre-training evaluation, which examines training data to identify suspicious elements. The second is the development of protective measures for running systems. These measures monitor the system's responses for suspicious or illicit outputs and check whether the model has been attacked.

Model Denial of Service (MDoS)

The MDoS attack throttles an LLM by bombarding it with numerous requests simultaneously. This overloads the system and prevents the LLM from responding to any user. These attacks are carried out by bots programmed to mimic humans and send queries to the model.

Lakera monitors user activity to distinguish genuine queries from a DoS attack. Our solution protects your system by blocking suspicious users and providing the option to block specific API tokens to prevent unwanted requests. These measures prevent system overload and potential downtime and ensure legitimate users can access the model.
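
As a generic illustration of per-token throttling and blocking, the sketch below implements a sliding-window rate limiter with a blocklist. It is not Lakera's implementation: the `TokenRateLimiter` class, its parameters, and the sample limits are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque

class TokenRateLimiter:
    """Sliding-window limiter keyed by API token (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # token -> timestamps of accepted requests
        self.blocked = set()               # tokens blocked outright

    def block(self, token):
        self.blocked.add(token)

    def allow(self, token, now):
        """Return True if a request from `token` at time `now` may proceed."""
        if token in self.blocked:
            return False
        recent = self.history[token]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True

limiter = TokenRateLimiter(max_requests=2, window_seconds=10.0)
print(limiter.allow("token-a", 0.0))   # True
print(limiter.allow("token-a", 1.0))   # True
print(limiter.allow("token-a", 2.0))   # False, limit reached within the window
print(limiter.allow("token-a", 10.5))  # True, the oldest request has expired
```

In production the `now` argument would typically come from a monotonic clock such as `time.monotonic()`.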

Supply Chain Vulnerabilities

An LLM supply chain includes all tools, resources, and data utilized in building the model. Each component carries the risk of infiltration and must be assessed for any vulnerabilities that could potentially harm the final model.

Lakera Red thoroughly examines various components, including Python code, model weights, plugins, and open-source software. Using carefully crafted prompts, we can determine whether the model aligns with the set policies and judge its safety and reliability.

Moreover, Lakera’s advanced security functionality protects users' personal information. Lakera Guard protects against prompt leakage, preventing sensitive information from being passed into prompts. It also provides strict access control so that the LLM does not serve critical information to unauthorized users.

Advanced DLP Solutions and Future Directions

The IT sector has been quick in adopting Generative AI for various applications, and today we see it implemented across multiple devices and platforms.

Even desktop applications like Adobe Photoshop are showcasing generative capabilities. While this is an impressive leap, it significantly raises the risk of data breaches, as security solutions must scan every piece of data to enforce policies.

Vendors are implementing advanced DLP solutions that cater to the evolving data ecosystem. Some key implementations include:

  • Tracking GenAI Applications: DLP vendors constantly track, identify, and classify GenAI applications to enforce data flow policies. The vendors then define access control for these applications, ranging from limited access to complete cut-off. 
  • Using AI for Information Classification: Modern DLP solutions utilize machine learning techniques like natural language processing (NLP) and computer vision (CV) to understand, tag, and classify data as it moves across an organization. This provides an automated way to classify sensitive data such as credit card numbers or API keys.
  • Specialized Policies for Data Mobility: GenAI’s easy accessibility makes it challenging to implement conventional security policies. Employees can copy and paste source code for review or confidential emails for proofreading onto web platforms. To tackle this, DLP solutions are implementing stricter policies. Some companies block copy-paste actions for sensitive information so that it cannot reach the GenAI application.
  • Threat Detection and Education: Many DLP vendors can identify the potential risks associated with a GenAI application. These risks are presented to the user along with countermeasures to educate users regarding the safe use of these applications.
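
The classification and blocking ideas in the list above can be sketched as a gate that inspects a prompt before it is sent to a GenAI application. This is a rule-based stand-in rather than the ML classification that vendors describe: it uses a Luhn checksum to confirm candidate card numbers, and the `sk-` key format, function names, and return values are assumptions made for illustration.

```python
import re

# 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# Hypothetical key format: "sk-" followed by 20 or more alphanumerics.
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{20,}\b")

def luhn_valid(number):
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def check_prompt(prompt):
    """Return ("block" or "allow", findings) for an outbound prompt."""
    findings = []
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(prompt)):
        findings.append("credit_card")
    if API_KEY.search(prompt):
        findings.append("api_key")
    return ("block" if findings else "allow"), findings

print(check_prompt("Please proofread this email for me."))
print(check_prompt("My card is 4111 1111 1111 1111"))
```

The checksum step reduces false positives from long digit strings such as order numbers, which is one reason pattern matching alone is rarely enough.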

The Way Forward

This is just the beginning of GenAI, and as AI evolves, applications will grow more complex, utilizing every piece of data they can find.

Moving forward, we will require even stricter control over data mobility, access, and utilization to ensure a secure environment.

Robust policies must be devised and templated specifically for GenAI use.

Additionally, guidelines must be introduced on creating and deploying the GenAI models themselves. These will enforce compliance against using sensitive information during the training period and introduce measures to tackle cases where users may input potentially confidential data.

Moreover, as we are moving towards an “artificial” world, one possibility for DLP is to integrate GenAI within security solutions. These models' understanding and generative capabilities can be used to identify threats and generate a policy framework to safeguard user data.

Lastly, much of the responsibility falls on organizations aiming to utilize GenAI in their workflows. Similar to cybersecurity training, companies will also have to introduce GenAI training so employees can learn the safe use of these tools.

Best Practices for Implementing DLP

Implementing a DLP solution can be challenging, especially considering the evolving data landscape and the various vendors available.

Here are a few best practices organizations can follow to ensure a smooth deployment and long-term protection.

DLP Best Practices (Source)

Determine Your DLP Goal

Understanding the kind of data you are dealing with and what services are required is vital. DLP solutions have different architectures for various information types, such as intellectual property, source code, and images. Defining an end goal will help filter relevant vendors and select the optimal solution for your needs.

Secure Executive Buy-In

While it may seem that a DLP solution is a security-only decision, it is imperative to secure executives' confidence.

You must brief top-level management, including the Chief Technology Officer (CTO) and Chief Financial Officer (CFO), on the importance of DLP and how it will help relieve their pain points, such as the difficulty of tracking sensitive information.

This will help get budget approvals, speed up implementation, and establish a security culture throughout the company.

Discover and Classify Sensitive Information

Modern DLP solutions can identify and classify confidential and critical information such as personal details, intellectual property, and financial records.

However, it is better to list file locations and endpoints requiring protection policies manually as well. This would help enhance the DLP capabilities and the security infrastructure.

Set Evaluation Criteria

Before opting for a DLP solution, defining criteria to select the ideal vendor is best. A few questions you can ask are:

  • What platforms and operating systems (Windows, macOS, Linux) do they support?
  • What kind of protection do you need from the vendor? Do you need access restrictions or document tracking?
  • What functionalities does the vendor offer? Do they support in-house and cloud deployments? Do they encrypt data with state-of-the-art algorithms?
  • Does the vendor specialize in the domain you deal with? For example, vendors that understand medical data are more likely to have experience with HIPAA compliance and with implementing policies that complement the nature of the data.
  • How seamlessly does the vendor's solution integrate with your existing infrastructure?

Conduct Regular Security Audits

Regular security and compliance audits help detect anomalies in the DLP solution and ensure that all information is secure. These also ensure the implemented solution conforms to changing regulatory requirements and tackles evolving cyber threats.

Document DLP Policies

All the policies established under the DLP solution must be thoroughly documented. This documentation provides a sanity check over what areas are covered and makes onboarding new employees easier. Moreover, it acts as a reference when the solution is to be potentially upgraded in the future.

Keep Your System Up-To-Date

All professional software and system vendors regularly roll out updates. These updates include security patches and improved algorithms to tackle modern data threats. You must make sure that all software, including the operating system and daily work applications such as Microsoft Office, is kept up to date.

Arrange Training for Employees

Employees must be trained in two areas. First, they must understand the importance of data security and compliance, especially within the GenAI landscape. Second, they need to be educated on the proper use of the DLP solution so that the business gets full value from its investment.

Constantly Refine Policies

DLP is not an implement-once type of solution. As data infrastructure evolves, you must constantly refine your security policies. This may include stricter control over data movement, refined data access restrictions, and access control over GenAI applications. The new policies must be documented and relayed to employees, and standardized tests must be conducted to ensure their effectiveness.

Key Takeaways

Amidst the data-driven technological revolution, information security and data threats have become prominent and caused major business losses. Data loss prevention (DLP) refers to a range of technologies and inspection techniques designed to locate, understand, and classify critical data and enforce security policies for its protection.

This article discussed the emergence of DLP as a necessity, advanced DLP solutions, and the best practices for DLP implementation. Here’s what we learned.

  • DLP solutions track all sensitive information and enforce guardrails to protect from any data leaks.
  • The lifecycle of a DLP implementation begins with identifying sensitive data and continues with tracking its movement and usage across the organization, issuing alerts, and enforcing restrictions if any asset is found to be breaching a policy.
  • DLP allows you to define user access, filter data streams, restrict the mobility of files (whether in-house or on consumer cloud storage services), provide organization-wide visibility of crucial data, and monitor data endpoints to ensure compliance.
  • DLP also ensures compliance with global regulations such as HIPAA, GDPR, and GLBA.
  • In this GenAI era, DLP implementations are required to provide more sophisticated control over sensitive information and the use of GenAI applications.
  • Lakera offers market-leading data security solutions such as Lakera Guard and Lakera Red.
  • Our security platform offers solutions to OWASP's Top 10 LLM threats, including prompt injection, Model Denial of Service, Training Data Poisoning, and Supply Chain Vulnerability.

To get the most out of your DLP platform, follow certain best practices. These include:

  • Understanding your DLP objective
  • Thoroughly evaluating DLP vendors
  • Educating in-house staff
  • Continuously refining policies

Data has become the key driving force behind most modern technological innovations like artificial intelligence. With increasing data applications, selecting the optimal DLP solution is vital.

Lakera is an industry-leading AI security platform specializing in securing modern GenAI tools like LLMs. Our extensive database of over 100,000 threats helps power the Lakera Guard and Lakera Red applications and protects against adversarial attacks, data leakage, and LLM vulnerabilities.

It takes less than five minutes to get started with the Lakera platform, and LLM protection is deployed with as little as one line of code. To learn more, create a free account and get started today.
