
Stress-test your models to avoid bad surprises.

Will my system work if image quality starts to drop significantly? If my system works at a given occlusion level, how much stronger can occlusion get before the system starts to underperform? I have faced such issues repeatedly in the past, all related to an overarching question: How robust is my model and when does it break?

Mateo Rojas-Carulla
December 1, 2023
July 7, 2022
Degradation of model performance as packet loss increases. As pixel dropout increases (x-axis), we can see the percentage of the dataset on which the model significantly breaks. Even with mild packet loss, the model enters a higher-risk zone (yellow), as more than 20% of the predictions are unstable.

As developers, we often face daunting questions as we build a model toward production. For instance:

  • Will my system work if image quality starts to drop significantly?
  • If my system works at a given occlusion level, how much stronger can occlusion get before the system starts to underperform?

I have faced such issues repeatedly in the past, all related to an overarching question: How robust is my model and when does it break?

We all know that building production-level computer vision systems is challenging. The number of factors that can influence input images or videos in the real world is vast. Gathering evidence that the systems we build will work under these varying factors is therefore both essential and a major challenge in the final push to production.

Ultimately, we as developers carry the burden of proof.

Stress-testing provides the answers.

Stress-testing is a major tool for bringing transparency to how a computer vision system will perform in the real world, and for answering the questions above. Systematically answering such questions (for example, how occluded do objects have to be before my system breaks?) provides two major advantages:

1. If the system’s robustness is unacceptably low (for example, the system is highly affected by increased occlusion), gathering the right data to increase robustness becomes easier to strategize (by prioritizing images with highly occluded objects). This includes generating the right synthetic data and augmentations.

2. It helps communicate the product’s limitations to its users, ultimately ensuring transparency and safer use.
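
To make the idea of a breaking point concrete, here is a minimal sketch of a stress sweep in plain Python. It is an illustration under stated assumptions, not how MLTest works: `toy_model`, `drop_pixels`, `unstable_fraction`, and `breaking_point` are hypothetical names, the dataset is synthetic, and the 20% threshold is borrowed from the figure caption above. The sweep increases pixel dropout step by step and reports the first level at which too many predictions change compared to the clean input.

```python
import random

def drop_pixels(image, rate, rng):
    """Zero out each pixel independently with probability `rate`."""
    return [0.0 if rng.random() < rate else p for p in image]

def toy_model(image):
    """Hypothetical stand-in model: label an image by its mean brightness."""
    return "bright" if sum(image) / len(image) > 0.5 else "dark"

def unstable_fraction(model, images, rate, seed=0):
    """Fraction of images whose prediction differs from the clean prediction."""
    rng = random.Random(seed)
    changed = sum(model(drop_pixels(img, rate, rng)) != model(img) for img in images)
    return changed / len(images)

def breaking_point(model, images, rates, max_unstable=0.2):
    """First dropout rate whose unstable fraction exceeds the threshold, else None."""
    for rate in rates:
        if unstable_fraction(model, images, rate) > max_unstable:
            return rate
    return None

# Synthetic dataset (an assumption for illustration): 200 flat "images" of
# 64 pixels, each with its own base brightness plus a little pixel noise.
data_rng = random.Random(42)
images = []
for _ in range(200):
    base = data_rng.random()
    images.append([min(1.0, max(0.0, base + data_rng.uniform(-0.05, 0.05)))
                   for _ in range(64)])

rates = [0.0, 0.1, 0.2, 0.3, 0.5, 0.8]
for rate in rates:
    print(f"dropout={rate:.1f}  unstable={unstable_fraction(toy_model, images, rate):.1%}")
print("breaking point:", breaking_point(toy_model, images, rates))
```

In a real project, the toy model would be replaced by the system under test and the dropout function by whichever degradation matters for the use case, such as occlusion or blur.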

How does it work?

The approach relies on the idea of metamorphic relations: some visual changes have clear, predictable effects on an image’s annotations. For example, a rotated dog is still a dog, and horizontally flipping an image of a gauge should yield a mirrored reading. This allows us to measure performance while exploring such visual changes using techniques from the fuzz testing world. To learn more about these, take a look at our blog post on metamorphic relations and fuzz testing.
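
As a rough illustration of a metamorphic relation, the sketch below checks the gauge example in plain Python. Everything in it is hypothetical: `toy_needle_reader` is a stand-in for a real model, and the "image" is a tiny grid in which the brightest column marks the needle. The point is that the check needs no ground-truth label, only the relation between the original and the flipped prediction.

```python
def hflip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]

def toy_needle_reader(image):
    """Hypothetical stand-in model: column index of the brightest column."""
    width = len(image[0])
    col_sums = [sum(row[c] for row in image) for c in range(width)]
    return max(range(width), key=col_sums.__getitem__)

def broken_reader(image):
    """A deliberately faulty model that ignores its input."""
    return 0

def check_flip_relation(model, image):
    """Metamorphic relation: flipping the image should mirror the reading."""
    width = len(image[0])
    expected = width - 1 - model(image)
    return model(hflip(image)) == expected

# A 3x5 grid with the "needle" in column 1.
image = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print("toy reader respects the relation:", check_flip_relation(toy_needle_reader, image))
print("broken reader respects the relation:", check_flip_relation(broken_reader, image))
```

A fuzz-style test would run the same check over many images and many transformations, and record where the relation stops holding.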

Stress-testing is a key capability of Lakera’s MLTest, built to answer the questions discussed above. If you are interested in seeing what MLTest has to say about the breaking points of your systems, do reach out!
