
Stress-test your models to avoid bad surprises.


Mateo Rojas-Carulla

October 20, 2023

Last updated: November 13, 2024


Figure: Degradation of model performance as packet loss increases. As pixel dropout increases (x-axis), the plot shows the percentage of the dataset on which the model breaks significantly. Even with mild packet loss, the model enters a higher-risk zone (yellow), as more than 20% of its predictions become unstable.

As developers, we often face daunting questions as we build a model toward production. For instance:

Will my system work if image quality starts to drop significantly?
If my system works at a given occlusion level, how much stronger can occlusion get before the system starts to underperform?

I have faced such issues repeatedly in the past, all related to an overarching question: How robust is my model and when does it break?

We all know that building production-level computer vision systems is challenging. A vast number of factors can influence input images or videos in the real world. Gathering evidence that the systems we build will work under these varying factors is therefore key, and it is a major challenge in the final push to production.

Ultimately, we as developers carry the burden of proof.

Stress-testing provides the answers.

Stress-testing is a major tool for bringing transparency to how a computer vision system will perform in the real world, and for answering the questions above. Systematically answering such questions (for example, how occluded do objects have to be before my system breaks?) provides two major advantages:

1. If the system’s robustness is unacceptably low (for example, the system is highly affected by increased occlusion), gathering the right data to increase robustness becomes easier to strategize (by prioritizing images with highly occluded objects). This includes generating the right synthetic data and augmentations.

2. It helps communicate the product’s limitations to its users, ultimately ensuring transparency and safer use.
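To make the idea of a breaking point concrete, here is a minimal sketch of a severity sweep in plain Python. Everything in it is an illustrative assumption rather than how MLTest works: `toy_model` is a hypothetical brightness classifier, `occlude` is a crude stand-in for real occlusion, the images are flat lists of pixel values, and the 20% instability threshold is borrowed from the figure caption above.

```python
import random

def occlude(image, fraction, rng):
    """Copy `image` (a flat list of pixels) with a random `fraction` of pixels set to 0."""
    n_hidden = round(fraction * len(image))
    hidden = set(rng.sample(range(len(image)), n_hidden))
    return [0.0 if i in hidden else p for i, p in enumerate(image)]

def toy_model(image):
    """Hypothetical classifier: predicts 1 when the mean brightness exceeds 0.5."""
    return int(sum(image) / len(image) > 0.5)

def unstable_fraction(model, images, fraction, seed=0):
    """Share of images whose prediction changes once they are occluded."""
    rng = random.Random(seed)
    changed = sum(model(occlude(img, fraction, rng)) != model(img) for img in images)
    return changed / len(images)

def breaking_point(model, images, levels, max_unstable=0.2):
    """First severity level at which more than `max_unstable` of predictions flip, else None."""
    for level in levels:
        if unstable_fraction(model, images, level) > max_unstable:
            return level
    return None

# Toy dataset: ten uniformly lit 100-pixel images with different brightness values.
brightness = [0.2, 0.3, 0.4, 0.55, 0.65, 0.7, 0.75, 0.8, 0.9, 1.0]
images = [[b] * 100 for b in brightness]
levels = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

print(breaking_point(toy_model, images, levels))  # prints 0.3
```

On this toy data, only one prediction in ten flips at 10% and 20% occlusion, while three flip at 30%, so the sweep reports 0.3 as the breaking point. In a real setting, the same loop would wrap an actual model and a realistic occlusion transform.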

How does it work?

The approach relies on the idea of metamorphic relations: some visual changes have clear, predictable effects on an image’s annotations. For example, a rotated dog is still a dog, and a horizontal flip of a gauge image should produce the mirrored reading. This allows us to measure performance while exploring such visual changes using techniques from the fuzz testing world. To learn more about these, take a look at our blog post on metamorphic relations and fuzz testing.
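The gauge example can be sketched as a metamorphic check in a few lines of plain Python. This is a sketch under stated assumptions, not Lakera's implementation: the gauge is a tiny synthetic image with one bright needle column, `read_gauge` and `saturating_reader` are hypothetical models, and readings are normalized to the range 0 to 1 so that a horizontal flip should turn a reading r into 1 - r.

```python
def hflip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]

def make_gauge(needle_col, width=11, height=3):
    """Synthetic gauge: a dark image with one bright needle column."""
    return [[1.0 if c == needle_col else 0.0 for c in range(width)]
            for _ in range(height)]

def read_gauge(image):
    """Hypothetical model: reading in [0, 1] from the brightest column."""
    width = len(image[0])
    col_sums = [sum(row[c] for row in image) for c in range(width)]
    needle = max(range(width), key=col_sums.__getitem__)
    return needle / (width - 1)

def saturating_reader(image):
    """Hypothetical faulty model: never reports more than 0.8."""
    return min(read_gauge(image), 0.8)

def holds(model, image, transform, expected, tol=1e-9):
    """Metamorphic check: model(transform(x)) should equal expected(model(x))."""
    return abs(model(transform(image)) - expected(model(image))) < tol

def violations(model, width=11):
    """Needle positions at which the flip relation (r becomes 1 - r) fails."""
    return [c for c in range(width)
            if not holds(model, make_gauge(c, width), hflip, lambda r: 1.0 - r)]

print(violations(read_gauge))         # prints []
print(violations(saturating_reader))  # prints [0, 1, 9, 10]
```

No ground-truth labels are needed here: the check only compares the model's output on an image with its output on the transformed image. The faulty reader fails at both ends of the scale, because its clipped high readings no longer mirror the low ones.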

Stress-testing is a key capability of Lakera’s MLTest, designed to answer the questions discussed above. If you are interested in seeing what MLTest has to say about the breaking points of your systems, do reach out!
