3 Strategies for Making Your ML Testing Mission-Critical.

Testing machine learning systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts. This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems.

Lakera Team
December 1, 2023
August 12, 2021

Testing machine learning (ML) systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts. In these use cases, strict performance guarantees and regulatory compliance are a must. The best engineering teams at companies like Tesla have built sophisticated testing infrastructure to ensure the reliability of their ML systems. Now it is time to make effective and systematic ML testing a reality for the rest of us–including smaller engineering teams–as well.

This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems:

Specify your operational domain 📝

  1. Systematic testing is most effective in the context of an operational domain. An operational domain describes the “specific conditions under which a given [...] automation system is designed to function” [1]. It is a more compact representation of the environment in which the system will operate. This can also include what data it will be exposed to and its user interactions. Specification of the operational domain can be used to detect data bugs. It is also the starting point for establishing reliability guarantees for the whole system.

Stress-test your system 🦾

  1. With the operational domain specified, we can check the robustness of the system within the relevant conditions. As with traditional software systems, bugs often appear when a problematic input is presented to the system. Fuzz testing is a popular strategy that looks for these problematic inputs by randomly generating data points both inside and outside of the operational domain. In combination with metamorphic relations, it becomes a powerful tool that developers can use to ensure that their system performs well enough within the operational domain and degrades gracefully when presented with more challenging inputs.

Ensure your system performs when it really matters ✋

  1. The machine learning development process is inherently complex and iterative. Regression sets provide an effective tool for ensuring that your system really gets better with every iteration. They are a simple tool that can be used not only retroactively (e.g., when a bug has been found and you want to make sure that it does not reoccur) but also proactively (e.g., for creating test sets to actively probe system behavior when performance matters).

Adopting these three strategies is the first step to making ML testing more systematic and effective. They often provide a high return on investment. This is the case especially for smaller engineering teams that don’t have Tesla’s resources but that require strict performance guarantees and want to move through product development quickly and efficiently.

Lakera’s validation engine, MLTest, finds critical performance vulnerabilities in computer vision systems before they enter operation. Built with industry-leading AI and safety expertise, MLTest makes reliability a no-brainer for entire development teams. Get in touch if you want to learn more!

[1] Definition from ISO 21448: Road vehicles — Safety of the intended functionality.

Lakera LLM Security Playbook
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

Lakera Team
Read LLM Security Playbook
Learn about the most common LLM threats and how to prevent them.
Download
You might be interested
5
min read
Machine Learning

Free of bias? We need to change how we build ML systems.

The topic of bias in ML systems has received significant attention recently. And rightly so. The core input to ML systems is data. And data is biased due to a variety of factors. Building a system free of bias is challenging. And in fact, the ML community has long struggled to define what a bias-free or fair system is.
Lakera Team
December 1, 2023
6
min read
Machine Learning

Test machine learning the right way: Fuzz testing.

In this instance of our ML testing series, we discuss fuzz testing. We discuss what it is, how it works, and how it can be used to stress test machine learning systems to gain confidence before going to production.
Lakera Team
December 1, 2023
Activate
untouchable mode.
Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Join our Slack Community.

Several people are typing about AI/ML security. 
Come join us and 1000+ others in a chat that’s thoroughly SFW.