Back

3 Strategies for Making Your ML Testing Mission-Critical.

Testing machine learning systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts. This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems.

Lakera Team
December 1, 2023
August 12, 2021
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

In-context learning

As users increasingly rely on Large Language Models (LLMs) to accomplish their daily tasks, their concerns about the potential leakage of private data by these models have surged.

[Provide the input text here]

[Provide the input text here]

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.

Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?

Title italic

A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.

English to French Translation:

Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?

Lorem ipsum dolor sit amet, line first
line second
line third

Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?

Title italic Title italicTitle italicTitle italicTitle italicTitle italicTitle italic

A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.

English to French Translation:

Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?

Hide table of contents
Show table of contents

Testing machine learning (ML) systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts. In these use cases, strict performance guarantees and regulatory compliance are a must. The best engineering teams at companies like Tesla have built sophisticated testing infrastructure to ensure the reliability of their ML systems. Now it is time to make effective and systematic ML testing a reality for the rest of us–including smaller engineering teams–as well.

This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems:

Specify your operational domain 📝

  1. Systematic testing is most effective in the context of an operational domain. An operational domain describes the “specific conditions under which a given [...] automation system is designed to function” [1]. It is a more compact representation of the environment in which the system will operate. This can also include what data it will be exposed to and its user interactions. Specification of the operational domain can be used to detect data bugs. It is also the starting point for establishing reliability guarantees for the whole system.

Stress-test your system 🦾

  1. With the operational domain specified, we can check the robustness of the system within the relevant conditions. As with traditional software systems, bugs often appear when a problematic input is presented to the system. Fuzz testing is a popular strategy that looks for these problematic inputs by randomly generating data points both inside and outside of the operational domain. In combination with metamorphic relations, it becomes a powerful tool that developers can use to ensure that their system performs well enough within the operational domain and degrades gracefully when presented with more challenging inputs.

Ensure your system performs when it really matters ✋

  1. The machine learning development process is inherently complex and iterative. Regression sets provide an effective tool for ensuring that your system really gets better with every iteration. They are a simple tool that can be used not only retroactively (e.g., when a bug has been found and you want to make sure that it does not reoccur) but also proactively (e.g., for creating test sets to actively probe system behavior when performance matters).

Adopting these three strategies is the first step to making ML testing more systematic and effective. They often provide a high return on investment. This is the case especially for smaller engineering teams that don’t have Tesla’s resources but that require strict performance guarantees and want to move through product development quickly and efficiently.

Lakera’s validation engine, MLTest, finds critical performance vulnerabilities in computer vision systems before they enter operation. Built with industry-leading AI and safety expertise, MLTest makes reliability a no-brainer for entire development teams. Get in touch if you want to learn more!

[1] Definition from ISO 21448: Road vehicles — Safety of the intended functionality.

Lakera LLM Security Playbook
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

Lakera Team
Read LLM Security Playbook

Learn about the most common LLM threats and how to prevent them.

Download
You might be interested
15
min read
Machine Learning

Generative AI: An In-Depth Introduction

Explore the latest in Generative AI, including groundbreaking advances in image and text creation, neural networks, and the impact of technologies like GANs, LLMs, and more on various industries and future applications.
Deval Shah
December 1, 2023
min read
Machine Learning

Your validation set won’t tell you if a model generalizes. Here’s what will.

As we all know from machine learning 101, you should split your dataset into three parts: the training, validation, and test set. You train your models on the training set. You choose your hyperparameters by selecting the best model from the validation set. Finally, you look at your accuracy (F1 score, ROC curve...) on the test set. And voilà, you’ve just achieved XYZ% accuracy.
Václav Volhejn
December 1, 2023
Activate
untouchable mode.
Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Join our Slack Community.

Several people are typing about AI/ML security. 
Come join us and 1000+ others in a chat that’s thoroughly SFW.