Test machine learning the right way: Metamorphic relations.

As part of our series on machine learning testing, we are looking at metamorphic relations. We’ll discuss what they are, how they are used in traditional software testing, what role they play in ML more broadly and lastly, how to use them to write great tests for your machine learning application.

Lakera Team
December 1, 2023
July 20, 2021

In this part of our machine learning testing series, we’ll look at metamorphic relations — a technique used to multiply your available data and labels. We discuss how they can be used for machine learning model evaluation. Metamorphic relations help extend the test coverage of your ML (machine learning) system beyond what can be achieved through normal data collection. This testing series has previously covered multiple aspects around how to evaluate machine learning models such as testing for data bugs and regression testing.

The test oracle problem.

The test oracle problem is not specific to ML, and it is well known from traditional software [1]. It refers to determining the correct test output for a given test input.

Let’s look at an example from medical imaging. Imagine that you are building an ML system for medical imaging that is used as a diagnostic tool for cancer histopathology. The input = images of histopathology samples. The output = cancer or no cancer diagnosis.

The test oracle problem presents itself because you have some input image data, but you don’t know the label (cancer/no cancer).

This is solved by having the image annotated. You can send these images to histopathologists, who can play the role of the test oracle by adding a label to each sample image. The problem is that these images are scarce to begin with and the ones that you do have will be expensive to annotate.

The combinatorial number of scenarios needed for thorough machine learning evaluation requires more data and labels than can be realistically collected. For example, relevant scenarios become too large when looking at variations in the color of the image, the type of microscope used to take the image, the zoom level, etc. As a result only a part of relevant conditions can be tested for, leading to insufficient test coverage.

In come metamorphic relations. Take the image that you already have and rotate it. You could then send this rotated image to be re-annotated to solve the test oracle problem. But because you know that the label for the rotated image is still cancer, you don’t need to.

That’s how metamorphic relations can contribute to solving the oracle problem.

What are metamorphic relations?

Metamorphic relations are a great way of extending the test coverage of your ML application. A metamorphic relation [2]:

“Refers to the relationship between the software input change and output change.”

To return to the example of a square function, an easily tested metamorphic relation (ignoring numerical issues) is:

f(-input) = f(input)

This is a powerful concept that can be applied to ML as well! Two classes of metamorphic relations that are well known in computer vision are:

a) Image augmentations (e.g., rotation) that affect the label in a known way and act as a data/label multiplier;

b) Using temporal relations in video sequences (e.g., two successive image frames in a 30Hz video sequence are likely similar) that act as supervisory signals. Both have been applied in the context of (self-)supervised learning to create more robust ML models [3].

How can we leverage this concept for model testing in machine learning?

Example: Using metamorphic relations for medical image testing.

We illustrate the use of metamorphic relations when looking at how to test machine learning models for our histopathology example. We can make use of metamorphic relations to write model unit tests and increase the test coverage in our ML testing suites.

An example of test specifications based on metamorphic relations.

We’d certainly expect this ML system to work if the input image is rotated by 180 degrees. Shifts in the color intensity of the image should also not change the system output. Neither should slightly out-of-focus samples.

These problem insights or, in this context, metamorphic relations can be used to create clear test specifications and to build these model unit tests. Not only does this multiply your available test data but it also ensures that your ML model behaves according to the specifications via machine learning unit testing.

So, why bother augmenting your test data if you’re already adding them to your training set? Truth be told, there is no guarantee that adding these augmentations to your training set ensures the desired behavior of your trained model. We observed this on state-of-the-art object detection models which are not robust to augmentations used during training. But testing your model for desired behavior gives confidence for certain inputs and will likely discover and prevent many ML model bugs.

Not convinced? Similar metamorphic relations were applied to the testing of neural networks for autonomous driving by Tian et al. in DeepTest [4]. They found thousands of erroneous (and sometimes grave) behaviors in state-of-the-art deep neural networks for self-driving cars.

To summarize, metamorphic relations are a great way to thoroughly test your ML system. In addition to regression tests, they should not be forgotten in your development cycles when testing ML models. Our follow-up article on fuzz-testing provides illustrations on how to leverage the concept of metamorphic relations to stress-test ML models.

Lakera LLM Security Playbook
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

Lakera Team
Read LLM Security Playbook
Learn about the most common LLM threats and how to prevent them.
You might be interested
min read
Machine Learning

Test machine learning the right way: Detecting data bugs.

In this second instance of the testing blog series, we deep dive into data bugs: what do they look like, and how can you use specification and testing to ensure you have the right data for the job?
Mateo Rojas-Carulla
December 1, 2023
min read
Machine Learning

Why testing should be at the core of machine learning development.

AI (artificial intelligence) is capable of helping the world scale solutions to our biggest challenges but if you haven’t experienced or heard about AI’s mishaps then you’ve been living under a rock. Coded bias, unreliable hospital systems and dangerous robots have littered headlines over the past few years.
Lakera Team
December 1, 2023
untouchable mode.
Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Join our Slack Community.

Several people are typing about AI/ML security. 
Come join us and 1000+ others in a chat that’s thoroughly SFW.