
Why ML testing is crucial for reliable computer vision.

Sounds like a lot of work? It used to be, but with the advent of artificial intelligence (AI) observability software, such assessments become as easy as training a new model.

Matthias Kraft
December 1, 2023
April 26, 2022
Photo by ThisisEngineering RAEng on Unsplash

Building computer vision (CV) products is fun and exciting. It’s magical when you get the first demos working and can see the results with your own eyes. However, it’s also tedious and notoriously difficult to bring computer vision models to production. The exciting phase happens at the beginning of any new project. More often than not, you have little data to train and test models with, and you rely on pre-trained open-source models to take your first steps.

At that point, your primary focus is probably to get a proof-of-concept (POC) going that demonstrates the performance of your CV model on a small test data set through your favorite metric (e.g. precision, recall, mAP, ROC curve, etc.). That’s totally understandable and the right thing to do. However, it’s also where things get dangerous quickly.

The real work begins after the POC. The POC’s primary purpose is to assess the amount of work involved in building a production model, determine what that requires, and estimate the chance of success. I speak from experience when I say hitting the target metric on a test set is the easy bit. We all know that. Yet, too often, an over-reliance on one, two, or three aggregate metrics has led me to:

  • Be overly optimistic about the production performance of my CV model.
  • Overpromise on what can be delivered in a certain time frame and waste resources.
  • Collect the wrong type of data.

In short, even at the POC stage, we have to rigorously assess the current state of our machine learning models and test them properly. What should that involve? It depends on the product, but at the very least it should include (in addition to your target evaluation metrics):

  • Testing ML models for robustness through stress and unit tests.
  • Performing regression testing to ensure we don't re-introduce bugs.
  • Testing the input data for representativity, bias, and label correctness.
  • Evaluating the model by metadata to learn precisely where it fails and where it works.
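
To make the last point concrete, here is a minimal sketch of a metadata-based evaluation. It assumes each evaluation record carries a ground-truth label, a prediction, and a metadata dictionary; the record layout, the `lighting` field, and the function name are hypothetical and not part of any particular tool.

```python
from collections import defaultdict

def accuracy_by_metadata(records, key):
    """Group prediction records by one metadata field and compute accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = r["meta"][key]
        totals[group] += 1
        hits[group] += int(r["pred"] == r["label"])
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical evaluation records for a sign detector.
records = [
    {"label": "sign", "pred": "sign", "meta": {"lighting": "day"}},
    {"label": "sign", "pred": "sign", "meta": {"lighting": "day"}},
    {"label": "sign", "pred": "none", "meta": {"lighting": "night"}},
    {"label": "none", "pred": "none", "meta": {"lighting": "night"}},
]

overall = sum(r["pred"] == r["label"] for r in records) / len(records)
per_slice = accuracy_by_metadata(records, "lighting")

print(overall)    # 0.75
print(per_slice)  # {'day': 1.0, 'night': 0.5}
```

The aggregate number looks acceptable, while the per-slice view shows that the errors in this toy data sit in the night-time images.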

These processes used to be a lot of work, but with the advent of dedicated software for machine learning testing, like Lakera’s MLTest, these assessments become as easy as training a new model.

The rewards are immediate: with insights to accelerate development and tedious processes automated, there’s a much higher chance of success in production.

Most AI software projects are stuck in the prototyping trap or underperform in operation.

An analysis by Gartner in 2022 found that “only 53% of projects make it from artificial intelligence (AI) prototypes to production”.

Why is this the case?

Business executives wonder whether they should end the AI computer vision project because it isn't making it to market.
Getting stuck in the prototype trap leads to almost half of AI applications never making it to production.

Developers don’t pay attention to model robustness.

All too often, developers don’t systematically test the robustness of a model, or only do it at the end of development. It’s a widespread (mis-)belief that adding a few data augmentations (e.g. Gaussian noise, horizontal flips, etc.) will fix all robustness issues.

Data augmentations can fix some issues, but not all. To know which augmentations to add and what data to collect, it’s key to first understand the robustness of your models and test for that as part of your development as early as possible.
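
As an illustration, a robustness stress test can be as simple as re-running the evaluation under increasingly strong perturbations and watching where accuracy drops. The sketch below uses only the Python standard library; `toy_model`, the flat 16-pixel "images", and the noise levels are stand-ins chosen for the example, not a real CV pipeline.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Return a copy of the image with zero-mean Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]

def toy_model(image):
    """Stand-in classifier (hypothetical): 'bright' if the mean pixel value exceeds 0.5."""
    return "bright" if sum(image) / len(image) > 0.5 else "dark"

def accuracy_under_noise(model, dataset, sigmas, seed=0):
    """Evaluate the model on noisy copies of the dataset, one accuracy per noise level."""
    rng = random.Random(seed)
    results = {}
    for sigma in sigmas:
        correct = 0
        for image, label in dataset:
            noisy = add_gaussian_noise(image, sigma, rng)
            correct += int(model(noisy) == label)
        results[sigma] = correct / len(dataset)
    return results

# Toy dataset: 16-pixel images that are uniformly slightly bright or slightly dark.
dataset = [([0.6] * 16, "bright"), ([0.4] * 16, "dark")] * 50

results = accuracy_under_noise(toy_model, dataset, sigmas=[0.0, 0.1, 0.5, 1.0])
for sigma, acc in results.items():
    print(f"sigma={sigma}: accuracy={acc:.2f}")
```

The same loop works for other perturbations (blur, brightness shifts, occlusions); the point is to measure the drop before deciding which augmentations or new data are worth the effort.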

The quality of the data is insufficient.

There is never enough of the right data. Never. At the beginning of a new project, this is particularly true, so you may resort to using any open-source datasets you can get your hands on and get creative in other ways to produce more training and test data. Unfortunately, the data you end up with will not be exactly representative of your use case. It will likely contain unwanted biases and possibly dangerous correlations that lead your model to take shortcuts.

As with robustness, it’s key to test your models and data against these issues as early as possible; it will help you collect the right data going forward and alert you to potential issues in the final model. Tools like MLTest help you test the quality of your data and show where you can proactively procure data that is right for your particular application.
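
One simple representativity check is to compare how a metadata attribute is distributed in your dataset against the mix you expect in deployment. The sketch below uses the total variation distance for that comparison; the `expected` deployment mix and the `day`/`night` attribute are assumptions made up for the example.

```python
from collections import Counter

def distribution(values):
    """Turn a list of categorical values into a {category: fraction} mapping."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical metadata: lighting condition of each training image.
train_conditions = ["day"] * 90 + ["night"] * 10
# Assumed deployment mix, e.g. estimated from product requirements.
expected = {"day": 0.5, "night": 0.5}

train_dist = distribution(train_conditions)
gap = total_variation_distance(train_dist, expected)

print(train_dist)     # {'day': 0.9, 'night': 0.1}
print(round(gap, 3))  # 0.4
```

A large gap on an attribute that matters for the product is a hint about which data to collect next.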

Business stakeholders find it difficult to understand the limitations of the product.

We’ve all been there: a product manager, CEO, CTO, or similar has asked whether your computer vision model will work under certain conditions. What do you say? “Ehm, probably?” If you are unsure about the performance of your model, stakeholders one level removed from the technology will be lost, and so will the customers they sell to.

This can only lead to misunderstandings and frustration for all involved, and ultimately to canceled projects and canceled contracts. Once you continuously evaluate your machine learning models and data across a wide range of metrics and scenarios, you will be in a much better position to communicate the findings too.
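
Continuous evaluation also makes regressions visible. A minimal sketch, assuming you store one metric per scenario for each model version: compare the current numbers against a baseline and flag every scenario that got worse. The scenario names, values, and tolerance below are invented for illustration.

```python
def find_regressions(baseline, current, tolerance=0.01):
    """Return scenarios whose metric dropped by more than `tolerance` versus the baseline.

    Scenarios missing from `current` are reported too, with None as the new value.
    """
    regressions = {}
    for scenario, old in baseline.items():
        new = current.get(scenario)
        if new is None or old - new > tolerance:
            regressions[scenario] = (old, new)
    return regressions

# Hypothetical per-scenario accuracy for two model versions.
baseline = {"clear": 0.95, "rain": 0.88, "occluded": 0.71}
current = {"clear": 0.96, "rain": 0.81, "occluded": 0.71}

print(find_regressions(baseline, current))  # {'rain': (0.88, 0.81)}
```

A per-scenario report like this is also easier to share with stakeholders than a single aggregate score.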

Does your AI detect speed signs? Yup, we've tested that. What if the AI saw another colour? Erm possibly? If there are branches in front? Well...

There are infinite tests to run and only so much time until the next meeting with your PM. You think you’ve prepared for those use-case questions until…

Test early and reap the rewards.

Avoid many of the mistakes we made and start testing early. If you want to learn more about how to test machine learning models, take a look at this fantastic overview paper. Or read our guide on ML testing, which focuses more on computer vision.

Lakera’s MLTest equips every computer vision development team with a world-class testing infrastructure. Our product gives full visibility into your model's performance and the quality of your data – automatically as part of existing development processes.
