How to select the best machine learning models for computer vision?

Deep-dive into advanced comparison methods beyond standard performance metrics to build computer vision models that consistently perform over the long term.

Matthias Kraft
December 1, 2023

Given two machine learning models, which one would you pick, and why? Every machine learning engineer faces decisions about which combination of hyperparameters and model architectures to choose, which experiments to run during development, and what data to collect, annotate, and train on. These decisions are more often than not far more challenging than expected, especially in the context of computer vision, which is the focus of this article.

Developers often get this wrong, and a single tricky decision can send an entire computer vision project tumbling like a row of dominoes. I experienced this once while preparing an upcoming customer demo. I chose a model based on a couple of performance metrics, and it seemed to outperform its contender. I also visualized a few predictions on some image sequences, which reconfirmed my model choice.

Based on this evidence, I believed it must be the optimal model and handed it over to the production team. A couple of days later we held the demo (thankfully, only an internal one), and looking at the prediction behavior, I quickly realized that it was completely off.

While the model performed well overall, it turned out we had introduced a regression for that particular demo site. How had I not caught this during my evaluation? Worse still, when I analyzed the contending computer vision model that had been rejected right before going into production, it showed no performance regression at all. My model evaluation methods were incomplete, to say the least, and we had to dig deeper.

How do we make better decisions when comparing computer vision models?

Since then, I have been doing a lot of work on model evaluation and comparison methods to avoid such pitfalls. Over the years, I have added the following techniques to my core set of evaluation and comparison methods:

1. Standard ML metrics

Any model comparison should include the following standard metrics:

  • PR curves, precision, recall, RMSE, etc. (metrics relevant to the use case)
  • Training loss, validation loss, test loss (to assess overfitting behavior)
  • Model complexity (to consider potential runtime tradeoffs)

There are plenty of resources on the web and in textbooks on how to use and interpret these ML evaluation metrics to compare models. They are always an integral part of my evaluation process, even though on their own they are far from comprehensive enough to give good insight into the overall quality of your models.
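
As a minimal sketch of such a side-by-side comparison with scikit-learn: the labels and scores below are synthetic placeholders for your own evaluation data, and the two models are represented simply by their confidence scores.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # synthetic ground-truth labels

# Synthetic confidence scores standing in for two candidate models
scores_a = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 1000), 0, 1)
scores_b = np.clip(y_true * 0.5 + rng.normal(0.25, 0.30, 1000), 0, 1)

for name, scores in [("model_a", scores_a), ("model_b", scores_b)]:
    precision, recall, _ = precision_recall_curve(y_true, scores)
    ap = average_precision_score(y_true, scores)
    print(f"{name}: average precision = {ap:.3f} ({len(precision)} PR points)")
```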

2. Subgroup analysis and explicit data unit tests

Standard ML metrics hide too much valuable information when you are deciding between multiple models (or evaluating a model in general). That is partly because they aggregate over large datasets, so they may not accurately reflect your business and product requirements.

For instance, if you are building a computer vision project on object detection for industrial inspection and aim to roll it out across different customer sites, you need to look at the model performance on each site (to avoid situations like the one I described above). To find out which model is best, you will also want to check whether it performs equally well across all components that require inspection.

To do this subgroup analysis and split the performance metrics, I tend to collect as much metadata as possible for each image (timestamp, customer site, camera model, image dimensions, etc.). Another technique I use here is to build small regression test sets (10-50 images) to track subset performance. These regression sets can include sensitive cases or specific scenarios I want to test but for which no metadata is available. Learn more about that here. I want to make sure that my model performs equally well on (combinations of) these subgroups.
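
As a rough sketch of such a subgroup analysis, assuming your per-image predictions are collected in a table with metadata columns (the column names here are illustrative, not from any specific tool), a pandas groupby gets you per-site and per-camera metrics:

```python
import pandas as pd

# Hypothetical per-image results with metadata attached
results = pd.DataFrame({
    "customer_site": ["site_a", "site_a", "site_b", "site_b", "site_c"],
    "camera_model":  ["cam1", "cam2", "cam1", "cam1", "cam2"],
    "correct":       [1, 1, 0, 1, 0],  # 1 if the prediction was correct
})

# Accuracy per site, then per (site, camera) combination
print(results.groupby("customer_site")["correct"].mean())
print(results.groupby(["customer_site", "camera_model"])["correct"].mean())
```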

3. Model robustness

Once your model is in production, it will inevitably encounter dynamic variations in the image input. How does the model respond to that? Even minor variations will throw your model off if it has overfitted to your training and test data. To prevent this scenario, I explicitly test model robustness with varied images and check that the model output stays close to the original. At a minimum, I execute the following (a sketch of such a check follows the list):

  • Geometric variations: rotations, perspective changes, scaling, cropping, etc.
  • Lighting variations: global and local brightness and contrast changes, color changes, etc.
  • Image quality variations: noise, compression artifacts, blur, packet loss, etc.
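
Here is a minimal sketch of such a robustness check. The `predict` function is a hypothetical placeholder for your model's forward pass; each perturbation is applied to the image, and the drift of the output against the original prediction is measured.

```python
import numpy as np

def predict(image: np.ndarray) -> np.ndarray:
    """Placeholder for your model's forward pass (returns class scores)."""
    return np.array([image.mean() / 255.0, 1 - image.mean() / 255.0])

# A few simple perturbations; in practice you would cover the full list above
perturbations = {
    "brightness": lambda im: np.clip(im.astype(np.int16) + 30, 0, 255).astype(np.uint8),
    "noise":      lambda im: np.clip(
        im + np.random.default_rng(0).normal(0, 10, im.shape), 0, 255
    ).astype(np.uint8),
    "flip":       lambda im: im[:, ::-1],  # horizontal flip
}

image = np.random.default_rng(1).integers(0, 256, (224, 224, 3), dtype=np.uint8)
baseline = predict(image)

for name, perturb in perturbations.items():
    drift = np.abs(predict(perturb(image)) - baseline).max()
    print(f"{name}: max output drift = {drift:.4f}")
```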

As a side note, knowing where your model is not robust helps significantly when selecting data augmentations for training. In some sense, this is also an easy check that your training pipeline is correct, and it serves as input for refining data collection and annotation.

“Better performing” machine learning models (based on the standard metrics above) often do not generalize better: they fail to grasp data beyond the available dataset, ignoring or incorrectly processing variations in the input.

Gaining a good understanding of model robustness is a critical stage in selecting the optimal model.

4. Model biases/fairness

If you are building an application where biases could impact customer experience or safety, you should consider fairness metrics as part of your model comparison methods. One model may outperform another on high-level performance metrics but may include subtle predictive biases.

A recommended way to get started is to ensure that your datasets represent the operational use case. Depending on the application, you may also want to measure explicit fairness metrics such as equalized odds or predictive equality.
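
As a hedged sketch of what an equalized-odds check could look like (the labels, predictions, and group assignments below are synthetic placeholders), the idea is to compare true-positive and false-positive rates across groups defined by a sensitive attribute:

```python
import numpy as np

def tpr_fpr(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    """True-positive and false-positive rates for binary predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / max(np.sum(y_true == 1), 1), fp / max(np.sum(y_true == 0), 1)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in np.unique(group):
    mask = group == g
    tpr, fpr = tpr_fpr(y_true[mask], y_pred[mask])
    print(f"group {g}: TPR={tpr:.2f}, FPR={fpr:.2f}")

# Equalized odds is satisfied when TPR and FPR match across groups;
# large gaps indicate a predictive bias worth investigating.
```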

5. In-operation metrics

Production environments and configurations always add additional constraints to your computer vision application. Some that come to mind are as follows:

  • Memory footprint
  • Model inference time
  • System latency
  • GPU/CPU utilization

For instance, you have to ask yourself whether a model with twice the inference time is really your preferred choice for a 0.5% gain in performance.

Also, on-device performance may substantially differ from your training environment with a beefy GPU in the cloud. If you suspect a difference, model comparisons should consider on-device performance.
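
As a minimal sketch of measuring inference latency (with a placeholder standing in for your actual network), note the warmup runs, which keep one-off costs such as lazy initialization from skewing the numbers:

```python
import time
import numpy as np

def model(batch: np.ndarray) -> np.ndarray:
    """Placeholder forward pass; swap in your real model here."""
    return batch.mean(axis=(1, 2, 3))

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

for _ in range(10):  # warmup: exclude one-off initialization costs
    model(batch)

timings = []
for _ in range(100):
    start = time.perf_counter()
    model(batch)
    timings.append((time.perf_counter() - start) * 1000)  # ms

print(f"median latency: {np.median(timings):.2f} ms "
      f"(p95: {np.percentile(timings, 95):.2f} ms)")
```

Run the same measurement on the target device rather than your training machine; the ranking of two models can change once hardware constraints enter the picture.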

Comparing models with MLTest

Now, evaluating all these dimensions can become quite overwhelming. I’ve been there myself.

That’s why I’m excited that we recently introduced a neat model comparison feature in MLTest to help you get a more comprehensive view of your model. It tracks all the standard ML metrics, automatically assesses model robustness and biases, and does a subset analysis on your models. It even automatically identifies failure clusters where your model performs poorly, making it possible to create a much more comprehensive comparison.

[Figure: Comparing computer vision models with MLTest]

You can learn more about how MLTest can help you compare computer vision models here, get started with MLTest right away, or get in touch with me at matthias@lakera.ai.

Conclusion

When comparing computer vision models, take the next step and include the above criteria in your evaluation. They will help you make better decisions and ultimately build better ML systems.
