The topic of bias in ML systems has received significant attention recently. And rightly so. The recent documentary Coded Bias highlighted how algorithmic decision-making can lead to biased results. At worst, these results can harm whole sections of the population, for instance in automated teacher evaluations.
The core input to ML systems is data. And data is biased for a variety of reasons – societal, collection, and annotation biases among them. Those who train models on such data bear the responsibility of ensuring that their systems do not discriminate or use that bias to perpetuate an unfair status quo. Building a system free of bias is challenging. In fact, the ML community has long struggled to define what a bias-free or fair system even is.
Achieving a valid definition of fairness requires a wider discussion with legal professionals and regulatory bodies. In the meantime, changing the way we build ML systems, and putting testing at the core of development, can go a long way in reducing bias in our systems.
Creating fair systems is hard – and needs participation beyond data science...
One way to approach bias is through the lens of fairness. The recent push to define algorithmic fairness has focused on establishing good metrics for measuring it, that is, on building systems with an encoded notion of fairness.
For example, consider a machine-learning system that predicts whether a person will pay back a bank loan. The bank cares about not discriminating between two demographic groups. One possible notion of fairness, “Demographic Parity”, requires the system to grant loans to both demographics at the same rate. This makes intuitive sense. Another notion, “Equality of Opportunity”, requires that, among the individuals in each demographic who would actually repay the loan, the same proportion is granted a loan.
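To make these two notions concrete, here is a minimal sketch of how the corresponding gaps could be measured for a loan-approval model. It uses plain NumPy on toy arrays; the data, variable names, and the way the gaps are reported are purely illustrative.

```python
import numpy as np

def demographic_parity_gap(approved, group):
    """Largest difference in loan-approval rates between demographic groups."""
    rates = [approved[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(approved, repaid, group):
    """Largest difference in approval rates among applicants who would repay
    (i.e., the gap in true positive rates across groups)."""
    rates = [approved[(group == g) & (repaid == 1)].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Toy example: model decisions, ground-truth repayment, and demographic membership.
approved = np.array([1, 0, 1, 1, 0, 1, 0, 0])
repaid   = np.array([1, 0, 1, 0, 1, 1, 0, 1])
group    = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print("Demographic parity gap:", demographic_parity_gap(approved, group))
print("Equal opportunity gap:", equal_opportunity_gap(approved, repaid, group))
```

A system satisfying Demographic Parity would drive the first gap to zero; one satisfying Equality of Opportunity would drive the second to zero.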
“Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions.”
While both metrics make sense, it was soon observed that they cannot, in general, be satisfied at the same time: a system that satisfies one of the properties will typically violate the other, intuitively because the two demographics may differ in how likely they are to repay. Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions. The question of fairness has received significant attention over the past century in legal and social science publications. Stakeholders from these fields should be included in the discussion around algorithmic bias. Input from legal experts and regulators is fundamental for establishing concrete guidance that helps companies build bias-free systems.
… but a rigorous process is a good place to start.
In many applications, a rigorous testing process can go a long way towards ensuring that systems are less discriminatory. This requires developing ML software the way we have been developing safety-critical systems for decades. When building facial-recognition algorithms, establishing a clear definition of the scenarios in which the system is expected to work is key. Diligent testing around these scenarios (for example, testing on a wide range of demographics) is then the first step to ensuring that the system will be fair. This process should be a core component of development, not a last-minute tweak.
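As an illustration, a scenario-driven test suite might look something like the following pytest-style sketch. The model wrapper, data loader, slice names, and accuracy threshold are all hypothetical placeholders, not a prescribed API.

```python
import pytest

# Hypothetical demographic slices and a minimum accuracy agreed on up front.
DEMOGRAPHIC_SLICES = ["group_a", "group_b", "group_c", "group_d"]
MIN_ACCURACY = 0.95

@pytest.mark.parametrize("slice_name", DEMOGRAPHIC_SLICES)
def test_face_verification_accuracy_per_demographic(slice_name):
    model = load_model()                          # hypothetical project helper
    images, labels = load_eval_slice(slice_name)  # hypothetical project helper
    predictions = model.predict(images)
    accuracy = (predictions == labels).mean()
    assert accuracy >= MIN_ACCURACY, (
        f"Accuracy on slice '{slice_name}' is {accuracy:.3f}, "
        f"below the agreed minimum of {MIN_ACCURACY}"
    )
```

The point is not the specific threshold but that the expected scenarios are written down explicitly and checked every time the model changes.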
“Standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production.”
Careful, in-depth testing is central to building traditional software systems. Yet, standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production. If we more carefully lay out and test the explicit requirements of our machine-learning systems (e.g., equal performance among demographics), we can take a fundamental step towards building systems with fewer unwanted and unexpected behaviors.
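For example, a requirement such as “true positive rates across demographics must not differ by more than five percentage points” can be encoded directly as a check over the left-out data. The helpers and threshold in this sketch are illustrative assumptions:

```python
import numpy as np

MAX_TPR_GAP = 0.05  # illustrative threshold, to be agreed with stakeholders

def test_equal_performance_across_demographics():
    # Hypothetical helpers returning held-out data and the trained model.
    features, labels, group = load_holdout_data()
    model = load_model()
    predictions = model.predict(features)

    # True positive rate for each demographic group.
    tprs = [
        (predictions[(group == g) & (labels == 1)] == 1).mean()
        for g in np.unique(group)
    ]
    gap = max(tprs) - min(tprs)
    assert gap <= MAX_TPR_GAP, f"TPR gap across demographics is {gap:.3f}"
```

Checks like this can run in continuous integration alongside regular unit tests, so a regression in fairness blocks a release just as a regression in accuracy would.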
The recent EU proposal to regulate AI systems is a big step in the right direction. The “high-risk” category proposed in the report should include all systems that can cause harm or perpetuate an unfair status quo. We welcome this step towards clearer, concrete, and actionable guidance on the matter – from both methodological and regulatory perspectives. Companies using ML must be held accountable, and we should aim to understand the processes they need to follow to minimize the chance of unintended consequences.
While we may not be able to build ML that is universally free of bias, we can better detect and control bias in individual systems. So, let’s make that a priority!
What do you think we could do better to avoid bias in our ML systems? We would love to chat further. Get in touch with us here or sign up for updates below!