Free of bias? We need to change how we build ML systems.


Lakera Team
June 29, 2021

The topic of bias in ML systems has received significant attention recently. And rightly so. The recent documentary Coded Bias highlighted how algorithmic decision-making can lead to biased results. At worst, these results can affect whole sections of the population, for instance in teacher evaluations.

The core input to ML systems is data. And data is biased due to a variety of factors, such as societal, collection, and annotation biases. People training models on such data carry the burden of ensuring that their systems do not discriminate or perpetuate an unfair status quo. Building a system free of bias is challenging. And in fact, the ML community has long struggled to define what a bias-free or fair system is.

Achieving a valid definition of fairness requires a wider discussion with legal professionals and regulatory bodies. In the meantime, changing the way we build ML systems, and putting testing at the core of development, can go a long way in reducing bias in our systems.

Creating fair systems is hard – and needs participation beyond data science...

One way to approach bias is through the lens of fairness. A recent push to find the right definition of algorithmic fairness has focused on establishing good metrics for measuring it, that is, building systems with an encoded notion of fairness.

For example, consider a machine-learning system that predicts whether a person will pay back a bank loan. The bank cares about not discriminating between two demographics. One possible notion of fairness, “Demographic Parity”, requires the system to grant loans to both demographics with the same probability. This makes intuitive sense. Another notion, “Equality of Opportunity”, requires that, among the individuals who are likely to repay the loan, the same proportion is granted a loan in each demographic.
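To make these two notions concrete, here is a minimal sketch in Python. The tiny dataset, the array names, and the decisions themselves are entirely made-up for illustration; this is not a real evaluation.

import numpy as np

# Made-up loan decisions, true repayment outcomes, and demographic labels.
approved = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model grants a loan (1) or not (0)
repaid   = np.array([1, 0, 1, 0, 1, 1, 0, 0])   # whether the person would repay
group    = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # demographic membership

def approval_rate(g):
    # Demographic Parity compares P(loan granted | demographic g) across groups.
    return approved[group == g].mean()

def opportunity_rate(g):
    # Equality of Opportunity compares P(loan granted | would repay, demographic g).
    would_repay = (group == g) & (repaid == 1)
    return approved[would_repay].mean()

print("Demographic Parity gap:      ", abs(approval_rate(0) - approval_rate(1)))
print("Equality of Opportunity gap: ", abs(opportunity_rate(0) - opportunity_rate(1)))

A system that is fair under a given definition would drive the corresponding gap to (near) zero.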

“Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions.”

While these two metrics make sense, it was soon observed that they cannot, in general, be satisfied simultaneously: whenever the two demographics differ in how likely they are to repay, a system that satisfies one property must violate the other. For instance, if 50% of applicants in one demographic would repay but only 25% in the other, granting loans to exactly those who would repay satisfies Equality of Opportunity yet yields approval rates of 50% versus 25%, violating Demographic Parity. Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions. The question of fairness has received significant attention over the last century in legal and social science publications. Stakeholders from these fields should be included in the discussion around algorithmic bias. Input from legal experts and regulators is fundamental for establishing concrete guidance that helps companies build bias-free systems.

… but a rigorous process is a good place to start.

In many applications, a rigorous testing process can go a long way in ensuring that systems are less discriminatory. This requires developing ML software the way we have been developing safety-critical systems for decades. When building facial-recognition algorithms, establishing a clear definition of the scenarios in which the system is expected to work is key. Then, diligent testing around these scenarios (for example, testing on a wide range of demographics) is the first step to ensuring that the system will be fair. This process should be a core component of development, not a last-minute tweak.
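As an illustration, here is a minimal sketch of what such a scenario specification and check might look like. The scenario fields, accuracy targets, and measured numbers are hypothetical placeholders, not a real test suite or API.

from typing import Dict, List, Tuple

# Scenarios the system is expected to handle, each with an explicit target.
SCENARIOS = [
    {"demographic": "group_a", "lighting": "daylight",  "min_accuracy": 0.95},
    {"demographic": "group_b", "lighting": "daylight",  "min_accuracy": 0.95},
    {"demographic": "group_a", "lighting": "low_light", "min_accuracy": 0.90},
    {"demographic": "group_b", "lighting": "low_light", "min_accuracy": 0.90},
]

def failing_scenarios(measured: Dict[Tuple[str, str], float]) -> List[dict]:
    """Return every scenario whose measured accuracy misses its target."""
    return [
        s for s in SCENARIOS
        if measured[(s["demographic"], s["lighting"])] < s["min_accuracy"]
    ]

# Made-up evaluation results, e.g. produced by a dedicated test run.
measured = {
    ("group_a", "daylight"):  0.97,
    ("group_b", "daylight"):  0.91,   # below target: this scenario gets flagged
    ("group_a", "low_light"): 0.92,
    ("group_b", "low_light"): 0.90,
}
print(failing_scenarios(measured))

Writing the expected operating scenarios down explicitly makes gaps in coverage visible before the system ships, rather than after.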

“Standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production.”

Careful, in-depth testing is central to building traditional software systems. Yet, standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production. If we more carefully lay out and test the explicit requirements of our machine-learning systems (e.g., equal performance among demographics), we can take a fundamental step towards building systems with fewer unwanted and unexpected behaviors.
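Concretely, a requirement like “equal performance among demographics” can be written down as a test rather than left implicit in a single aggregate validation score. The sketch below assumes a pytest-style setup with made-up predictions and a made-up tolerance of 0.05; none of these names or numbers come from a specific library.

import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Per-demographic accuracy on the evaluation set."""
    return {
        g: float((y_pred[groups == g] == y_true[groups == g]).mean())
        for g in np.unique(groups)
    }

def test_equal_performance_across_groups():
    # Hypothetical evaluation data; in practice this should come from a
    # test set that is representative of every demographic of interest.
    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
    groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

    per_group = accuracy_by_group(y_true, y_pred, groups)
    gap = max(per_group.values()) - min(per_group.values())
    assert gap <= 0.05, f"Performance gap across demographics too large: {per_group}"

A test like this fails the build when one demographic lags behind, turning fairness from an afterthought into an explicit, continuously checked requirement.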

The recent EU proposal to regulate AI systems is a big step in the right direction. The “high-risk” category proposed in the report should include all systems that can cause harm or perpetuate an unfair status quo. We welcome this step towards clearer, concrete, and actionable guidance on the matter, from both methodological and regulatory perspectives. Accountability by companies using ML is important, and we should aim to understand the processes that must be followed to minimize the chance of unintended consequences.

While we may not be able to build ML that is universally free of bias, we can better detect and control bias in individual systems. So, let’s make that a priority!

What do you think we could do better to avoid bias in our ML systems? We would love to chat further. Get in touch with us here or sign up for updates below!
