The topic of bias in ML systems has received significant attention recently. And rightly so. The recent documentary Coded Bias highlighted how algorithmic decision-making can lead to biased results. At worst, these results can harm whole sections of the population, for instance in automated teacher evaluations.
The core input to ML systems is data. And data is biased for a variety of reasons – societal, collection, and annotation biases among them. Those who train models on such data bear the responsibility of ensuring that their systems do not discriminate or use that bias to perpetuate an unfair status quo. Building a system free of bias is challenging. In fact, the ML community has long struggled to define what a bias-free or fair system even is.
Achieving a valid definition of fairness requires a wider discussion with legal professionals and regulatory bodies. In the meantime, changing the way we build ML systems, and putting testing at the core of development, can go a long way in reducing bias in our systems.
Creating fair systems is hard – and needs participation beyond data science...
One way to approach bias is through the lens of fairness. The recent push to define algorithmic fairness has focused on establishing good metrics for measuring it, that is, on building systems with an encoded notion of fairness.
For example, consider a machine-learning system that predicts whether a person will pay back a bank loan. The bank cares about not discriminating between two demographic groups. One possible notion of fairness, “Demographic Parity”, requires the system to grant loans to both demographics at the same rate. This makes intuitive sense. Another notion, “Equality of Opportunity”, requires that, among the individuals in each demographic who would actually repay the loan, the same proportion is granted a loan.
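To make these two notions concrete, here is a minimal sketch of how the corresponding gaps could be measured for a loan-approval model. It uses plain NumPy on toy arrays; the data, variable names, and the way the gaps are reported are purely illustrative.

```python
import numpy as np

def demographic_parity_gap(approved, group):
    """Largest difference in loan-approval rates between demographic groups."""
    rates = [approved[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(approved, repaid, group):
    """Largest difference in approval rates among applicants who would repay
    (i.e., the gap in true positive rates across groups)."""
    rates = [approved[(group == g) & (repaid == 1)].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Toy example: model decisions, ground-truth repayment, and demographic membership.
approved = np.array([1, 0, 1, 1, 0, 1, 0, 0])
repaid   = np.array([1, 0, 1, 0, 1, 1, 0, 1])
group    = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print("Demographic parity gap:", demographic_parity_gap(approved, group))
print("Equal opportunity gap:", equal_opportunity_gap(approved, repaid, group))
```

A system satisfying Demographic Parity would drive the first gap to zero; one satisfying Equality of Opportunity would drive the second to zero.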
“Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions.”
While both metrics make sense, it was soon observed that they cannot, in general, be satisfied at the same time: a system that satisfies one of the properties will typically violate the other, intuitively because the two demographics may differ in how likely they are to repay. Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions. The question of fairness has received significant attention over the past century in legal and social science publications. Stakeholders from these fields should be included in the discussion around algorithmic bias. Input from legal experts and regulators is fundamental for establishing concrete guidance that helps companies build bias-free systems.
… but a rigorous process is a good place to start.
In many applications, a rigorous testing process can go a long way towards ensuring that systems are less discriminatory. This requires developing ML software the way we have been developing safety-critical systems for decades. When building facial-recognition algorithms, establishing a clear definition of the scenarios in which the system is expected to work is key. Diligent testing around these scenarios (for example, testing on a wide range of demographics) is then the first step to ensuring that the system will be fair. This process should be a core component of development, not a last-minute tweak.
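As an illustration, a scenario-driven test suite might look something like the following pytest-style sketch. The model wrapper, data loader, slice names, and accuracy threshold are all hypothetical placeholders, not a prescribed API.

```python
import pytest

# Hypothetical demographic slices and a minimum accuracy agreed on up front.
DEMOGRAPHIC_SLICES = ["group_a", "group_b", "group_c", "group_d"]
MIN_ACCURACY = 0.95

@pytest.mark.parametrize("slice_name", DEMOGRAPHIC_SLICES)
def test_face_verification_accuracy_per_demographic(slice_name):
    model = load_model()                          # hypothetical project helper
    images, labels = load_eval_slice(slice_name)  # hypothetical project helper
    predictions = model.predict(images)
    accuracy = (predictions == labels).mean()
    assert accuracy >= MIN_ACCURACY, (
        f"Accuracy on slice '{slice_name}' is {accuracy:.3f}, "
        f"below the agreed minimum of {MIN_ACCURACY}"
    )
```

The point is not the specific threshold but that the expected scenarios are written down explicitly and checked every time the model changes.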
“Standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production.”
Careful, in-depth testing is central to building traditional software systems. Yet, standard testing methodologies for ML systems rely simply on validating on left-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production. If we more carefully lay out and test the explicit requirements of our machine-learning systems (e.g., equal performance among demographics), we can take a fundamental step towards building systems with fewer unwanted and unexpected behaviors.
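For example, a requirement such as “true positive rates across demographics must not differ by more than five percentage points” can be encoded directly as a check over the left-out data. The helpers and threshold in this sketch are illustrative assumptions:

```python
import numpy as np

MAX_TPR_GAP = 0.05  # illustrative threshold, to be agreed with stakeholders

def test_equal_performance_across_demographics():
    # Hypothetical helpers returning held-out data and the trained model.
    features, labels, group = load_holdout_data()
    model = load_model()
    predictions = model.predict(features)

    # True positive rate for each demographic group.
    tprs = [
        (predictions[(group == g) & (labels == 1)] == 1).mean()
        for g in np.unique(group)
    ]
    gap = max(tprs) - min(tprs)
    assert gap <= MAX_TPR_GAP, f"TPR gap across demographics is {gap:.3f}"
```

Checks like this can run in continuous integration alongside regular unit tests, so a regression in fairness blocks a release just as a regression in accuracy would.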
The recent EU proposal to regulate AI systems is a big step in the right direction. The “high-risk” category proposed in the report should include all systems that can cause harm or perpetuate an unfair status quo. We welcome this step towards clearer, concrete, and actionable guidance on the matter – from both methodological and regulatory perspectives. Companies using ML must be held accountable, and we should aim to understand the processes they need to follow to minimize the chance of unintended consequences.
While we may not be able to build ML that is universally free of bias, we can better detect and control bias in individual systems. So, let’s make that a priority!
What do you think we could do better to avoid bias in our ML systems? We would love to chat further. Get in touch with us here or sign up for updates below!