The topic of bias in ML systems has received significant attention recently. And rightly so. The recent documentary Coded Bias highlighted how algorithmic decision-making can lead to biased results. At worst, such biases can affect whole sections of the population, for instance when it comes to teacher evaluations.
The core input to ML systems is data. And data is biased due to a variety of factors, such as societal, collection, and annotation biases. People training models on such data carry the burden of ensuring that the systems do not discriminate or use bias to perpetuate an unfair status quo. Building a system free of bias is challenging. In fact, the ML community has long struggled to define what a bias-free or fair system is.
Achieving a valid definition of fairness requires a wider discussion with legal professionals and regulatory bodies. In the meantime, changing the way we build ML systems, and putting testing at the core of development, can go a long way in reducing bias in our systems.
One way to approach bias is through the lens of fairness. A recent push to define algorithmic fairness has focused on establishing good metrics for measuring it, so that a system can be built with an explicitly encoded notion of fairness.
For example, consider a machine-learning system that predicts whether a person will pay back a bank loan. The bank does not want to discriminate between two demographics. One possible notion of fairness, “Demographic Parity”, requires that the system grant loans with the same probability to both demographics. This makes intuitive sense. Another notion, “Equality of Opportunity”, requires that, among the individuals who would actually repay the loan, the same fraction is granted a loan in each demographic.
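Both notions can be measured directly from a model's decisions. The following is a minimal sketch, assuming binary decisions (1 = loan granted) and binary outcomes (1 = loan repaid) for each demographic; the function names and the toy data are hypothetical and only illustrate the two definitions.

```python
from typing import Sequence


def selection_rate(preds: Sequence[int]) -> float:
    """Fraction of a group that is granted a loan (prediction == 1)."""
    return sum(preds) / len(preds)


def true_positive_rate(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Among people who actually repay (label == 1), the fraction granted a loan."""
    decisions_for_repayers = [p for p, y in zip(preds, labels) if y == 1]
    return sum(decisions_for_repayers) / len(decisions_for_repayers)


def demographic_parity_gap(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """0.0 means both demographics are granted loans at the same rate."""
    return abs(selection_rate(preds_a) - selection_rate(preds_b))


def equal_opportunity_gap(
    preds_a: Sequence[int], labels_a: Sequence[int],
    preds_b: Sequence[int], labels_b: Sequence[int],
) -> float:
    """0.0 means repayers in both demographics are granted loans at the same rate."""
    return abs(true_positive_rate(preds_a, labels_a) - true_positive_rate(preds_b, labels_b))


# Hypothetical toy data: 1 = loan granted (preds) / loan repaid (labels).
preds_a, labels_a = [1, 1, 0, 0], [1, 1, 1, 0]
preds_b, labels_b = [1, 1, 0, 0], [1, 0, 0, 0]

print(demographic_parity_gap(preds_a, preds_b))  # 0.0
print(round(equal_opportunity_gap(preds_a, labels_a, preds_b, labels_b), 2))  # 0.33
```

On this toy data the system satisfies demographic parity (both groups get loans at a 50% rate) but not equality of opportunity: two of the three repayers in group A get a loan, while the single repayer in group B does.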
While these two metrics make sense, it was soon observed that they generally cannot be satisfied at the same time: when the two demographics repay loans at different rates, a system that satisfies one property will typically violate the other. Computer scientists were left to decide what a fair algorithm is, despite being ill-equipped to make such decisions. The question of fairness has received significant attention over the last century in legal and social science publications. Stakeholders from these fields should be included in the discussion around algorithmic bias. Input from legal experts and regulators is fundamental for establishing concrete guidance that helps companies build bias-free systems.
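The tension is easy to see when repayment rates differ. The sketch below assumes two hypothetical groups in which 3 out of 4 and 1 out of 4 people repay, and a perfect predictor that grants loans exactly to those who will repay. It satisfies equality of opportunity but violates demographic parity; equalizing the selection rates would require either denying loans to some repayers in group A or granting loans to non-repayers in group B, i.e. departing from the accurate predictor.

```python
def selection_rate(preds):
    # Fraction of a group that is granted a loan.
    return sum(preds) / len(preds)


def true_positive_rate(preds, labels):
    # Among people who actually repay (label 1), the fraction granted a loan.
    repayers = [p for p, y in zip(preds, labels) if y == 1]
    return sum(repayers) / len(repayers)


# Hypothetical groups with different repayment base rates: 3/4 vs 1/4.
labels_a = [1, 1, 1, 0]
labels_b = [1, 0, 0, 0]

# A perfect predictor grants a loan exactly to those who will repay.
preds_a = list(labels_a)
preds_b = list(labels_b)

tpr_gap = abs(true_positive_rate(preds_a, labels_a) - true_positive_rate(preds_b, labels_b))
dp_gap = abs(selection_rate(preds_a) - selection_rate(preds_b))

print(tpr_gap)  # 0.0 -> equality of opportunity holds
print(dp_gap)   # 0.5 -> demographic parity is violated
```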
In many applications, a rigorous testing process can go a long way in ensuring that systems are less discriminatory. This requires developing ML software the way we have developed safety-critical systems for decades. When building facial-recognition algorithms, for example, it is key to establish a clear definition of the scenarios in which the system is expected to work. Diligent testing around these scenarios (for example, testing on a wide range of demographics) is then the first step to ensuring that the system will be fair. This process should be a core component of development, not a last-minute tweak.
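A first, concrete piece of such a process is writing the expected scenarios down and checking that the test set actually covers each of them before any performance numbers are trusted. This is a minimal sketch, assuming each test example carries a group label; the scenario names and the minimum-sample threshold are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical operating scenarios (here, demographic groups) the system is specified to handle.
REQUIRED_SCENARIOS: Set[str] = {"group_a", "group_b", "group_c"}
# Hypothetical minimum number of test examples per scenario; real values depend on the application.
MIN_SAMPLES_PER_SCENARIO = 3


def coverage_gaps(
    test_groups: Iterable[str],
    required: Set[str] = REQUIRED_SCENARIOS,
    min_samples: int = MIN_SAMPLES_PER_SCENARIO,
) -> Dict[str, int]:
    """Return each required scenario that the test set under-covers, with its sample count."""
    counts = Counter(test_groups)  # missing scenarios count as 0
    return {s: counts[s] for s in sorted(required) if counts[s] < min_samples}


# Hypothetical test set: the group label attached to each test example.
test_groups = ["group_a"] * 5 + ["group_b"] * 2
print(coverage_gaps(test_groups))  # {'group_b': 2, 'group_c': 0}
```

An empty result means every specified scenario is represented in the test set; only then are per-scenario results worth comparing.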
Careful, in-depth testing is central to building traditional software systems. Yet standard testing methodologies for ML systems simply rely on validating against held-out data. This validation data may not be fully representative of the real world or of the unexpected scenarios that the system may face in production. If we lay out and test the explicit requirements of our machine-learning systems more carefully (e.g., equal performance across demographics), we can take a fundamental step towards building systems with fewer unwanted and unexpected behaviors.
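To make the difference concrete, the sketch below turns “equal performance across demographics” into an explicit, testable requirement and compares it with a single aggregate accuracy number. The 0.05 tolerance, the function names, and the data are hypothetical; in practice the requirement and its threshold would come from the system's specification.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def accuracy(pairs: List[Tuple[int, int]]) -> float:
    """Fraction of (prediction, label) pairs that match."""
    return sum(p == y for p, y in pairs) / len(pairs)


def per_group_accuracy(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Accuracy computed separately for each demographic group."""
    buckets: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for p, y, g in zip(preds, labels, groups):
        buckets[g].append((p, y))
    return {g: accuracy(pairs) for g, pairs in buckets.items()}


def meets_equal_performance(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[str], max_gap: float = 0.05
) -> bool:
    """Hypothetical requirement: per-group accuracy may differ by at most max_gap."""
    scores = per_group_accuracy(preds, labels, groups)
    return max(scores.values()) - min(scores.values()) <= max_gap


# Hypothetical held-out set in which group "b" is rare.
preds = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 8 + ["b"] * 2

print(accuracy(list(zip(preds, labels))))              # 0.8
print(per_group_accuracy(preds, labels, groups))       # {'a': 1.0, 'b': 0.0}
print(meets_equal_performance(preds, labels, groups))  # False
```

The aggregate held-out accuracy of 0.8 looks acceptable, yet the explicit per-group requirement fails because every error falls on the rare group.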
The recent EU proposal to regulate AI systems is a big step in the right direction. The “high-risk” category proposed in the report should include all systems that can cause harm or perpetuate the status quo. We welcome this step towards a clearer set of concrete and actionable guidance on the matter, from both methodological and regulatory perspectives. Accountability of companies using ML is important, and we should aim to understand the processes that must be followed to minimize the chance of unintended consequences.
While we may not be able to build ML that is universally free of bias, we can better detect and control bias in individual systems. So, let’s make that a priority!
What do you think we could do better to avoid bias in our ML systems? We would love to chat further. Get in touch with us here or sign up for updates below!