We now live in a world where AI (artificial intelligence) is used in mission-critical systems yet still developed like consumer technology.
As ML (machine learning) engineers, all we want is to build systems that work, without harming others or getting stuck on a merry-go-round of prototyping. So, how can we get there?
We can take inspiration from traditional software engineering practices!
Why? Well, do you remember the last time your online stock broker purchased the wrong shares? Or Twitter failed to retrieve your latest tweets? I don’t. Major software malfunctions are so unexpected that when they happen, they make headline news.
As C. A. R. Hoare’s classic 1996 article “How did software get so reliable without proof?” points out, the answer lies largely in rigorous development processes, continuous improvement of existing software, and extensive testing.
Traditional software goes through well-defined testing and release processes.
Software engineers are more than familiar with concepts such as:
– Test-driven development
– Unit tests
– Regression tests
– Integration tests
Tests are a part of CI/CD (continuous integration/continuous delivery) pipelines. Engineers don’t merge code unless all tests have passed. By the time they go to production, they are confident that the software works as expected. They follow a “test-to-ship” strategy.
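To make the “test-to-ship” gate concrete, here is a minimal sketch of the kind of unit and regression tests a CI pipeline would run before allowing a merge. The function and test names are illustrative, not from any specific codebase:

```python
# Sketch of a "test-to-ship" gate: CI runs these checks on every commit
# and blocks the merge if any fail.

def normalize(scores):
    """Scale a list of scores into [0, 1] (hypothetical helper)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:  # guard against division by zero on constant input
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def test_normalize_range():
    # Unit test: every output value stays within [0, 1].
    out = normalize([3.0, 7.0, 5.0])
    assert all(0.0 <= v <= 1.0 for v in out)

def test_normalize_constant_input():
    # Regression test: pins down a past divide-by-zero bug so it
    # cannot silently reappear.
    assert normalize([2.0, 2.0]) == [0.0, 0.0]
```

In a real project these would live in a test suite (e.g. run by a test runner in CI), and a failing assertion would stop the release before it ever reaches users.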
Let’s take a look at how we typically develop ML systems today. It’s common practice to split our dataset into training, validation, and testing subsets. The first two become part of the model training loop, whereas the testing subset is used separately, outside of the training loop, to assess performance on unseen data. A typical evaluation strategy would include calculating various metrics over these data subsets and using them as an indication of real-world system performance.
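The split-and-evaluate strategy described above can be sketched in a few lines. The split fractions and the accuracy metric are illustrative choices, not a prescription:

```python
import random

def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle and split samples into train/validation/test subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]                # held out, outside the training loop
    val = shuffled[n_test:n_test + n_val]   # guides model selection/tuning
    train = shuffled[n_test + n_val:]       # fits the model
    return train, val, test

def accuracy(predictions, labels):
    """One example metric computed over the held-out test subset."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

The test-subset metric is then read as a proxy for real-world performance, which is exactly the assumption the next section challenges.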
It turns out that this strategy is often insufficient. Many teams find that their ML systems end up performing ‘well enough’ on their carefully selected datasets but are too brittle to be used in the real world.
At the same time, creating more complete quantitative testing and release processes is often seen as too time-consuming, especially within smaller teams. We have observed many teams that instead spend a lot of time on qualitative testing, which tends to fall short of building a thorough understanding of performance. As a result, computer vision development follows a “ship-to-test” strategy.
The fact that ML systems are only really tested during operation has obvious and major implications. These systems carry significant risk because vulnerabilities only surface during operation: pedestrians are not detected at night, COVID diagnostics are fundamentally flawed, or systems exhibit undesired biases. At best, this leads to low customer satisfaction or products that never make it to market; at worst, it puts people and society at risk.
The good news is that we can bring some of the concepts from traditional software development to ML development. We need to ensure that vulnerabilities are found during development. So, we need to bring back ‘test-to-ship’.
Putting systematic testing at the core of your development processes is a great way to build better AI products faster. Our ML testing series provides a few simple strategies that any development team can use to prevent failure during operation.
Lakera’s MLTest equips every computer vision development team with a world-class testing infrastructure. Our product finds critical vulnerabilities and flaws in computer vision systems, automatically, as part of existing development processes and before they can impact operation. We want to enable every team, small to large, to ship AI products quickly and reliably. Get in touch to schedule a demo!