Inference is the process by which a trained machine learning model applies the patterns it has learned to new, unseen data. In other words, it is the application phase, in which the model uses what it learned during training to predict outcomes, classify data, or make decisions about inputs it has not encountered before.
How Inference Works
The inference process begins after the machine learning model has been trained on a training dataset, validated on held-out data, and tuned.
At that point the model is ready for inference: new, unseen data is fed into the model, which makes predictions or classifications based on what it learned during training. For example, it might receive a new email and classify it as spam or non-spam.
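The split between training and inference can be made concrete with a small sketch of the spam example. The code below is an illustrative toy and not a production filter: the six training messages, the `train` and `predict` function names, and the choice of a naive Bayes word-count model are all assumptions made for this example. Training builds the word counts once; inference reuses those frozen counts to label emails the model has never seen.

```python
import math
from collections import Counter

# Toy labelled training set (made up for illustration only).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(examples):
    """Training phase: count how often each word appears per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return {"word_counts": word_counts, "class_counts": class_counts, "vocab": vocab}


def predict(model, text):
    """Inference phase: score unseen text using the frozen counts."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, counts in model["word_counts"].items():
        n_words = sum(counts.values())
        # Log prior for the class.
        score = math.log(model["class_counts"][label] / total_docs)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)  # done once, ahead of time

# Neither message below appears in the training set.
print(predict(model, "claim your free cash prize"))   # spam
print(predict(model, "project meeting on monday"))    # ham
```

Note that `predict` never modifies the model: inference only reads what training produced, which is why the two phases can run on different machines and at different times.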
In production environments, inference often has to run under real-time latency constraints and with limited compute and memory. As a result, optimizing for speed and efficiency is crucial for machine learning inference. This might involve techniques like model pruning (removing weights that contribute little to the output), quantization (storing weights at lower numerical precision), or hardware acceleration (running the model on GPUs or other specialized chips).
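To give a feel for one of these techniques, here is a minimal sketch of quantization, assuming the simplest symmetric, per-tensor scheme that maps floating-point weights to integers in the range -127 to 127. The weight values and the `quantize_int8` and `dequantize` helper names are hypothetical, and real frameworks use more elaborate schemes; the point is only that each weight can be stored in 8 bits instead of the 32 bits commonly used for floats, at the cost of a small, bounded rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale factor maps the largest weight to +/-127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights from one layer of a trained model.
weights = [0.42, -1.30, 0.07, 0.98, -0.55]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("max error:", max_error, "<= scale / 2 =", scale / 2)
```

Whether that rounding error is acceptable depends on the model and the task, so quantized models are normally re-evaluated on validation data before deployment.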
In summary, machine learning inference is about applying a trained model to new data in the real world, making it a critical aspect of practical machine learning applications.