VOCABULARY

Machine Learning Workflow

A machine learning workflow is a systematic, repeatable sequence of steps in which machine learning algorithms are used to develop predictive models and applications. It covers the steps essential for building, training, and deploying machine learning models.

The purpose of a machine learning workflow is to enable data scientists, data engineers, and other stakeholders to work together efficiently on developing and applying models.

Machine Learning Workflows in Practice

Problem Understanding: The first step of any machine learning workflow involves identifying the problem to solve or the question to answer. This includes defining objectives and desired outcomes.
Data Gathering: The next step is to collect the data the machine learning model requires. The data can come from various sources, including databases, data warehouses, and real-time data streams.
Data Preparation: The collected data is then preprocessed so that a machine learning model can use it. This involves cleaning the data, handling missing values and outliers, and sometimes feature engineering and feature selection.
Model Building: Various machine learning algorithms are chosen and applied to the prepared data set to create a predictive model. Hyperparameters of the model may need to be tuned.
Model Evaluation: Once the model is built, it is evaluated to determine how well it performs, for example how accurate its predictions are. This involves metrics and techniques such as the confusion matrix, the ROC curve, and cross-validation.
Model Deployment: If the model's performance is satisfactory, it is deployed to a test environment, where it is integrated with business processes and existing systems to verify that it works as expected before it goes live.
Monitoring and Updating: Once the model is live, it requires regular monitoring to ensure it's performing as expected, and it may require updating as new data becomes available or as the business needs change.
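
The middle steps of the list above (data preparation, model building, and model evaluation) can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy readings in RAW, the clipping limits in VALID_RANGE, the every-third-row hold-out split, and the one-feature threshold "model", which also assumes class 1 tends to have larger values than class 0. A real project would normally use a library and a far richer model; the point here is only the order of operations, including learning the imputation value from the training rows alone.

```python
from statistics import mean, median

# Hypothetical toy data: (sensor reading or None when missing, label 0/1).
RAW = [(1.0, 0), (3.0, 1), (1.2, 0), (3.2, 1), (None, 0),
       (2.9, 1), (0.8, 0), (None, 1), (1.1, 0), (50.0, 1)]
VALID_RANGE = (0.0, 5.0)  # assumed domain limits used to clip outliers


def split(rows, test_every=3):
    """Hold out every `test_every`-th row for evaluation."""
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train, test


def fit_imputer(train):
    """Learn the fill value for missing readings from training rows only."""
    return median(x for x, _ in train if x is not None)


def prepare(rows, fill):
    """Data preparation: impute missing values, then clip outliers."""
    lo, hi = VALID_RANGE
    return [(min(max(fill if x is None else x, lo), hi), y) for x, y in rows]


def train_threshold(rows):
    """Model building: put a decision threshold midway between class means."""
    m0 = mean(x for x, y in rows if y == 0)
    m1 = mean(x for x, y in rows if y == 1)
    return (m0 + m1) / 2


def predict(threshold, x):
    """Predict class 1 when the reading reaches the threshold."""
    return 1 if x >= threshold else 0


def confusion_matrix(threshold, rows):
    """Model evaluation: count TP, FP, TN, and FN on the given rows."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for x, y in rows:
        p = predict(threshold, x)
        key = ("t" if p == y else "f") + ("p" if p == 1 else "n")
        counts[key] += 1
    return counts


train_raw, test_raw = split(RAW)
fill = fit_imputer(train_raw)
train, test = prepare(train_raw, fill), prepare(test_raw, fill)
threshold = train_threshold(train)
cm = confusion_matrix(threshold, test)
accuracy = (cm["tp"] + cm["tn"]) / len(test)
print(cm, accuracy)
```

The sketch evaluates on held-out rows rather than with cross-validation, and it produces no ROC curve; both would be natural additions once the model outputs scores instead of hard labels.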

This workflow is not strictly linear. It's often iterative, with frequent returns to earlier steps for adjustments as new insights are gained or as data or circumstances change.
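
One way to picture this iteration is a monitoring loop that sends the workflow back to model building when performance drops. The following is a rough sketch under stated assumptions, not a prescribed method: the batches of (reading, label) pairs, the ACCURACY_FLOOR value, and the one-feature threshold model are all hypothetical, and retraining on only the latest batch is a simplification.

```python
from statistics import mean

ACCURACY_FLOOR = 0.8  # hypothetical level below which the model is rebuilt


def fit(rows):
    """Refit a one-feature threshold model midway between the class means."""
    m0 = mean(x for x, y in rows if y == 0)
    m1 = mean(x for x, y in rows if y == 1)
    return (m0 + m1) / 2


def accuracy(threshold, rows):
    """Share of rows whose predicted class matches the label."""
    hits = sum(1 for x, y in rows if (x >= threshold) == (y == 1))
    return hits / len(rows)


# Hypothetical batches arriving over time; readings drift upward after the first.
batches = [
    [(1.0, 0), (1.2, 0), (3.0, 1), (3.1, 1)],
    [(2.5, 0), (2.7, 0), (4.5, 1), (4.8, 1)],
    [(2.6, 0), (2.4, 0), (4.6, 1), (4.9, 1)],
]

threshold = fit(batches[0])  # initial model built on the first batch
log = []
for batch in batches[1:]:
    acc = accuracy(threshold, batch)   # monitoring step
    retrained = acc < ACCURACY_FLOOR
    if retrained:
        threshold = fit(batch)         # return to the model-building step
    log.append((round(acc, 2), retrained))
print(log)
```

Each entry in log records the accuracy observed on a new batch and whether that observation triggered a rebuild, which makes the non-linear path through the workflow visible.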