MLOps Pipeline
User Guide
The User Guide explains how to use the pipeline as a machine learning engineer (MLE) who wants to train a model while benefiting from the pipeline's visualization features and reduced overhead.
Developer Guide
The Developer Guide is for MLOps engineers who want to set up the pipeline themselves.
MLOps Resources
The following introductory materials on MLOps are a good starting point. They cover various aspects of Machine Learning Operations, from data pipelines and model development to deployment in production.
Books with Code Samples
Data Pipelines With Apache Airflow (Authors: Bas P. Harenslak, Julian Rutger De Ruiter)
- GitHub Repository: data-pipelines-with-apache-airflow
Practical Deep Learning at Scale with MLflow (Author: Yong Liu)
- GitHub Repository: Practical-Deep-Learning-at-Scale-with-MLFlow
Designing Machine Learning Systems (Author: Chip Huyen)
- GitHub Repository: machine-learning-systems-design
Tutorials and Courses
Made With ML by Goku Mohandas
- Website: madewithml.com
- GitHub Repository: Made-With-ML
Stanford's ML Systems Design Course
- Syllabus: Stanford CS329S
- Based on Chip Huyen's Designing Machine Learning Systems and using Goku Mohandas' Made With ML
Deploying Machine Learning Models in Production (Coursera)
- Course Link: coursera.org / DeepLearning.AI
Full Stack Deep Learning
- GitHub Repository: Interactive Colab Notebook Lectures
Goals
The pipeline was originally designed for the certAInty project but is meant to be general enough to be used for other projects as well.
Robustness: Data should persist even if the pipeline fails.
Reproducibility: Pipeline runs should be reproducible, so that the same code, data, and configuration yield the same results.
Scalability: The pipeline should be able to scale to multiple machines.
Flexibility: The pipeline should be able to run on different machines.
Monitoring: The pipeline should monitor its own runs.
Logging: The pipeline should log its own activity.
Parity: The pipeline should be able to run in production as it does in development.
Visualization: The pipeline should visualize its own structure and runs.
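As an illustration of the Robustness and Logging goals, here is a minimal sketch of one way to make sure data persists when a later step fails. It is not taken from the pipeline itself, since this section does not describe how the pipeline implements these goals. The names `persist_atomically`, `run_steps`, and `pipeline_sketch` are hypothetical, and storing step outputs as JSON files is an assumption made only for the example.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline_sketch")  # hypothetical logger name


def persist_atomically(data: dict, target: Path) -> None:
    """Write to a temporary file, then rename it, so no partial file is ever visible."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle)
    os.replace(tmp_name, target)  # atomic when source and target share a filesystem


def run_steps(steps, workdir: Path) -> list:
    """Run (name, callable) steps in order and persist each result before the next step.

    Returns the names of the steps whose output was written to disk.
    """
    completed = []
    for name, step in steps:
        try:
            result = step()
        except Exception:
            # Outputs of earlier steps are already on disk and stay there.
            logger.exception("Step %s failed; earlier outputs remain on disk", name)
            break
        persist_atomically(result, workdir / f"{name}.json")
        logger.info("Step %s persisted", name)
        completed.append(name)
    return completed
```

Calling `run_steps` with a list of `(name, function)` pairs and a working directory writes one JSON file per successful step. If a step raises, the loop stops, the failure is logged with its traceback, and the files written by earlier steps are left untouched.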