As demand grows for scalability, real-time access, and other capabilities, businesses need to consider building operational machine learning pipelines. This practical guide helps your company bring data science to life for different real-world MLOps scenarios. Senior data scientists, MLOps engineers, and machine learning engineers will learn how to tackle the challenges that prevent many businesses from moving ML models to production.
Authors Yaron Haviv and Noah Gift take a production-first approach. Rather than beginning with the ML model, you'll learn how to design a continuous operational pipeline, while making sure that various components and practices can map into it. By automating as many components as possible, and making the process fast and repeatable, your pipeline can scale to match your organization's needs.
You'll learn how to provide rapid business value while answering dynamic MLOps requirements. This book will help you:
Learn the MLOps process, including...
Author(s): Yaron Haviv, Noah Gift
Publisher: O'Reilly Media
Year: 2023
Language: English
Pages: 377
Preface
Who This Book Is For
Navigating This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Yaron
Noah
1. MLOps: What Is It and Why Do We Need It?
What Is MLOps?
MLOps in the Enterprise
Understanding ROI in Enterprise Solutions
Understanding Risk and Uncertainty in the Enterprise
MLOps Versus DevOps
What Isn’t MLOps?
Mainstream Definitions of MLOps
What Is ML Engineering?
MLOps and Business Incentives
MLOps in the Cloud
Key Cloud Development Environments
The Key Players in Cloud Computing
AWS view of cloud computing as it relates to MLOps
Azure view of cloud computing as it relates to MLOps
GCP view of cloud computing as it relates to MLOps
MLOps On-Premises
MLOps in Hybrid Environments
Enterprise MLOps Strategy
Conclusion
Critical Thinking Discussion Questions
Exercises
2. The Stages of MLOps
Getting Started
Choose Your Algorithm
Design Your Pipelines
Data Collection and Preparation
Data Storage and Ingestion
Data Exploration and Preparation
Data Labeling
Feature Stores
Model Development and Training
Writing and Maintaining Production ML Code
Tracking and Comparing Experiment Results
Distributed Training and Hyperparameter Optimization
Building and Testing Models for Production
Deployment (and Online ML Services)
From Model Endpoints to Application Pipelines
Online Data Preparation
Continuous Model and Data Monitoring
Monitoring Data and Concept Drift
Monitoring Model Performance and Accuracy
The Strategy of Pretrained Models
Building an End-to-End Hugging Face Application
Flow Automation (CI/CD for ML)
Conclusion
Critical Thinking Discussion Questions
Exercises
3. Getting Started with Your First MLOps Project
Identifying the Business Use Case and Goals
Finding the AI Use Case
Defining Goals and Evaluating the ROI
How to Build a Successful ML Project
Approving and Prototyping the Project
Scaling and Productizing Projects
Project Structure and Lifecycle
ML Project Example from A to Z
Exploratory Data Analysis
Data and Model Pipeline Development
Application Pipeline Development
Real-time application pipelines
Batch application pipelines
Scaling and Productizing the Project
Adding tests
ML pipelines and hyperparameter optimization
CI/CD and Continuous Operations
Continuously monitoring data and models
Integrating with a CI/CD service
Conclusion
Critical Thinking Discussion Questions
Exercises
4. Working with Data and Feature Stores
Data Versioning and Lineage
How It Works
Common ML Data Versioning Tools
Data Version Control
Pachyderm
MLflow Tracking
MLRun
Other Frameworks
Data Preparation and Analysis at Scale
Structured and Unstructured Data Transformations
Distributed Data Processing Architectures
Interactive Data Processing
Batch Data Processing
Stream Processing
Stream Processing Frameworks
Feature Stores
Feature Store Architecture and Usage
Ingestion and Transformation Service
Feature Storage
Feature Retrieval (for Training and Serving)
Feature Stores Solutions and Usage Example
Using Feast Feature Store
Using MLRun Feature Store
Conclusion
Critical Thinking Discussion Questions
Exercises
5. Developing Models for Production
AutoML
Running, Tracking, and Comparing ML Jobs
Experiment Tracking
Saving Essential Metadata with the Model Artifacts
Comparing ML Jobs: An Example with MLflow
Hyperparameter Tuning
Auto-Logging
MLOps Automation: AutoMLOps
Example: Running and Tracking ML Jobs Using Azure Databricks
Handling Training at Scale
Building and Running Multi-Stage Workflows
Managing Computation Resources Efficiently
Conclusion
Critical Thinking Discussion Questions
Exercises
6. Deployment of Models and AI Applications
Model Registry and Management
Solution Examples
SageMaker Example
MLflow Example
MLRun Example
Model Serving
Amazon SageMaker
Seldon Core
MLRun Serving
Advanced Serving and Application Pipelines
Implementing Scalable Application Pipelines
AWS Step Functions
Apache Beam
MLRun serving graphs
Model Routing and Ensembles
Model Optimization and ONNX
Data and Model Monitoring
Integrated Model Monitoring Solutions
Amazon SageMaker
Google Vertex AI
MLRun
Standalone Model Monitoring Solutions
Model Retraining
When to Retrain Your Models
Strategies for Data Retraining
Model Retraining in the MLOps Pipeline
Deployment Strategies
Measuring the Business Impact
Conclusion
Critical Thinking Discussion Questions
Exercises
7. Building a Production Grade MLOps Project from A to Z
Exploratory Data Analysis
Interactive Data Preparation
Preparing the Credit Transaction Dataset
Preparing the User Events (Activities) Dataset
Extracting Labels and Training a Model
Data Ingestion and Preparation Using a Feature Store
Building the Credit Transactions Data Pipeline (Feature Set)
Building the User Events Data Pipeline (FeatureSet)
Building the Target Labels Data Pipeline (FeatureSet)
Ingesting Data into the Feature Store
Batch data ingestion (for tests and training)
Real-time data ingestion (for production)
Model Training and Validation Pipeline
Creating and Evaluating a Feature Vector
Building and Running an Automated Training and Validation Pipeline
Real-Time Application Pipeline
Defining a Custom Model Serving Class
Building an Application Pipeline with Enrichment and Ensemble
Testing the Application Pipeline Locally
Deploying and Testing the Real-Time Application Pipeline
Model Monitoring
CI/CD and Continuous Operations
Conclusion
Critical Thinking Discussion Questions
Exercises
8. Building Scalable Deep Learning and Large Language Model Projects
Distributed Deep Learning
Horovod
Ray
Data Gathering, Labeling, and Monitoring in DL
Data Labeling Pitfalls to Avoid
Data Labeling Best Practices
Data Labeling Solutions
Using Foundation Models as Labelers
Monitoring DL Models with Unstructured Data
Build Versus Buy Deep Learning Models
Foundation Models, Generative AI, LLMs
Risks and Challenges with Generative AI
MLOps Pipelines for Efficiently Using and Customizing LLMs
Application Example: Fine-Tuning an LLM Model
Data preparation and tuning
Data preparation
Model tuning and evaluation
Defining an MLRun project and running the pipeline
Application and model serving pipeline
Adding a web interface
Conclusion
Critical Thinking Discussion Questions
Exercises
9. Solutions for Advanced Data Types
ML Problem Framing with Time Series
Navigating Time Series Analysis with AWS
Diving into Time Series with DeepAR+
Time Series with the GCP BigQuery and SQL
Build Versus Buy for MLOps NLP Problems
Build Versus Buy: The Hugging Face Approach
Exploring Natural Language Processing with AWS
Exploring NLP with OpenAI
Video Analysis, Image Classification, and Generative AI
Image Classification Techniques with CreateML
Composite AI
Getting Started with Serverless for Composite AI
Use Cases of Composite AI with Serverless
Conclusion
Critical Thinking Discussion Questions
Exercises
10. Implementing MLOps Using Rust
The Case for Rust for MLOps
Leveling Up with Rust, GitHub Copilot, and Codespaces
In the Beginning Was the Command Line
Getting Started with Rust for MLOps
Using PyTorch and Hugging Face with Rust
Using Rust to Build Tools for MLOps
Building Containerized Rust Command-Line Tools
GPU PyTorch Workflows
Using TensorFlow Rust
Doing k-means Clustering with Rust
Final Notes on Rust
Ruff Linter
rust-new-project-template
Conclusion
Critical Thinking Discussion Questions
Exercises
A. Job Interview Questions
What is the primary purpose of DevOps?
What is an excellent example of the fundamental processes necessary to implement MLOps?
What is a feature store?
What is a Model Registry?
What are the best practices for operationalizing a microservice?
What is GitHub Actions, and what are the primary use cases?
What is a data pipeline?
What are the primary use cases for Jupyter Notebook?
What is the purpose of linting Python code?
Why are cloud-based development environments like GitHub Codespaces and AWS Cloud9 useful?
What is Big O Notation?
What are business use cases for the mathematical field of optimization?
What is the traveling salesman problem?
Describe how the gradient descent algorithm works?
The greedy coin problem is what type of programming problem?
What are the advantages of containers?
What is an HTTP API?
What are the advantages of containerized ML applications?
What are the advantages of using ONNX for model interoperability?
What are the use cases for edge-based machine learning models?
What is a Spark Cluster?
What problems does PySpark solve?
What are the critical components of the Databricks platform?
What are the critical components of MLflow?
What is the critical difference between a Spark DataFrame and a pandas DataFrame?
What is kaizen?
What is a data warehouse?
What is a scheduled data pipeline job?
What is data engineering?
What is DataOps?
What is Kubernetes?
Why are microservices a good fit for Kubernetes?
What is observability in software engineering?
What are the critical components of Kubernetes?
What is the Kubernetes API?
What are the core components of a cloud native architecture?
What are three cloud native data components?
What are the common fallacies of distributed computing?
How do you access network storage in Docker?
What is block storage?
B. Enterprise MLOps Interviews
Shubham Saboo and Sandra Kublik
Piero Molino
Asaf Somekh
Javier Luraschi and Pedro Luraschi
Malcolm Smith Fraser
Jon Reifschneider
Julien Simon
Shubham Saboo
Brian Ray
Simon Stebelena
Bindu Reddy
Dhanasekar Sundararaman
Ville Tuulos
Lewis Tunstall and Leandro von Werra
Arvs Lat
Julien Simon, Yaron Haviv, and Noah Gift
Nic Stone
Doris Xin
Index