Implementing MLOps in the Enterprise: A Production-First Approach

With demand for scaling, real-time access, and other capabilities, businesses need to consider building operational machine learning pipelines. This practical guide helps your company bring data science to life for different real-world MLOps scenarios. Senior data scientists, MLOps engineers, and machine learning engineers will learn how to tackle challenges that prevent many businesses from moving ML models to production.

Authors Yaron Haviv and Noah Gift take a production-first approach. Rather than beginning with the ML model, you'll learn how to design a continuous operational pipeline, while making sure that various components and practices can map into it. By automating as many components as possible, and making the process fast and repeatable, your pipeline can scale to match your organization's needs.

You'll learn how to provide rapid business value while answering dynamic MLOps requirements. This book will help you:

  • Learn the MLOps process, including...

    Author(s): Yaron Haviv, Noah Gift
    Publisher: O'Reilly Media
    Year: 2023
    Language: English
    Pages: 377

    Preface
    Who This Book Is For
    Navigating This Book
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
    Yaron
    Noah
    1. MLOps: What Is It and Why Do We Need It?
    What Is MLOps?
    MLOps in the Enterprise
    Understanding ROI in Enterprise Solutions
    Understanding Risk and Uncertainty in the Enterprise
    MLOps Versus DevOps
    What Isn’t MLOps?
    Mainstream Definitions of MLOps
    What Is ML Engineering?
    MLOps and Business Incentives
    MLOps in the Cloud
    Key Cloud Development Environments
    The Key Players in Cloud Computing
    AWS view of cloud computing as it relates to MLOps
    Azure view of cloud computing as it relates to MLOps
    GCP view of cloud computing as it relates to MLOps
    MLOps On-Premises
    MLOps in Hybrid Environments
    Enterprise MLOps Strategy
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    2. The Stages of MLOps
    Getting Started
    Choose Your Algorithm
    Design Your Pipelines
    Data Collection and Preparation
    Data Storage and Ingestion
    Data Exploration and Preparation
    Data Labeling
    Feature Stores
    Model Development and Training
    Writing and Maintaining Production ML Code
    Tracking and Comparing Experiment Results
    Distributed Training and Hyperparameter Optimization
    Building and Testing Models for Production
    Deployment (and Online ML Services)
    From Model Endpoints to Application Pipelines
    Online Data Preparation
    Continuous Model and Data Monitoring
    Monitoring Data and Concept Drift
    Monitoring Model Performance and Accuracy
    The Strategy of Pretrained Models
    Building an End-to-End Hugging Face Application
    Flow Automation (CI/CD for ML)
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    3. Getting Started with Your First MLOps Project
    Identifying the Business Use Case and Goals
    Finding the AI Use Case
    Defining Goals and Evaluating the ROI
    How to Build a Successful ML Project
    Approving and Prototyping the Project
    Scaling and Productizing Projects
    Project Structure and Lifecycle
    ML Project Example from A to Z
    Exploratory Data Analysis
    Data and Model Pipeline Development
    Application Pipeline Development
    Real-time application pipelines
    Batch application pipelines
    Scaling and Productizing the Project
    Adding tests
    ML pipelines and hyperparameter optimization
    CI/CD and Continuous Operations
    Continuously monitoring data and models
    Integrating with a CI/CD service
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    4. Working with Data and Feature Stores
    Data Versioning and Lineage
    How It Works
    Common ML Data Versioning Tools
    Data Version Control
    Pachyderm
    MLflow Tracking
    MLRun
    Other Frameworks
    Data Preparation and Analysis at Scale
    Structured and Unstructured Data Transformations
    Distributed Data Processing Architectures
    Interactive Data Processing
    Batch Data Processing
    Stream Processing
    Stream Processing Frameworks
    Feature Stores
    Feature Store Architecture and Usage
    Ingestion and Transformation Service
    Feature Storage
    Feature Retrieval (for Training and Serving)
    Feature Stores Solutions and Usage Example
    Using Feast Feature Store
    Using MLRun Feature Store
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    5. Developing Models for Production
    AutoML
    Running, Tracking, and Comparing ML Jobs
    Experiment Tracking
    Saving Essential Metadata with the Model Artifacts
    Comparing ML Jobs: An Example with MLflow
    Hyperparameter Tuning
    Auto-Logging
    MLOps Automation: AutoMLOps
    Example: Running and Tracking ML Jobs Using Azure Databricks
    Handling Training at Scale
    Building and Running Multi-Stage Workflows
    Managing Computation Resources Efficiently
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    6. Deployment of Models and AI Applications
    Model Registry and Management
    Solution Examples
    SageMaker Example
    MLflow Example
    MLRun Example
    Model Serving
    Amazon SageMaker
    Seldon Core
    MLRun Serving
    Advanced Serving and Application Pipelines
    Implementing Scalable Application Pipelines
    AWS Step Functions
    Apache Beam
    MLRun serving graphs
    Model Routing and Ensembles
    Model Optimization and ONNX
    Data and Model Monitoring
    Integrated Model Monitoring Solutions
    Amazon SageMaker
    Google Vertex AI
    MLRun
    Standalone Model Monitoring Solutions
    Model Retraining
    When to Retrain Your Models
    Strategies for Data Retraining
    Model Retraining in the MLOps Pipeline
    Deployment Strategies
    Measuring the Business Impact
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    7. Building a Production Grade MLOps Project from A to Z
    Exploratory Data Analysis
    Interactive Data Preparation
    Preparing the Credit Transaction Dataset
    Preparing the User Events (Activities) Dataset
    Extracting Labels and Training a Model
    Data Ingestion and Preparation Using a Feature Store
    Building the Credit Transactions Data Pipeline (Feature Set)
    Building the User Events Data Pipeline (FeatureSet)
    Building the Target Labels Data Pipeline (FeatureSet)
    Ingesting Data into the Feature Store
    Batch data ingestion (for tests and training)
    Real-time data ingestion (for production)
    Model Training and Validation Pipeline
    Creating and Evaluating a Feature Vector
    Building and Running an Automated Training and Validation Pipeline
    Real-Time Application Pipeline
    Defining a Custom Model Serving Class
    Building an Application Pipeline with Enrichment and Ensemble
    Testing the Application Pipeline Locally
    Deploying and Testing the Real-Time Application Pipeline
    Model Monitoring
    CI/CD and Continuous Operations
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    8. Building Scalable Deep Learning and Large Language Model Projects
    Distributed Deep Learning
    Horovod
    Ray
    Data Gathering, Labeling, and Monitoring in DL
    Data Labeling Pitfalls to Avoid
    Data Labeling Best Practices
    Data Labeling Solutions
    Using Foundation Models as Labelers
    Monitoring DL Models with Unstructured Data
    Build Versus Buy Deep Learning Models
    Foundation Models, Generative AI, LLMs
    Risks and Challenges with Generative AI
    MLOps Pipelines for Efficiently Using and Customizing LLMs
    Application Example: Fine-Tuning an LLM Model
    Data preparation and tuning
    Data preparation
    Model tuning and evaluation
    Defining an MLRun project and running the pipeline
    Application and model serving pipeline
    Adding a web interface
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    9. Solutions for Advanced Data Types
    ML Problem Framing with Time Series
    Navigating Time Series Analysis with AWS
    Diving into Time Series with DeepAR+
    Time Series with the GCP BigQuery and SQL
    Build Versus Buy for MLOps NLP Problems
    Build Versus Buy: The Hugging Face Approach
    Exploring Natural Language Processing with AWS
    Exploring NLP with OpenAI
    Video Analysis, Image Classification, and Generative AI
    Image Classification Techniques with CreateML
    Composite AI
    Getting Started with Serverless for Composite AI
    Use Cases of Composite AI with Serverless
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    10. Implementing MLOps Using Rust
    The Case for Rust for MLOps
    Leveling Up with Rust, GitHub Copilot, and Codespaces
    In the Beginning Was the Command Line
    Getting Started with Rust for MLOps
    Using PyTorch and Hugging Face with Rust
    Using Rust to Build Tools for MLOps
    Building Containerized Rust Command-Line Tools
    GPU PyTorch Workflows
    Using TensorFlow Rust
    Doing k-means Clustering with Rust
    Final Notes on Rust
    Ruff Linter
    rust-new-project-template
    Conclusion
    Critical Thinking Discussion Questions
    Exercises
    A. Job Interview Questions
    What is the primary purpose of DevOps?
    What is an excellent example of the fundamental processes necessary to implement MLOps?
    What is a feature store?
    What is a Model Registry?
    What are the best practices for operationalizing a microservice?
    What is GitHub Actions, and what are the primary use cases?
    What is a data pipeline?
    What are the primary use cases for Jupyter Notebook?
    What is the purpose of linting Python code?
    Why are cloud-based development environments like GitHub Codespaces and AWS Cloud9 useful?
    What is Big O Notation?
    What are business use cases for the mathematical field of optimization?
    What is the traveling salesman problem?
    Describe how the gradient descent algorithm works.
    The greedy coin problem is what type of programming problem?
    What are the advantages of containers?
    What is an HTTP API?
    What are the advantages of containerized ML applications?
    What are the advantages of using ONNX for model interoperability?
    What are the use cases for edge-based machine learning models?
    What is a Spark Cluster?
    What problems does PySpark solve?
    What are the critical components of the Databricks platform?
    What are the critical components of MLflow?
    What is the critical difference between a Spark DataFrame and a pandas DataFrame?
    What is kaizen?
    What is a data warehouse?
    What is a scheduled data pipeline job?
    What is data engineering?
    What is DataOps?
    What is Kubernetes?
    Why are microservices a good fit for Kubernetes?
    What is observability in software engineering?
    What are the critical components of Kubernetes?
    What is the Kubernetes API?
    What are the core components of a cloud native architecture?
    What are three cloud native data components?
    What are the common fallacies of distributed computing?
    How do you access network storage in Docker?
    What is block storage?
    B. Enterprise MLOps Interviews
    Shubham Saboo and Sandra Kublik
    Piero Molino
    Asaf Somekh
    Javier Luraschi and Pedro Luraschi
    Malcolm Smith Fraser
    Jon Reifschneider
    Julien Simon
    Shubham Saboo
    Brian Ray
    Simon Stebelena
    Bindu Reddy
    Dhanasekar Sundararaman
    Ville Tuulos
    Lewis Tunstall and Leandro von Werra
    Arvs Lat
    Julien Simon, Yaron Haviv, and Noah Gift
    Nic Stone
    Doris Xin
    Index