Transform your machine learning projects into successful deployments with this practical guide to building and scaling solutions that solve real-world problems
Includes a new chapter on generative AI and large language models (LLMs), covering how to build a pipeline that leverages LLMs using LangChain
Key Features
This second edition delves deeper into key machine learning topics, CI/CD, and system design
Explore core MLOps practices, such as model management and performance monitoring
Build end-to-end examples of deployable ML microservices and pipelines using AWS and open-source tools
Book Description
The Second Edition of Machine Learning Engineering with Python is the practical guide that MLOps and ML engineers need to build solutions to real-world problems. It will provide you with the skills you need to stay ahead in this rapidly evolving field.
The book takes an examples-based approach to help you develop your skills and covers the technical concepts, implementation patterns, and development methodologies you need. You'll explore the key steps of the ML development lifecycle and create your own standardized "model factory" for training and retraining models. You'll learn to employ practices like CI/CD and how to detect different types of drift.
Get hands-on with the latest in deployment architectures and discover methods for scaling up your solutions. This edition goes deeper into all aspects of ML engineering and MLOps, with an emphasis on the latest open-source and cloud-based technologies. This includes a completely revamped approach to advanced pipelining and orchestration techniques.
With a new chapter on deep learning, generative AI, and LLMOps, you will learn to use tools like LangChain, PyTorch, and Hugging Face to leverage LLMs for supercharged analysis. You will explore AI assistants like GitHub Copilot to become more productive, then dive deep into the engineering considerations of working with deep learning.
What you will learn
Plan and manage end-to-end ML development projects
Explore deep learning, LLMs, and LLMOps to leverage generative AI
Use Python to package your ML tools and scale up your solutions
Get to grips with Apache Spark, Kubernetes, and Ray
Build and run ML pipelines with Apache Airflow, ZenML, and Kubeflow
Detect drift and build retraining mechanisms into your solutions
Improve error handling with control flows and vulnerability scanning
Host and build ML microservices and batch processes running on AWS
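As a small taste of the drift-detection topic listed above, here is a minimal sketch (not taken from the book) of flagging data drift in a single numerical feature using a two-sample Kolmogorov-Smirnov test from SciPy; the function name, the significance threshold, and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def detect_feature_drift(reference, current, alpha=0.05):
    """Flag drift when a two-sample KS test rejects the hypothesis
    that the reference and current samples share a distribution."""
    statistic, p_value = stats.ks_2samp(reference, current)
    return {"statistic": float(statistic),
            "p_value": float(p_value),
            "drift": bool(p_value < alpha)}

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=1_000)   # training-time feature
shifted = rng.normal(loc=0.5, scale=1.0, size=1_000)    # simulated drifted feature

print(detect_feature_drift(baseline, baseline))  # identical samples: no drift
print(detect_feature_drift(baseline, shifted))   # mean shift: drift flagged
```

A production system would run a check like this per feature on each scoring batch and route a "drift" result into a retraining trigger, which is the pattern the book develops.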
Who this book is for
This book is designed for MLOps and ML engineers, data scientists, and software developers who want to build robust solutions that use machine learning to solve real-world problems. If you’re not a developer but want to manage or understand the product lifecycle of these systems, you’ll also find this book useful. It assumes a basic knowledge of machine learning concepts and intermediate programming experience in Python. With its focus on practical skills and real-world examples, this book is an essential resource for anyone looking to advance their machine learning engineering career.
Author(s): Andrew P. McMahon
Edition: Second Edition
Publisher: Packt
Year: 2023
Language: English
Pages: 601
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Introduction to ML Engineering
Technical requirements
Defining a taxonomy of data disciplines
Data scientist
ML engineer
ML operations engineer
Data engineer
Working as an effective team
ML engineering in the real world
What does an ML solution look like?
Why Python?
High-level ML system design
Example 1: Batch anomaly detection service
Example 2: Forecasting API
Example 3: Classification pipeline
Summary
The Machine Learning Development Process
Technical requirements
Setting up our tools
Setting up an AWS account
Concept to solution in four steps
Comparing this to CRISP-DM
Discover
Using user stories
Play
Develop
Selecting a software development methodology
Package management (conda and pip)
Poetry
Code version control
Git strategies
Model version control
Deploy
Knowing your deployment options
Understanding DevOps and MLOps
Building our first CI/CD example with GitHub Actions
Continuous model performance testing
Continuous model training
Summary
From Model to Model Factory
Technical requirements
Defining the model factory
Learning about learning
Defining the target
Cutting your losses
Preparing the data
Engineering features for machine learning
Engineering categorical features
Engineering numerical features
Designing your training system
Training system design options
Train-run
Train-persist
Retraining required
Detecting data drift
Detecting concept drift
Setting the limits
Diagnosing the drift
Remediating the drift
Other tools for monitoring
Automating training
Hierarchies of automation
Optimizing hyperparameters
Hyperopt
Optuna
AutoML
auto-sklearn
AutoKeras
Persisting your models
Building the model factory with pipelines
Scikit-learn pipelines
Spark ML pipelines
Summary
Packaging Up
Technical requirements
Writing good Python
Recapping the basics
Tips and tricks
Adhering to standards
Writing good PySpark
Choosing a style
Object-oriented programming
Functional programming
Packaging your code
Why package?
Selecting use cases for packaging
Designing your package
Building your package
Managing your environment with Makefiles
Getting all poetic with Poetry
Testing, logging, securing, and error handling
Testing
Securing your solutions
Analyzing your own code for security issues
Analyzing dependencies for security issues
Logging
Error handling
Not reinventing the wheel
Summary
Deployment Patterns and Tools
Technical requirements
Architecting systems
Building with principles
Exploring some standard ML patterns
Swimming in data lakes
Microservices
Event-based designs
Batching
Containerizing
Hosting your own microservice on AWS
Pushing to ECR
Hosting on ECS
Building general pipelines with Airflow
Airflow
Airflow on AWS
Revisiting CI/CD for Airflow
Building advanced ML pipelines
Finding your ZenML
Going with the Kubeflow
Selecting your deployment strategy
Summary
Scaling Up
Technical requirements
Scaling with Spark
Spark tips and tricks
Spark on the cloud
AWS EMR example
Spinning up serverless infrastructure
Containerizing at scale with Kubernetes
Scaling with Ray
Getting started with Ray for ML
Scaling your compute for Ray
Scaling your serving layer with Ray
Designing systems at scale
Summary
Deep Learning, Generative AI, and LLMOps
Going deep with deep learning
Getting started with PyTorch
Scaling and taking deep learning into production
Fine-tuning and transfer learning
Living it large with LLMs
Understanding LLMs
Consuming LLMs via API
Coding with LLMs
Building the future with LLMOps
Validating LLMs
PromptOps
Summary
Building an Example ML Microservice
Technical requirements
Understanding the forecasting problem
Designing our forecasting service
Selecting the tools
Training at scale
Serving the models with FastAPI
Response and request schemas
Managing models in your microservice
Pulling it all together
Containerizing and deploying to Kubernetes
Containerizing the application
Scaling up with Kubernetes
Deployment strategies
Summary
Building an Extract, Transform, Machine Learning Use Case
Technical requirements
Understanding the batch processing problem
Designing an ETML solution
Selecting the tools
Interfaces and storage
Scaling of models
Scheduling of ETML pipelines
Executing the build
Building an ETML pipeline with advanced Airflow features
Summary
Other Books You May Enjoy
Index