The design patterns in this book capture best practices and solutions to recurring problems in machine learning. The authors, three Google engineers, catalog proven methods to help data scientists tackle common problems throughout the ML process. These design patterns codify the experience of hundreds of experts into straightforward, approachable advice.
In this book, you will find detailed explanations of 30 patterns for data and problem representation, operationalization, repeatability, reproducibility, flexibility, explainability, and fairness. Each pattern includes a description of the problem, a variety of potential solutions, and recommendations for choosing the best technique for your situation.
You'll learn how to:
• Identify and mitigate common challenges when training, evaluating, and deploying ML models
• Represent data for different ML model types, including embeddings, feature crosses, and more
• Choose the right model type for specific problems
• Build a robust training loop that uses checkpoints, a distribution strategy, and hyperparameter tuning
• Deploy scalable ML systems that you can retrain and update to reflect new data
• Interpret model predictions for stakeholders and ensure that models treat users fairly
Author(s): Valliappa Lakshmanan, Sara Robinson, Michael Munn
Edition: 1
Publisher: O'Reilly Media
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 408
City: Sebastopol, CA
Tags: Machine Learning; Python; Transfer Learning; SQL; Cookbook; Keras; TensorFlow; Scalability; Hyperparameter Tuning; Design Patterns; Best Practices; scikit-learn; Overfitting; Resilience; Reproducible Research; MLOps; Model Training
Cover
Copyright
Table of Contents
Preface
Who Is This Book For?
What’s Not in the Book
Code Samples
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. The Need for Machine Learning Design Patterns
What Are Design Patterns?
How to Use This Book
Machine Learning Terminology
Models and Frameworks
Data and Feature Engineering
The Machine Learning Process
Data and Model Tooling
Roles
Common Challenges in Machine Learning
Data Quality
Reproducibility
Data Drift
Scale
Multiple Objectives
Summary
Chapter 2. Data Representation Design Patterns
Simple Data Representations
Numerical Inputs
Categorical Inputs
Design Pattern 1: Hashed Feature
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 2: Embeddings
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 3: Feature Cross
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 4: Multimodal Input
Problem
Solution
Trade-Offs and Alternatives
Summary
Chapter 3. Problem Representation Design Patterns
Design Pattern 5: Reframing
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 6: Multilabel
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 7: Ensembles
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 8: Cascade
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 9: Neutral Class
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 10: Rebalancing
Problem
Solution
Trade-Offs and Alternatives
Summary
Chapter 4. Model Training Patterns
Typical Training Loop
Stochastic Gradient Descent
Keras Training Loop
Training Design Patterns
Design Pattern 11: Useful Overfitting
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 12: Checkpoints
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 13: Transfer Learning
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 14: Distribution Strategy
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 15: Hyperparameter Tuning
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Summary
Chapter 5. Design Patterns for Resilient Serving
Design Pattern 16: Stateless Serving Function
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 17: Batch Serving
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 18: Continued Model Evaluation
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 19: Two-Phase Predictions
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 20: Keyed Predictions
Problem
Solution
Trade-Offs and Alternatives
Summary
Chapter 6. Reproducibility Design Patterns
Design Pattern 21: Transform
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 22: Repeatable Splitting
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 23: Bridged Schema
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 24: Windowed Inference
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 25: Workflow Pipeline
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 26: Feature Store
Problem
Solution
Why It Works
Trade-Offs and Alternatives
Design Pattern 27: Model Versioning
Problem
Solution
Trade-Offs and Alternatives
Summary
Chapter 7. Responsible AI
Design Pattern 28: Heuristic Benchmark
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 29: Explainable Predictions
Problem
Solution
Trade-Offs and Alternatives
Design Pattern 30: Fairness Lens
Problem
Solution
Trade-Offs and Alternatives
Summary
Chapter 8. Connected Patterns
Patterns Reference
Pattern Interactions
Patterns Within ML Projects
ML Life Cycle
AI Readiness
Common Patterns by Use Case and Data Type
Natural Language Understanding
Computer Vision
Predictive Analytics
Recommendation Systems
Fraud and Anomaly Detection
Index
About the Authors
Colophon