Guide machine learning projects from design to production with the techniques in this one-of-a-kind project management guide. No ML skills required.
In Managing Machine Learning Projects you’ll learn essential machine learning project management techniques, including:
Understanding an ML project’s requirements
Setting up the infrastructure for the project and resourcing a team
Working with clients and other stakeholders
Dealing with data resources and bringing them into the project for use
Handling the lifecycle of models in the project
Managing the application of ML algorithms
Evaluating the performance of algorithms and models
Making decisions about which models to adopt for delivery
Taking models through development and testing
Integrating models with production systems to create effective applications
Managing the ethical implications of ML technology
Managing Machine Learning Projects is an end-to-end guide for delivering machine learning applications on time and under budget. It lays out tools, approaches, and processes designed to handle the unique challenges of machine learning project management. You’ll follow an in-depth case study through a series of sprints and see how to put each technique into practice. The book’s strong focus on data privacy and community impact helps ensure your projects are ethical, compliant with global legislation, and protected from failures caused by bias and other issues.
About the Technology
Ferrying machine learning projects to production often feels like navigating uncharted waters. From accounting for large data resources to tracking and evaluating multiple models, machine learning technology has radically different requirements than traditional software. Never fear! This book lays out the unique practices you’ll need to ensure your projects succeed.
About the Book
Managing Machine Learning Projects is an amazing source of battle-tested techniques for effective delivery of real-life machine learning solutions. The book is laid out across a series of sprints that take you from a project proposal all the way to deployment into production. You’ll learn how to plan essential infrastructure, coordinate experimentation, protect sensitive data, and reliably measure model performance. Many ML projects fail to create real value. Read this book to make sure your project is a success.
What's Inside
Set up infrastructure and resource a team
Bring data resources into a project
Accurately estimate time and effort
Evaluate which models to adopt for delivery
Integrate models into effective applications
Author(s): Simon Thompson
Publisher: Manning Publications Co.
Year: 2023
Language: English
Pages: 273
inside front cover
Delivering Machine Learning Projects
Copyright
contents
front matter
preface
acknowledgments
about this book
How this book is organized: A roadmap
LiveBook discussion forum
about the author
about the cover illustration
1 Introduction: Delivering machine learning projects is hard; let’s do it better
1.1 What is machine learning?
1.2 Why is ML important?
1.3 Other machine learning methodologies
1.4 Understanding this book
1.5 Case study: The Bike Shop
Summary
2 Pre-project: From opportunity to requirements
2.1 Pre-project backlog
2.2 Project management infrastructure
2.3 Project requirements
2.3.1 Funding model
2.3.2 Business requirements
2.4 Data
2.5 Security and privacy
2.6 Corporate responsibility, regulation, and ethical considerations
2.7 Development architecture and process
2.7.1 Development environment
2.7.2 Production architecture
Summary
3 Pre-project: From requirements to proposal
3.1 Build a project hypothesis
3.2 Create an estimate
3.2.1 Time and effort estimates
3.2.2 Team design for ML projects
3.2.3 Project risks
3.3 Pre-sales/pre-project administration
3.4 Pre-project/pre-sales checklist
3.5 The Bike Shop pre-sales
3.6 Pre-project postscript
Summary
4 Getting started
4.1 Sprint 0 backlog
4.2 Finalize team design and resourcing
4.3 A way of working
4.3.1 Process and structure
4.3.2 Heartbeat and communication plan
4.3.3 Tooling
4.3.4 Standards and practices
4.3.5 Documentation
4.4 Infrastructure plan
4.4.1 System access
4.4.2 Technical infrastructure evaluation
4.5 The data story
4.5.1 Data collection motivation
4.5.2 Data collection mechanism
4.5.3 Lineage
4.5.4 Events
4.6 Privacy, security, and an ethics plan
4.7 Project roadmap
4.8 Sprint 0 checklist
4.9 Bike Shop: project setup
Summary
5 Diving into the problem
5.1 Sprint 1 backlog
5.2 Understanding the data
5.2.1 The data survey
5.2.2 Surveying numerical data
5.2.3 Surveying categorical data
5.2.4 Surveying unstructured data
5.2.5 Reporting and using the survey
5.3 Business problem refinement, UX, and application design
5.4 Building data pipelines
5.4.1 Data fusion challenges
5.4.2 Pipeline jungles
5.4.3 Data testing
5.5 Model repository and model versioning
5.5.1 Features, foundational models, and training regimes
5.5.2 Overview of versioning
Summary
6 EDA, ethics, and baseline evaluations
6.1 Exploratory data analysis (EDA)
6.1.1 EDA objectives
6.1.2 Summarizing and describing data
6.1.3 Plots and visualizations
6.1.4 Unstructured data
6.2 Ethics checkpoint
6.3 Baseline models and performance
6.4 What if there are problems?
6.5 Pre-modeling checklist
6.6 The Bike Shop: Pre-modeling
6.6.1 After the survey
6.6.2 EDA implementation
Summary
7 Making useful models with ML
7.1 Sprint 2 backlog
7.2 Feature engineering and data augmentation
7.2.1 Data augmentation
7.3 Model design
7.3.1 Design forces
7.3.2 Overall design
7.3.3 Choosing component models
7.3.4 Inductive bias
7.3.5 Multiple disjoint models
7.3.6 Model composition
7.4 Making models with ML
7.4.1 Modeling process
7.4.2 Experiment tracking and model repositories
7.4.3 AutoML and model search
7.5 Stinky, dirty, no good, smelly models
Summary
8 Testing and selection
8.1 Why test and select?
8.2 Testing processes
8.2.1 Offline testing
8.2.2 Offline test environments
8.2.3 Online testing
8.2.4 Field trials
8.2.5 A/B testing
8.2.6 Multi-armed bandits (MABs)
8.2.7 Nonfunctional testing
8.3 Model selection
8.3.1 Quantitative selection
8.3.2 Choosing with comparable tests
8.3.3 Choosing with many tests
8.3.4 Qualitative selection measures
8.4 Post-modeling checklist
8.5 The Bike Shop: sprint 2
Summary
9 Sprint 3: system building and production
9.1 Sprint 3 backlog
9.2 Types of ML implementations
9.2.1 Assistive systems: recommenders and dashboards
9.2.2 Delegative systems
9.2.3 Autonomous systems
9.3 Nonfunctional review
9.4 Implementing the production system
9.4.1 Production data infrastructure
9.4.2 The model server and the inference service
9.4.3 User interface design
9.5 Logging, monitoring, management, feedback, and documentation
9.5.1 Model governance
9.5.2 Documentation
9.6 Pre-release testing
9.7 Ethics review
9.8 Promotion to production
9.9 You aren’t done yet
9.10 The Bike Shop sprint 3
Summary
10 Post project (sprint Ω)
10.1 Sprint Ω backlog
10.2 Off your hands and into production?
10.2.1 Getting a grip
10.2.2 ML technical debt and model drift
10.2.3 Retraining
10.2.4 In an emergency
10.2.5 Problems in review
10.3 Team post-project review
10.4 Improving practice
10.5 New technology adoption
10.6 Case study
10.7 Goodbye and good luck
Summary
references
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
index