Guide machine learning projects from design to production with the techniques in this unique project management guide. No ML skills required!
In Managing Machine Learning Projects you’ll learn essential machine learning project management techniques, including:
• Understanding an ML project’s requirements
• Setting up the infrastructure for the project and resourcing a team
• Working with clients and other stakeholders
• Dealing with data resources and bringing them into the project for use
• Handling the lifecycle of models in the project
• Managing the application of ML algorithms
• Evaluating the performance of algorithms and models
• Making decisions about which models to adopt for delivery
• Taking models through development and testing
• Integrating models with production systems to create effective applications
• Following steps and behaviors for managing the ethical implications of ML technology
Managing Machine Learning Projects is an end-to-end guide for delivering machine learning applications on time and under budget. It lays out tools, approaches, and processes designed to handle the unique challenges of machine learning project management. You’ll follow an in-depth case study through a series of sprints and see how to put each technique into practice. The book’s strong focus on data privacy and community impact ensures your projects are ethical, compliant with global legislation, and protected from failures caused by bias and other issues.
About the Technology
Ferrying machine learning projects to production often feels like navigating uncharted waters. From accounting for large data resources to tracking and evaluating multiple models, machine learning technology has radically different requirements than traditional software. Never fear! This book lays out the unique practices you’ll need to ensure your projects succeed.
About the Book
Managing Machine Learning Projects is an amazing source of battle-tested techniques for effective delivery of real-life machine learning solutions. The book is laid out across a series of sprints that take you from a project proposal all the way to deployment into production. You’ll learn how to plan essential infrastructure, coordinate experimentation, protect sensitive data, and reliably measure model performance. Many ML projects fail to create real value—read this book to make sure your project is a success.
What's Inside
• Set up infrastructure and resource a team
• Bring data resources into a project
• Accurately estimate time and effort
• Evaluate which models to adopt for delivery
• Integrate models into effective applications
About the Reader
For anyone interested in better management of machine learning projects. No technical skills required.
About the Author
Simon Thompson has spent 25 years developing AI systems to create applications for use in telecoms, customer service, manufacturing and capital markets. He led the AI research program at BT Labs in the UK, and is now the Head of Data Science at GFT Technologies.
Author(s): Simon Thompson
Edition: 1
Publisher: Manning
Year: 2023
Language: English
Commentary: Publisher's PDF
Pages: 272
City: Shelter Island, NY
Tags: Machine Learning; Security; Ethics; Privacy; Monitoring; Logging; Agile; Documentation; Project Management; Model Selection; Data Pipelines; Sprint Meetings; Feedback; Project Requirements
Managing Machine Learning Projects
brief contents
contents
preface
acknowledgments
about this book
How this book is organized: A roadmap
liveBook discussion forum
about the author
about the cover illustration
1 Introduction: Delivering machine learning projects is hard; let’s do it better
1.1 What is machine learning?
1.2 Why is ML important?
1.3 Other machine learning methodologies
1.4 Understanding this book
1.5 Case study: The Bike Shop
Summary
2 Pre-project: From opportunity to requirements
2.1 Pre-project backlog
2.2 Project management infrastructure
2.3 Project requirements
2.3.1 Funding model
2.3.2 Business requirements
2.4 Data
2.5 Security and privacy
2.6 Corporate responsibility, regulation, and ethical considerations
2.7 Development architecture and process
2.7.1 Development environment
2.7.2 Production architecture
Summary
3 Pre-project: From requirements to proposal
3.1 Build a project hypothesis
3.2 Create an estimate
3.2.1 Time and effort estimates
3.2.2 Team design for ML projects
3.2.3 Project risks
3.3 Pre-sales/pre-project administration
3.4 Pre-project/pre-sales checklist
3.5 The Bike Shop pre-sales
3.6 Pre-project postscript
Summary
4 Getting started
4.1 Sprint 0 backlog
4.2 Finalize team design and resourcing
4.3 A way of working
4.3.1 Process and structure
4.3.2 Heartbeat and communication plan
4.3.3 Tooling
4.3.4 Standards and practices
4.3.5 Documentation
4.4 Infrastructure plan
4.4.1 System access
4.4.2 Technical infrastructure evaluation
4.5 The data story
4.5.1 Data collection motivation
4.5.2 Data collection mechanism
4.5.3 Lineage
4.5.4 Events
4.6 Privacy, security, and an ethics plan
4.7 Project roadmap
4.8 Sprint 0 checklist
4.9 Bike Shop: project setup
Summary
5 Diving into the problem
5.1 Sprint 1 backlog
5.2 Understanding the data
5.2.1 The data survey
5.2.2 Surveying numerical data
5.2.3 Surveying categorical data
5.2.4 Surveying unstructured data
5.2.5 Reporting and using the survey
5.3 Business problem refinement, UX, and application design
5.4 Building data pipelines
5.4.1 Data fusion challenges
5.4.2 Pipeline jungles
5.4.3 Data testing
5.5 Model repository and model versioning
5.5.1 Features, foundational models, and training regimes
5.5.2 Overview of versioning
Summary
6 EDA, ethics, and baseline evaluations
6.1 Exploratory data analysis (EDA)
6.1.1 EDA objectives
6.1.2 Summarizing and describing data
6.1.3 Plots and visualizations
6.1.4 Unstructured data
6.2 Ethics checkpoint
6.3 Baseline models and performance
6.4 What if there are problems?
6.5 Pre-modeling checklist
6.6 The Bike Shop: Pre-modeling
6.6.1 After the survey
6.6.2 EDA implementation
Summary
7 Making useful models with ML
7.1 Sprint 2 backlog
7.2 Feature engineering and data augmentation
7.2.1 Data augmentation
7.3 Model design
7.3.1 Design forces
7.3.2 Overall design
7.3.3 Choosing component models
7.3.4 Inductive bias
7.3.5 Multiple disjoint models
7.3.6 Model composition
7.4 Making models with ML
7.4.1 Modeling process
7.4.2 Experiment tracking and model repositories
7.4.3 AutoML and model search
7.5 Stinky, dirty, no good, smelly models
Summary
8 Testing and selection
8.1 Why test and select?
8.2 Testing processes
8.2.1 Offline testing
8.2.2 Offline test environments
8.2.3 Online testing
8.2.4 Field trials
8.2.5 A/B testing
8.2.6 Multi-armed bandits (MABs)
8.2.7 Nonfunctional testing
8.3 Model selection
8.3.1 Quantitative selection
8.3.2 Choosing with comparable tests
8.3.3 Choosing with many tests
8.3.4 Qualitative selection measures
8.4 Post-modeling checklist
8.5 The Bike Shop: sprint 2
Summary
9 Sprint 3: system building and production
9.1 Sprint 3 backlog
9.2 Types of ML implementations
9.2.1 Assistive systems: recommenders and dashboards
9.2.2 Delegative systems
9.2.3 Autonomous systems
9.3 Nonfunctional review
9.4 Implementing the production system
9.4.1 Production data infrastructure
9.4.2 The model server and the inference service
9.4.3 User interface design
9.5 Logging, monitoring, management, feedback, and documentation
9.5.1 Model governance
9.5.2 Documentation
9.6 Pre-release testing
9.7 Ethics review
9.8 Promotion to production
9.9 You aren’t done yet
9.10 The Bike Shop sprint 3
Summary
10 Post project (sprint Ω)
10.1 Sprint Ω backlog
10.2 Off your hands and into production?
10.2.1 Getting a grip
10.2.2 ML technical debt and model drift
10.2.3 Retraining
10.2.4 In an emergency
10.2.5 Problems in review
10.3 Team post-project review
10.4 Improving practice
10.5 New technology adoption
10.6 Case study
10.7 Goodbye and good luck
Summary
references
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
index