Debugging Machine Learning Models with Python: Develop high-performance, low-bias, and explainable machine learning

Debugging Machine Learning Models with Python is a comprehensive guide that takes you through the full spectrum of machine learning, from foundational concepts to advanced techniques. It goes beyond the basics to give you the expertise needed to build reliable, high-performance models for industrial applications. Whether you're a data scientist, analyst, machine learning engineer, or Python developer, this book will help you design modular systems for data preparation, train and test models accurately, and integrate them seamlessly into larger technologies. By bridging the gap between theory and practice, you'll learn how to evaluate model performance, identify and address issues, and harness recent advances in deep learning and generative modeling using PyTorch and scikit-learn. Your path to developing high-quality models in practice also covers causal modeling, human-in-the-loop modeling, and machine learning explainability. With hands-on examples and clear explanations, you'll develop the skills to deliver impactful solutions across domains such as healthcare, finance, and e-commerce.

Author(s): Ali Madani
Publisher: Packt
Year: 2023

Language: English
Pages: 345

Cover
Title Page
Copyright
Dedication
Foreword
Contributors
Table of Contents
Preface
Part 1: Debugging for Machine Learning Modeling
Chapter 1: Beyond Code Debugging
Technical requirements
Machine learning at a glance
Types of machine learning modeling
Supervised learning
Unsupervised learning
Self-supervised learning
Semi-supervised learning
Reinforcement learning
Generative machine learning
Debugging in software development
Error messages in Python
Debugging techniques
Debuggers
Best practices for high-quality Python programming
Version control
Debugging beyond Python
Flaws in data used for modeling
Data format and structure
Data quantity and quality
Data biases
Model and prediction-centric debugging
Underfitting and overfitting
Inference in model testing and production
Data or hyperparameters for changing landscapes
Summary
Questions
References
Chapter 2: Machine Learning Life Cycle
Technical requirements
Before we start modeling
Data collection
Data selection
Data exploration
Data wrangling
Structuring
Enriching
Data transformation
Cleaning
Modeling data preparation
Feature selection and extraction
Designing an evaluation and testing strategy
Model training and evaluation
Testing the code and the model
Model deployment and monitoring
Summary
Questions
References
Chapter 3: Debugging toward Responsible AI
Technical requirements
Impartial modeling fairness in machine learning
Data bias
Algorithmic bias
Security and privacy in machine learning
Data privacy
Data poisoning
Adversarial attacks
Output integrity attacks
System manipulation
Secure and private machine learning techniques
Transparency in machine learning modeling
Accountable and open to inspection modeling
Data and model governance
Summary
Questions
References
Part 2: Improving Machine Learning Models
Chapter 4: Detecting Performance and Efficiency Issues in Machine Learning Models
Technical requirements
Performance and error assessment measures
Classification
Regression
Clustering
Visualization for performance assessment
Summary metrics are not enough
Visualizations could be misleading
Don’t interpret your plots as you wish
Bias and variance diagnosis
Model validation strategy
Error analysis
Beyond performance
Summary
Questions
References
Chapter 5: Improving the Performance of Machine Learning Models
Technical requirements
Options for improving model performance
Grid search
Random search
Bayesian search
Successive halving
Synthetic data generation
Oversampling for imbalanced data
Improving pre-training data processing
Anomaly detection and outlier removal
Benefitting from data of lower quality or relevance
Regularization to improve model generalizability
Summary
Questions
References
Chapter 6: Interpretability and Explainability in Machine Learning Modeling
Technical requirements
Interpretable versus black-box machine learning
Interpretable machine learning models
Explainability for complex models
Explainability methods in machine learning
Local explainability techniques
Global explanation
Practicing machine learning explainability in Python
Explanations in SHAP
Explanations using LIME
Counterfactual generation using Diverse Counterfactual Explanations (DiCE)
Reviewing why having explainability is not enough
Summary
Questions
References
Chapter 7: Decreasing Bias and Achieving Fairness
Technical requirements
Fairness in machine learning modeling
Proxies for sensitive variables
Sources of bias
Biases introduced in data generation and collection
Bias in model training and testing
Bias in production
Using explainability techniques
Fairness assessment and improvement in Python
Summary
Questions
References
Part 3: Low-Bug Machine Learning Development and Deployment
Chapter 8: Controlling Risks Using Test-Driven Development
Technical requirements
Test-driven development for machine learning modeling
Unit testing
Machine learning differential testing
Tracking machine learning experiments
Summary
Questions
References
Chapter 9: Testing and Debugging for Production
Technical requirements
Infrastructure testing
Infrastructure as Code tools
Infrastructure testing tools
Infrastructure testing using pytest
Integration testing of machine learning pipelines
Integration testing using pytest
Monitoring and validating live performance
Model assertion
Summary
Questions
References
Chapter 10: Versioning and Reproducible Machine Learning Modeling
Technical requirements
Reproducibility in machine learning
Data versioning
Model versioning
Summary
Questions
References
Chapter 11: Avoiding and Detecting Data and Concept Drifts
Technical requirements
Avoiding drifts in your models
Avoiding data drift
Addressing concept drift
Detecting drifts
Practicing with alibi_detect for drift detection
Practicing with evidently for drift detection
Summary
Questions
References
Part 4: Deep Learning Modeling
Chapter 12: Going Beyond ML Debugging with Deep Learning
Technical requirements
Introduction to artificial neural networks
Optimization algorithms
Frameworks for neural network modeling
PyTorch for deep learning modeling
Summary
Questions
References
Chapter 13: Advanced Deep Learning Techniques
Technical requirements
Types of neural networks
Categorization based on data type
Convolutional neural networks for image shape data
Performance assessment
CNN modeling using PyTorch
Image data transformation and augmentation for CNNs
Using pre-trained models
Transformers for language modeling
Tokenization
Language embedding
Language modeling using pre-trained models
Modeling graphs using deep neural networks
Graph neural networks
GNNs with PyTorch Geometric
Summary
Questions
References
Chapter 14: Introduction to Recent Advancements in Machine Learning
Technical requirements
Generative modeling
Generative deep learning techniques
Prompt engineering for text-based generative models
Generative modeling using PyTorch
Reinforcement learning
Reinforcement learning with human feedback (RLHF)
Self-supervised learning (SSL)
Self-supervised learning with PyTorch
Summary
Questions
References
Part 5: Advanced Topics in Model Debugging
Chapter 15: Correlation versus Causality
Technical requirements
Correlation as part of machine learning models
Causal modeling to reduce risks and improve performance
Assessing causation in machine learning models
Causal inference
Causal modeling using Python
Using dowhy for causal effect estimation
Using bnlearn for causal inference through Bayesian networks
Summary
Questions
References
Chapter 16: Security and Privacy in Machine Learning
Technical requirements
Encryption techniques and their use in machine learning
Implementing AES encryption in Python
Homomorphic encryption
Differential privacy
Federated learning
Summary
Questions
References
Chapter 17: Human-in-the-Loop Machine Learning
Humans in the machine learning life cycle
Expert feedback collection
Human-in-the-loop modeling
Summary
Questions
References
Assessments
Chapter 1 – Beyond Code Debugging
Chapter 2 – Machine Learning Life Cycle
Chapter 3 – Debugging toward Responsible AI
Chapter 4 – Detecting Performance and Efficiency Issues in Machine Learning Models
Chapter 5 – Improving the Performance of Machine Learning Models
Chapter 6 – Interpretability and Explainability in Machine Learning Modeling
Chapter 7 – Decreasing Bias and Achieving Fairness
Chapter 8 – Controlling Risks Using Test-Driven Development
Chapter 9 – Testing and Debugging for Production
Chapter 10 – Versioning and Reproducible Machine Learning Modeling
Chapter 11 – Avoiding and Detecting Data and Concept Drifts
Chapter 12 – Going Beyond ML Debugging with Deep Learning
Chapter 13 – Advanced Deep Learning Techniques
Chapter 14 – Introduction to Recent Advancements in Machine Learning
Chapter 15 – Correlation versus Causality
Chapter 16 – Security and Privacy in Machine Learning
Chapter 17 – Human-in-the-Loop Machine Learning
Index
About Packt
Other Books You May Enjoy