Ensemble machine learning combines the power of multiple machine learning approaches, working together to deliver models that are more accurate and more robust than any single model alone.
Inside Ensemble Methods for Machine Learning you will find:
• Methods for classification, regression, and retrieval
• Sophisticated off-the-shelf ensemble implementations
• Random forests, boosting, and gradient boosting
• Feature engineering and ensemble diversity
• Interpretability and explainability for ensemble methods
Ensemble machine learning trains a diverse group of machine learning models to work together, aggregating their output to deliver richer results than a single model. Now in Ensemble Methods for Machine Learning you’ll discover core ensemble methods that have proven records in both data science competitions and real-world applications. Hands-on case studies show you how each algorithm works in production. By the time you're done, you'll know the benefits, limitations, and practical methods of applying ensemble machine learning to real-world data, and be ready to build more explainable ML systems.
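To make the aggregation idea concrete, here is a minimal sketch (an illustration of my own, not code from the book) of a heterogeneous ensemble in scikit-learn: three different base models are trained on the same data and their predictions are combined by majority vote. The data set and model choices are illustrative assumptions, not the book's case study code.

# Minimal sketch (illustrative): aggregate three diverse models by majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three diverse base estimators, each with different strengths and weaknesses
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="hard",  # aggregate the three predictions by majority vote
)

ensemble.fit(X_train, y_train)
print("Ensemble test accuracy:", ensemble.score(X_test, y_test))

Swapping in other base estimators, or changing voting to "soft" to average predicted probabilities, is a one-line change.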
About the technology
Automatically compare, contrast, and blend the output from multiple models to squeeze the best results from your data. Ensemble machine learning applies a “wisdom of crowds” method that dodges the inaccuracies and limitations of a single model. By basing responses on multiple perspectives, this innovative approach can deliver robust predictions even without massive datasets.
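As a quick, hedged illustration of that robustness (again my own sketch, not an example from the book), the snippet below compares a single decision tree with a bagged ensemble of 100 trees on scikit-learn's small built-in Wine data set; under cross-validation the bagged "crowd" typically scores higher and varies less across folds.

# Minimal sketch (illustrative): bagging many trees vs. a single tree on a small data set.
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # a small data set: 178 examples, 3 classes

single_tree = DecisionTreeClassifier(random_state=42)
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),  # base learner, passed positionally
    n_estimators=100,                         # 100 trees, each on a bootstrap resample
    random_state=42,
)

print("Single tree, 5-fold CV accuracy :", cross_val_score(single_tree, X, y, cv=5).mean())
print("Bagged trees, 5-fold CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())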
About the book
Ensemble Methods for Machine Learning teaches you practical techniques for applying multiple ML approaches simultaneously. Each chapter contains a unique case study that demonstrates a fully functional ensemble method, with examples including medical diagnosis, sentiment analysis, handwriting classification, and more. There’s no complex math or theory—you’ll learn in a visuals-first manner, with ample code for easy experimentation!
What's inside
• Bagging, boosting, and gradient boosting
• Methods for classification, regression, and retrieval
• Interpretability and explainability for ensemble methods
• Feature engineering and ensemble diversity
About the reader
For Python programmers with machine learning experience.
About the author
Gautam Kunapuli has over 15 years of experience in academia and the machine learning industry.
Author(s): Gautam Kunapuli
Edition: 1
Publisher: Manning
Year: 2023
Language: English
Pages: 352
City: Shelter Island, NY
Tags: Machine Learning; Python; Categorical Variables; scikit-learn; Ensemble Learning; Random Forest; AdaBoost; Bagging; Gradient Boosting
Ensemble Methods for Machine Learning
contents
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 The basics of ensembles
1 Ensemble methods: Hype or hallelujah?
1.1 Ensemble methods: The wisdom of the crowds
1.2 Why you should care about ensemble learning
1.3 Fit vs. complexity in individual models
1.3.1 Regression with decision trees
1.3.2 Regression with support vector machines
1.4 Our first ensemble
1.5 Terminology and taxonomy for ensemble methods
Summary
Part 2 Essential ensemble methods
2 Homogeneous parallel ensembles: Bagging and random forests
2.1 Parallel ensembles
2.2 Bagging: Bootstrap aggregating
2.2.1 Intuition: Resampling and model aggregation
2.2.2 Implementing bagging
2.2.3 Bagging with scikit-learn
2.2.4 Faster training with parallelization
2.3 Random forests
2.3.1 Randomized decision trees
2.3.2 Random forests with scikit-learn
2.3.3 Feature importances
2.4 More homogeneous parallel ensembles
2.4.1 Pasting
2.4.2 Random subspaces and random patches
2.4.3 Extra Trees
2.5 Case study: Breast cancer diagnosis
2.5.1 Loading and preprocessing
2.5.2 Bagging, random forests, and Extra Trees
2.5.3 Feature importances with random forests
Summary
3 Heterogeneous parallel ensembles: Combining strong learners
3.1 Base estimators for heterogeneous ensembles
3.1.1 Fitting base estimators
3.1.2 Individual predictions of base estimators
3.2 Combining predictions by weighting
3.2.1 Majority vote
3.2.2 Accuracy weighting
3.2.3 Entropy weighting
3.2.4 Dempster-Shafer combination
3.3 Combining predictions by meta-learning
3.3.1 Stacking
3.3.2 Stacking with cross validation
3.4 Case study: Sentiment analysis
3.4.1 Preprocessing
3.4.2 Dimensionality reduction
3.4.3 Blending classifiers
Summary
4 Sequential ensembles: Adaptive boosting
4.1 Sequential ensembles of weak learners
4.2 AdaBoost: Adaptive boosting
4.2.1 Intuition: Learning with weighted examples
4.2.2 Implementing AdaBoost
4.2.3 AdaBoost with scikit-learn
4.3 AdaBoost in practice
4.3.1 Learning rate
4.3.2 Early stopping and pruning
4.4 Case study: Handwritten digit classification
4.4.1 Dimensionality reduction with t-SNE
4.4.2 Boosting
4.5 LogitBoost: Boosting with the logistic loss
4.5.1 Logistic vs. exponential loss functions
4.5.2 Regression as a weak learning algorithm for classification
4.5.3 Implementing LogitBoost
Summary
5 Sequential ensembles: Gradient boosting
5.1 Gradient descent for minimization
5.1.1 Gradient descent with an illustrative example
5.1.2 Gradient descent over loss functions for training
5.2 Gradient boosting: Gradient descent + boosting
5.2.1 Intuition: Learning with residuals
5.2.2 Implementing gradient boosting
5.2.3 Gradient boosting with scikit-learn
5.2.4 Histogram-based gradient boosting
5.3 LightGBM: A framework for gradient boosting
5.3.1 What makes LightGBM “light”?
5.3.2 Gradient boosting with LightGBM
5.4 LightGBM in practice
5.4.1 Learning rate
5.4.2 Early stopping
5.4.3 Custom loss functions
5.5 Case study: Document retrieval
5.5.1 The LETOR data set
5.5.2 Document retrieval with LightGBM
Summary
6 Sequential ensembles: Newton boosting
6.1 Newton’s method for minimization
6.1.1 Newton’s method with an illustrative example
6.1.2 Newton’s descent over loss functions for training
6.2 Newton boosting: Newton’s method + boosting
6.2.1 Intuition: Learning with weighted residuals
6.2.2 Intuition: Learning with regularized loss functions
6.2.3 Implementing Newton boosting
6.3 XGBoost: A framework for Newton boosting
6.3.1 What makes XGBoost “extreme”?
6.3.2 Newton boosting with XGBoost
6.4 XGBoost in practice
6.4.1 Learning rate
6.4.2 Early stopping
6.5 Case study redux: Document retrieval
6.5.1 The LETOR data set
6.5.2 Document retrieval with XGBoost
Summary
Part 3 Ensembles in the wild: Adapting ensemble methods to your data
7 Learning with continuous and count labels
7.1 A brief review of regression
7.1.1 Linear regression for continuous labels
7.1.2 Poisson regression for count labels
7.1.3 Logistic regression for classification labels
7.1.4 Generalized linear models
7.1.5 Nonlinear regression
7.2 Parallel ensembles for regression
7.2.1 Random forests and Extra Trees
7.2.2 Combining regression models
7.2.3 Stacking regression models
7.3 Sequential ensembles for regression
7.3.1 Loss and likelihood functions for regression
7.3.2 Gradient boosting with LightGBM and XGBoost
7.4 Case study: Demand forecasting
7.4.1 The UCI Bike Sharing data set
7.4.2 GLMs and stacking
7.4.3 Random forest and Extra Trees
7.4.4 XGBoost and LightGBM
Summary
8 Learning with categorical features
8.1 Encoding categorical features
8.1.1 Types of categorical features
8.1.2 Ordinal and one-hot encoding
8.1.3 Encoding with target statistics
8.1.4 The category_encoders package
8.2 CatBoost: A framework for ordered boosting
8.2.1 Ordered target statistics and ordered boosting
8.2.2 Oblivious decision trees
8.2.3 CatBoost in practice
8.3 Case study: Income prediction
8.3.1 Adult Data Set
8.3.2 Creating preprocessing and modeling pipelines
8.3.3 Category encoding and ensembling
8.3.4 Ordered encoding and boosting with CatBoost
8.4 Encoding high-cardinality string features
Summary
9 Explaining your ensembles
9.1 What is interpretability?
9.1.1 Black-box vs. glass-box models
9.1.2 Decision trees (and decision rules)
9.1.3 Generalized linear models
9.2 Case study: Data-driven marketing
9.2.1 Bank Marketing data set
9.2.2 Training ensembles
9.2.3 Feature importances in tree ensembles
9.3 Black-box methods for global explainability
9.3.1 Permutation feature importance
9.3.2 Partial dependence plots
9.3.3 Global surrogate models
9.4 Black-box methods for local explainability
9.4.1 Local surrogate models with LIME
9.4.2 Local interpretability with SHAP
9.5 Glass-box ensembles: Training for interpretability
9.5.1 Explainable boosting machines
9.5.2 EBMs in practice
Summary
epilogue
E.1 Further reading
E.1.1 Practical ensemble methods
E.1.2 Theory and foundations of ensemble methods
E.2 A few more advanced topics
E.2.1 Ensemble methods for statistical relational learning
E.2.2 Ensemble methods for deep learning
E.3 Thank You!
index