This textbook considers statistical learning applications when interest centers on the conditional distribution of a response variable, given a set of predictors, and in the absence of a credible model that can be specified before the data analysis begins. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis depends in an integrated fashion on sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. The unifying theme is that supervised learning can properly be seen as a form of regression analysis. Key concepts and procedures are illustrated with a large number of real applications and their associated code in R, with an eye toward practical implications. The growing integration of computer science and statistics is well represented, including the occasional but salient tensions that result. Throughout, there are links to the big picture. The third edition considers significant advances in recent years, among which are:
• the development of overarching, conceptual frameworks for statistical learning;
• the impact of “big data” on statistical learning;
• the nature and consequences of post-model selection statistical inference;
• deep learning in various forms;
• the special challenges to statistical inference posed by statistical learning;
• the fundamental connections between data collection and data analysis;
• interdisciplinary ethical and political issues surrounding the application of algorithmic methods in a wide variety of fields, each linked to concerns about transparency, fairness, and accuracy.
This edition features new sections on accuracy, transparency, and fairness, as well as a new chapter on deep learning. Precursors to deep learning get an expanded treatment. The connections between fitting and forecasting are considered in greater depth. Discussion of the estimation targets for algorithmic methods is revised and expanded throughout to reflect the latest research. Resampling procedures are emphasized. The material is written for upper-level undergraduate and graduate students in the social, psychological, and life sciences and for researchers who want to apply statistical learning procedures to scientific and policy problems.
Author(s): Richard A. Berk
Series: Springer Texts in Statistics
Edition: 3
Publisher: Springer
Year: 2020
Language: English
Pages: XXVI, 433
Tags: Statistical Theory and Methods
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
Endnotes
References
Contents
1 Statistical Learning as a Regression Problem
1.1 Getting Started
1.2 Setting the Regression Context
1.3 Revisiting the Ubiquitous Linear Regression Model
1.3.1 Problems in Practice
1.4 Working with Statistical Models that are Wrong
1.4.1 An Alternative Approach to Regression
1.4.2 More on Statistical Inference with Wrong Models
1.4.3 Introduction to Sandwich Standard Errors
1.4.4 Introduction to Conformal Inference
1.4.5 Introduction to the Nonparametric Bootstrap
1.4.6 Wrong Regression Models with Binary Response Variables
1.5 The Transition to Statistical Learning
1.5.1 Models Versus Algorithms
1.6 Some Initial Concepts
1.6.1 Overall Goals of Statistical Learning
1.6.2 Forecasting with Supervised Statistical Learning
1.6.3 Overfitting
1.6.4 Data Snooping
1.6.5 Some Constructive Responses to Overfitting and Data Snooping
1.6.5.1 Training Data, Evaluation Data, and Test Data
1.6.5.2 Data Splitting
1.6.5.3 Cross-Validation
1.6.6 Loss Functions and Related Concepts
1.6.6.1 Definitions of In-Sample and Out-of-Sample Performance
1.6.6.2 Categorical Response Variables
1.6.6.3 Asymmetric Loss
1.6.7 The Bias–Variance Tradeoff
1.6.8 Linear Estimators
1.6.9 Degrees of Freedom
1.6.10 Basis Functions
1.6.11 The Curse of Dimensionality
1.7 Statistical Learning in Context
Demonstrations and Exercises
Set 1
Set 2
Set 3
Endnotes
References
2 Splines, Smoothers, and Kernels
2.1 Introduction
2.2 Regression Splines
2.2.1 Piecewise Linear Population Approximations
2.2.2 Polynomial Regression Splines
2.2.3 Natural Cubic Splines
2.2.4 B-Splines
2.3 Penalized Smoothing
2.3.1 Shrinkage and Regularization
2.3.1.1 Ridge Regression
2.3.1.2 A Ridge Regression Illustration
2.3.1.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
2.3.1.4 A Lasso Regression Illustration
2.4 Penalized Regression Splines
2.4.1 An Application
2.5 Smoothing Splines
2.5.1 A Smoothing Splines Illustration
2.6 Locally Weighted Regression as a Smoother
2.6.1 Nearest Neighbor Methods
2.6.2 Locally Weighted Regression
2.6.2.1 A Lowess Illustration
2.7 Smoothers for Multiple Predictors
2.7.1 Smoothing in Two Dimensions
2.7.2 The Generalized Additive Model
2.7.2.1 A GAM Fitting Algorithm
2.7.2.2 An Illustration Using the Regression Splines Implementation of the Generalized Additive Model
2.8 Smoothers with Categorical Variables
2.8.1 An Illustration Using the Generalized Additive Model with a Binary Outcome
2.9 An Illustration of Statistical Inference After Model Selection
2.9.1 Level I Versus Level II Summary
2.10 Kernelized Regression
2.10.1 Radial Basis Kernel
2.10.2 ANOVA Radial Basis Kernel
2.10.3 A Kernel Regression Application
2.11 Summary and Conclusions
Endnotes
References
3 Classification and Regression Trees (CART)
3.1 Introduction
3.2 An Introduction to Recursive Partitioning in CART
3.3 The Basic Ideas in More Depth
3.3.1 Tree Diagrams for Showing What the Greedy Algorithm Determined
3.3.2 An Initial Application
3.3.3 Classification and Forecasting with CART
3.3.4 Confusion Tables
3.3.5 CART as an Adaptive Nearest Neighbor Method
3.4 The Formalities of Splitting a Node
3.5 An Illustrative Prison Inmate Risk Assessment Using CART
3.6 Classification Errors and Costs
3.6.1 Default Costs in CART
3.6.2 Prior Probabilities and Relative Misclassification Costs
3.7 Varying the Prior and the Complexity Parameter
3.8 An Example with Three Response Categories
3.9 Regression Trees
3.9.1 A CART Application for the Correlates of a Student's GPA in High School
3.10 Pruning
3.11 Missing Data
3.11.1 Missing Data with CART
3.12 More on CART Instability
3.13 Summary of Statistical Inference with CART
3.13.1 Summary of Statistical Inference for CART Forecasts
3.14 Overall Summary and Conclusions
Exercises
Problem Set 1
Problem Set 2
Problem Set 3
Problem Set 4
Endnotes
References
4 Bagging
4.1 Introduction
4.2 The Bagging Algorithm
4.3 Some Bagging Details
4.3.1 Revisiting the CART Instability Problem
4.3.2 Resampling Methods for Bagging
4.3.3 Votes Over Trees and Probabilities
4.3.4 Forecasting and Imputation
4.3.5 Bagging Estimation and Statistical Inference
4.3.6 Margins for Classification
4.3.7 Using Out-of-Bag Observations as Test Data
4.3.8 Bagging and Bias
4.4 Some Limitations of Bagging
4.4.1 Sometimes Bagging Cannot Help
4.4.2 Sometimes Bagging Can Make the Estimation Bias Worse
4.4.3 Sometimes Bagging Can Make the Estimation Variance Worse
4.5 A Bagging Illustration
4.6 Summary and Conclusions
Exercises
Problem Set 1
Problem Set 2
Problem Set 3
Problem Set 4
Endnotes
References
5 Random Forests
5.1 Introduction and Overview
5.1.1 Unpacking How Random Forests Works
5.2 An Initial Random Forests Illustration
5.3 A Few Technical Formalities
5.3.1 What Is a Random Forest?
5.3.2 Margins and Generalization Error for Classifiers in General
5.3.3 Generalization Error for Random Forests
5.3.4 The Strength of a Random Forest
5.3.5 Dependence
5.3.6 Putting It Together
5.3.6.1 Benefits from Interpolation
5.3.6.2 Benefits from Averaging
5.3.6.3 Defeating Competition Between Predictors
5.3.6.4 All Together Now
5.4 Random Forests and Adaptive Nearest Neighbor Methods
5.5 Introducing Misclassification Costs
5.5.1 A Brief Illustration Using Asymmetric Costs
5.6 Determining the Importance of the Predictors
5.6.1 Contributions to the Fit
5.6.2 Contributions to Prediction
5.6.2.1 Some Examples of Importance Plots with Extensions
5.7 Input Response Functions
5.7.1 Partial Dependence Plot Example
5.7.2 More than Two Response Classes
5.8 Classification and the Proximity Matrix
5.8.1 Clustering by Proximity Values
5.8.1.1 Using Proximity Values to Impute Missing Data
5.8.1.2 Using Proximities to Detect Outliers
5.9 Empirical Margins
5.10 Quantitative Response Variables
5.11 A Random Forest Illustration Using a Quantitative Response Variable
5.12 Statistical Inference with Random Forests
5.13 Software and Tuning Parameters
5.14 Bayesian Additive Regression Trees (BART)
5.15 Summary and Conclusions
Exercises
Problem Set 1
Problem Set 2
Problem Set 3
Endnotes
References
6 Boosting
6.1 Introduction
6.2 AdaBoost
6.2.1 A Toy Numerical Example of AdaBoost.M1
6.2.2 Why Does Boosting Work So Well for Classification?
6.2.2.1 Boosting as a Margin Maximizer
6.2.2.2 Boosting as a Statistical Optimizer
6.2.2.3 Boosting as an Interpolator
6.3 Stochastic Gradient Boosting
6.3.1 Gradient Boosting More Formally
6.3.2 Stochastic Gradient Boosting in Practice
6.3.3 Tuning Parameters
6.3.4 Output
6.4 Asymmetric Costs
6.5 Boosting, Estimation, and Consistency
6.6 A Binomial Example
6.7 Boosting for Statistical Inference and Forecasting
6.7.1 An Imputation Example
6.8 A Quantile Regression Example
6.9 Boosting in Service of Causal Inference in Observational Studies
6.10 Summary and Conclusions
Exercises
Problem Set 1
Problem Set 2
Problem Set 3
Endnotes
References
7 Support Vector Machines
7.1 Introduction
7.2 Support Vector Machines in Pictures
7.2.1 The Support Vector Classifier
7.2.2 Support Vector Machines
7.3 Support Vector Machines More Formally
7.3.1 The Support Vector Classifier Again: The Separable Case
7.3.2 The Nonseparable Case
7.3.3 Support Vector Machines
7.3.4 SVM for Regression
7.3.5 Statistical Inference for Support Vector Machines
7.4 A Classification Example
7.5 Summary and Conclusions
Exercises
Problem Set 1
Problem Set 2
Problem Set 3
Endnotes
References
8 Neural Networks
8.1 Introduction
8.2 Conventional (Vanilla) Neural Networks
8.2.1 Implementation of Gradient Descent
8.2.2 Statistical Inference with Neural Networks
8.2.3 An Application
8.2.4 Some Recent Developments
8.2.5 Implications of Conventional Neural Nets for Practice
8.3 Deep Learning with Neural Networks
8.3.1 Convolutional Neural Networks
8.3.1.1 CNNs in More Detail
8.3.2 Recurrent Neural Networks
8.3.2.1 Statistical Inference for RNNs
8.3.2.2 RNN Software
8.3.2.3 An RNN Application
8.3.3 Adversarial Neural Networks
8.4 Conclusions
Demonstrations and Exercises
Set 1
Set 2
Endnotes
References
9 Reinforcement Learning and Genetic Algorithms
9.1 Introduction to Reinforcement Learning
9.2 Genetic Algorithms
9.3 An Application
9.4 Conclusions
Demonstrations and Exercises
Set 1
Set 2
Endnotes
References
10 Integrating Themes and a Bit of Craft Lore
10.1 Some Integrating Technical Themes
10.2 Integrating Themes Addressing Ethics and Politics
10.3 Some Suggestions for Day-to-Day Practice
10.3.1 Choose the Right Data Analysis Procedure
10.3.2 Get to Know Your Software
10.3.3 Do Not Forget the Basics
10.3.4 Getting Good Data
10.3.5 Match Your Goals to What You Can Credibly Do
10.4 Some Concluding Observations
Endnotes
References
Bibliography
Index