An Introduction to Statistical Learning (with Applications in R)

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform.

Author(s): Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
Edition: 1
Publisher: Springer
Year: 2013

Language: English
Pages: 434
City: New York

Preface
Contents
1 Introduction
An Overview of Statistical Learning
Wage Data
Stock Market Data
Gene Expression Data
A Brief History of Statistical Learning
This Book
Who Should Read This Book?
Notation and Simple Matrix Algebra
Organization of This Book
Data Sets Used in Labs and Exercises
Book Website
Acknowledgements
2 Statistical Learning
2.1 What Is Statistical Learning?
2.1.1 Why Estimate f?
2.1.2 How Do We Estimate f?
2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability
2.1.4 Supervised Versus Unsupervised Learning
2.1.5 Regression Versus Classification Problems
2.2 Assessing Model Accuracy
2.2.1 Measuring the Quality of Fit
2.2.2 The Bias-Variance Trade-Off
2.2.3 The Classification Setting
2.3 Lab: Introduction to R
2.3.1 Basic Commands
2.3.2 Graphics
2.3.3 Indexing Data
2.3.4 Loading Data
2.3.5 Additional Graphical and Numerical Summaries
2.4 Exercises
3 Linear Regression
3.1 Simple Linear Regression
3.1.1 Estimating the Coefficients
3.1.2 Assessing the Accuracy of the Coefficient Estimates
3.1.3 Assessing the Accuracy of the Model
Residual Standard Error
R² Statistic
3.2 Multiple Linear Regression
3.2.1 Estimating the Regression Coefficients
3.2.2 Some Important Questions
One: Is There a Relationship Between the Response and Predictors?
Two: Deciding on Important Variables
Three: Model Fit
Four: Predictions
3.3 Other Considerations in the Regression Model
3.3.1 Qualitative Predictors
Predictors with Only Two Levels
Qualitative Predictors with More than Two Levels
3.3.2 Extensions of the Linear Model
Removing the Additive Assumption
Non-linear Relationships
3.3.3 Potential Problems
1. Non-linearity of the Data
2. Correlation of Error Terms
3. Non-constant Variance of Error Terms
4. Outliers
5. High Leverage Points
6. Collinearity
3.4 The Marketing Plan
3.5 Comparison of Linear Regression with K-Nearest Neighbors
3.6 Lab: Linear Regression
3.6.1 Libraries
3.6.2 Simple Linear Regression
3.6.3 Multiple Linear Regression
3.6.4 Interaction Terms
3.6.5 Non-linear Transformations of the Predictors
3.6.6 Qualitative Predictors
3.6.7 Writing Functions
3.7 Exercises
4 Classification
4.1 An Overview of Classification
4.2 Why Not Linear Regression?
4.3 Logistic Regression
4.3.1 The Logistic Model
4.3.2 Estimating the Regression Coefficients
4.3.3 Making Predictions
4.3.4 Multiple Logistic Regression
4.3.5 Logistic Regression for >2 Response Classes
4.4 Linear Discriminant Analysis
4.4.1 Using Bayes' Theorem for Classification
4.4.2 Linear Discriminant Analysis for p=1
4.4.3 Linear Discriminant Analysis for p>1
4.4.4 Quadratic Discriminant Analysis
4.5 A Comparison of Classification Methods
4.6 Lab: Logistic Regression, LDA, QDA, and KNN
4.6.1 The Stock Market Data
4.6.2 Logistic Regression
4.6.3 Linear Discriminant Analysis
4.6.4 Quadratic Discriminant Analysis
4.6.5 K-Nearest Neighbors
4.6.6 An Application to Caravan Insurance Data
4.7 Exercises
5 Resampling Methods
5.1 Cross-Validation
5.1.1 The Validation Set Approach
5.1.2 Leave-One-Out Cross-Validation
5.1.3 k-Fold Cross-Validation
5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
5.1.5 Cross-Validation on Classification Problems
5.2 The Bootstrap
5.3 Lab: Cross-Validation and the Bootstrap
5.3.1 The Validation Set Approach
5.3.2 Leave-One-Out Cross-Validation
5.3.3 k-Fold Cross-Validation
5.3.4 The Bootstrap
Estimating the Accuracy of a Statistic of Interest
Estimating the Accuracy of a Linear Regression Model
5.4 Exercises
6 Linear Model Selection and Regularization
6.1 Subset Selection
6.1.1 Best Subset Selection
6.1.2 Stepwise Selection
Forward Stepwise Selection
Backward Stepwise Selection
Hybrid Approaches
6.1.3 Choosing the Optimal Model
Cp, AIC, BIC, and Adjusted R²
Validation and Cross-Validation
6.2 Shrinkage Methods
6.2.1 Ridge Regression
An Application to the Credit Data
Why Does Ridge Regression Improve Over Least Squares?
6.2.2 The Lasso
Another Formulation for Ridge Regression and the Lasso
The Variable Selection Property of the Lasso
Comparing the Lasso and Ridge Regression
A Simple Special Case for Ridge Regression and the Lasso
Bayesian Interpretation for Ridge Regression and the Lasso
6.2.3 Selecting the Tuning Parameter
6.3 Dimension Reduction Methods
6.3.1 Principal Components Regression
An Overview of Principal Components Analysis
The Principal Components Regression Approach
6.3.2 Partial Least Squares
6.4 Considerations in High Dimensions
6.4.1 High-Dimensional Data
6.4.2 What Goes Wrong in High Dimensions?
6.4.3 Regression in High Dimensions
6.4.4 Interpreting Results in High Dimensions
6.5 Lab 1: Subset Selection Methods
6.5.1 Best Subset Selection
6.5.2 Forward and Backward Stepwise Selection
6.5.3 Choosing Among Models Using the Validation Set Approach and Cross-Validation
6.6 Lab 2: Ridge Regression and the Lasso
6.6.1 Ridge Regression
6.6.2 The Lasso
6.7 Lab 3: PCR and PLS Regression
6.7.1 Principal Components Regression
6.7.2 Partial Least Squares
6.8 Exercises
7 Moving Beyond Linearity
7.1 Polynomial Regression
7.2 Step Functions
7.3 Basis Functions
7.4 Regression Splines
7.4.1 Piecewise Polynomials
7.4.2 Constraints and Splines
7.4.3 The Spline Basis Representation
7.4.4 Choosing the Number and Locations of the Knots
7.4.5 Comparison to Polynomial Regression
7.5 Smoothing Splines
7.5.1 An Overview of Smoothing Splines
7.5.2 Choosing the Smoothing Parameter
7.6 Local Regression
7.7 Generalized Additive Models
7.7.1 GAMs for Regression Problems
Pros and Cons of GAMs
7.7.2 GAMs for Classification Problems
7.8 Lab: Non-linear Modeling
7.8.1 Polynomial Regression and Step Functions
7.8.2 Splines
7.8.3 GAMs
7.9 Exercises
8 Tree-Based Methods
8.1 The Basics of Decision Trees
8.1.1 Regression Trees
Predicting Baseball Players' Salaries Using Regression Trees
Prediction via Stratification of the Feature Space
Tree Pruning
8.1.2 Classification Trees
8.1.3 Trees Versus Linear Models
8.1.4 Advantages and Disadvantages of Trees
8.2 Bagging, Random Forests, Boosting
8.2.1 Bagging
Out-of-Bag Error Estimation
Variable Importance Measures
8.2.2 Random Forests
8.2.3 Boosting
8.3 Lab: Decision Trees
8.3.1 Fitting Classification Trees
8.3.2 Fitting Regression Trees
8.3.3 Bagging and Random Forests
8.3.4 Boosting
8.4 Exercises
9 Support Vector Machines
9.1 Maximal Margin Classifier
9.1.1 What Is a Hyperplane?
9.1.2 Classification Using a Separating Hyperplane
9.1.3 The Maximal Margin Classifier
9.1.4 Construction of the Maximal Margin Classifier
9.1.5 The Non-separable Case
9.2 Support Vector Classifiers
9.2.1 Overview of the Support Vector Classifier
9.2.2 Details of the Support Vector Classifier
9.3 Support Vector Machines
9.3.1 Classification with Non-linear Decision Boundaries
9.3.2 The Support Vector Machine
9.3.3 An Application to the Heart Disease Data
9.4 SVMs with More than Two Classes
9.4.1 One-Versus-One Classification
9.4.2 One-Versus-All Classification
9.5 Relationship to Logistic Regression
9.6 Lab: Support Vector Machines
9.6.1 Support Vector Classifier
9.6.2 Support Vector Machine
9.6.3 ROC Curves
9.6.4 SVM with Multiple Classes
9.6.5 Application to Gene Expression Data
9.7 Exercises
10 Unsupervised Learning
10.1 The Challenge of Unsupervised Learning
10.2 Principal Components Analysis
10.2.1 What Are Principal Components?
10.2.2 Another Interpretation of Principal Components
10.2.3 More on PCA
Scaling the Variables
Uniqueness of the Principal Components
The Proportion of Variance Explained
Deciding How Many Principal Components to Use
10.2.4 Other Uses for Principal Components
10.3 Clustering Methods
10.3.1 K-Means Clustering
10.3.2 Hierarchical Clustering
Interpreting a Dendrogram
The Hierarchical Clustering Algorithm
Choice of Dissimilarity Measure
10.3.3 Practical Issues in Clustering
Small Decisions with Big Consequences
Validating the Clusters Obtained
Other Considerations in Clustering
A Tempered Approach to Interpreting the Results of Clustering
10.4 Lab 1: Principal Components Analysis
10.5 Lab 2: Clustering
10.5.1 K-Means Clustering
10.5.2 Hierarchical Clustering
10.6 Lab 3: NCI60 Data Example
10.6.1 PCA on the NCI60 Data
10.6.2 Clustering the Observations of the NCI60 Data
10.7 Exercises
Index