Applied Predictive Modeling


Winner of the 2014 Technometrics Ziegel Prize for Outstanding Book.

Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting, and the foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Its treatment of practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance, all of which are problems that occur frequently in practice.

The text illustrates every part of the modeling process through many hands-on, real-life examples, and most chapters contain extensive R code for each step of the process. The data sets and corresponding code are available in the book's companion AppliedPredictiveModeling R package, which is freely available on CRAN.

This multi-purpose text can be used as an introduction to predictive models and the overall modeling process, as a practitioner's reference handbook, or as a text for advanced undergraduate or graduate-level predictive modeling courses. To that end, most chapters contain problem sets that help solidify the covered concepts and that use data available in the book's R package. Readers and students interested in implementing the methods should have some basic knowledge of R, and a handful of the more advanced topics require some mathematical knowledge.

Author(s): Kuhn, Max; Johnson, Kjell
Edition: 1
Publisher: Springer
Year: 2018

Language: English
Pages: 595
City: New York, NY
Tags: Neural Networks; Regression; Classification; Support Vector Machines; Predictive Models; Performance Metrics; R; Linear Regression; Logistic Regression; Overfitting; Random Forest; Data Preprocessing; Model Training; Naïve Bayes; Discriminant Analysis

Preface
Contents
1 Introduction
1.1 Prediction Versus Interpretation
1.2 Key Ingredients of Predictive Models
1.3 Terminology
1.4 Example Data Sets and Typical Data Scenarios
1.5 Overview
1.6 Notation
Part I General Strategies
2.1 Case Study: Predicting Fuel Economy
2.2 Themes
2.3 Summary
3 Data Pre-processing
3.1 Case Study: Cell Segmentation in High-Content Screening
3.2 Data Transformations for Individual Predictors
3.3 Data Transformations for Multiple Predictors
3.4 Dealing with Missing Values
3.5 Removing Predictors
3.6 Adding Predictors
3.7 Binning Predictors
3.8 Computing
Exercises
4 Over-Fitting and Model Tuning
4.1 The Problem of Over-Fitting
4.2 Model Tuning
4.3 Data Splitting
4.4 Resampling Techniques
4.5 Case Study: Credit Scoring
4.6 Choosing Final Tuning Parameters
4.7 Data Splitting Recommendations
4.8 Choosing Between Models
4.9 Computing
Exercises
Part II Regression Models
5.1 Quantitative Measures of Performance
5.2 The Variance-Bias Trade-off
5.3 Computing
6 Linear Regression and Its Cousins
6.1 Case Study: Quantitative Structure-Activity Relationship Modeling
6.2 Linear Regression
6.3 Partial Least Squares
6.4 Penalized Models
6.5 Computing
Exercises
7.1 Neural Networks
7.2 Multivariate Adaptive Regression Splines
7.3 Support Vector Machines
7.4 K-Nearest Neighbors
7.5 Computing
Exercises
8 Regression Trees and Rule-Based Models
8.1 Basic Regression Trees
8.2 Regression Model Trees
8.3 Rule-Based Models
8.4 Bagged Trees
8.5 Random Forests
8.6 Boosting
8.7 Cubist
8.8 Computing
Exercises
9 A Summary of Solubility Models
10 Case Study: Compressive Strength of Concrete Mixtures
10.1 Model Building Strategy
10.2 Model Performance
10.3 Optimizing Compressive Strength
10.4 Computing
Part III Classification Models
11.1 Class Predictions
11.2 Evaluating Predicted Classes
11.3 Evaluating Class Probabilities
11.4 Computing
12.1 Case Study: Predicting Successful Grant Applications
12.2 Logistic Regression
12.3 Linear Discriminant Analysis
12.4 Partial Least Squares Discriminant Analysis
12.5 Penalized Models
12.6 Nearest Shrunken Centroids
12.7 Computing
Exercises
13.1 Nonlinear Discriminant Analysis
13.2 Neural Networks
13.3 Flexible Discriminant Analysis
13.4 Support Vector Machines
13.5 K-Nearest Neighbors
13.6 Naïve Bayes
13.7 Computing
Exercises
14 Classification Trees and Rule-Based Models
14.1 Basic Classification Trees
14.2 Rule-Based Models
14.3 Bagged Trees
14.4 Random Forests
14.5 Boosting
14.6 C5.0
14.8 Computing
Exercises
15 A Summary of Grant Application Models
16.1 Case Study: Predicting Caravan Policy Ownership
16.2 The Effect of Class Imbalance
16.4 Alternate Cutoffs
16.6 Unequal Case Weights
16.7 Sampling Methods
16.8 Cost-Sensitive Training
16.9 Computing
Exercises
17 Case Study: Job Scheduling
17.1 Data Splitting and Model Strategy
17.2 Results
17.3 Computing
Part IV Other Considerations
18 Measuring Predictor Importance
18.1 Numeric Outcomes
18.2 Categorical Outcomes
18.3 Other Approaches
18.4 Computing
Exercises
19 An Introduction to Feature Selection
19.1 Consequences of Using Non-informative Predictors
19.2 Approaches for Reducing the Number of Predictors
19.3 Wrapper Methods
19.4 Filter Methods
19.5 Selection Bias
19.6 Case Study: Predicting Cognitive Impairment
19.7 Computing
Exercises
20 Factors That Can Affect Model Performance
20.1 Type III Errors
20.2 Measurement Error in the Outcome
20.3 Measurement Error in the Predictors
20.4 Discretizing Continuous Outcomes
20.5 When Should You Trust Your Model's Prediction?
20.6 The Impact of a Large Sample
20.7 Computing
Exercises
Appendix
A A Summary of Various Models
B.1 Start-Up and Getting Help
B.2 Packages
B.3 Creating Objects
B.4 Data Types and Basic Structures
B.5 Working with Rectangular Data Sets
B.6 Objects and Classes
B.7 R Functions
B.9 The AppliedPredictiveModeling Package
B.10 The caret Package
B.11 Software Used in this Text
C Interesting Web Sites
Indices
General