This book summarizes the state of the art in tree-based methods for insurance: regression trees, random forests and boosting methods. It also presents the tools used to assess the predictive performance of tree-based models. Actuaries need these advanced analytical tools to turn the massive data sets now at their disposal into opportunities.
The exposition alternates between methodological aspects and numerical illustrations or case studies. All numerical illustrations are performed with the R statistical software. The technical prerequisites are kept at a reasonable level in order to reach a broad readership. In particular, master's students in actuarial science and actuaries wishing to update their skills in machine learning will find the book useful.
This is the second of three volumes entitled Effective Statistical Learning Methods for Actuaries. Written by actuaries for actuaries, this series offers a comprehensive overview of insurance data analytics with applications to P&C, life and health insurance.
Author(s): Michel Denuit, Donatien Hainaut, Julien Trufin
Series: Springer Actuarial
Publisher: Springer
Year: 2021
Language: English
Pages: 228
City: Cham
Preface
Contents
1 Introduction
1.1 The Risk Classification Problem
1.1.1 Insurance Risk Diversification
1.1.2 Why Classifying Risks?
1.1.3 The Need for Regression Models
1.1.4 Observable Versus Hidden Risk Factors
1.1.5 Insurance Ratemaking Versus Loss Prediction
1.2 Insurance Data
1.2.1 Claim Data
1.2.2 Frequency-Severity Decomposition
1.2.3 Observational Data
1.2.4 Format of the Data
1.2.5 Data Quality Issues
1.3 Exponential Dispersion (ED) Distributions
1.3.1 Frequency and Severity Distributions
1.3.2 From Normal to ED Distributions
1.3.3 Some ED Distributions
1.3.4 Mean and Variance
1.3.5 Weights
1.3.6 Exposure-to-Risk
1.4 Maximum Likelihood Estimation
1.4.1 Likelihood-Based Statistical Inference
1.4.2 Maximum-Likelihood Estimator
1.4.3 Derivation of the Maximum-Likelihood Estimate
1.4.4 Properties of the Maximum-Likelihood Estimators
1.4.5 Examples
1.5 Deviance
1.6 Actuarial Pricing and Tree-Based Methods
1.7 Bibliographic Notes and Further Reading
References
2 Performance Evaluation
2.1 Introduction
2.2 Generalization Error
2.2.1 Definition
2.2.2 Loss Function
2.2.3 Estimates
2.2.4 Decomposition
2.3 Expected Generalization Error
2.3.1 Squared Error Loss
2.3.2 Poisson Deviance Loss
2.3.3 Gamma Deviance Loss
2.3.4 Bias and Variance
2.4 (Expected) Generalization Error for Randomized Training Procedures
2.5 Bibliographic Notes and Further Reading
References
3 Regression Trees
3.1 Introduction
3.2 Binary Regression Trees
3.2.1 Selection of the Splits
3.2.2 The Prediction in Each Terminal Node
3.2.3 The Rule to Determine When a Node Is Terminal
3.2.4 Examples
3.3 Right Sized Trees
3.3.1 Minimal Cost-Complexity Pruning
3.3.2 Choice of the Best Pruned Tree
3.4 Measure of Performance
3.5 Relative Importance of Features
3.5.1 Example 1
3.5.2 Example 2
3.5.3 Effect of Correlated Features
3.6 Interactions
3.7 Limitations of Trees
3.7.1 Model Instability
3.7.2 Lack of Smoothness
3.8 Bibliographic Notes and Further Reading
References
4 Bagging Trees and Random Forests
4.1 Introduction
4.2 Bootstrap
4.3 Bagging Trees
4.3.1 Bias
4.3.2 Variance
4.3.3 Expected Generalization Error
4.4 Random Forests
4.5 Out-of-Bag Estimate
4.6 Interpretability
4.6.1 Relative Importances
4.6.2 Partial Dependence Plots
4.7 Example
4.8 Bibliographic Notes and Further Reading
References
5 Boosting Trees
5.1 Introduction
5.2 Forward Stagewise Additive Modeling
5.3 Boosting Trees
5.3.1 Algorithm
5.3.2 Particular Cases
5.3.3 Size of the Trees
5.4 Gradient Boosting Trees
5.4.1 Numerical Optimization
5.4.2 Steepest Descent
5.4.3 Algorithm
5.4.4 Particular Cases
5.5 Boosting Versus Gradient Boosting
5.6 Regularization and Randomness
5.6.1 Shrinkage
5.6.2 Randomness
5.7 Interpretability
5.7.1 Relative Importances
5.7.2 Partial Dependence Plots
5.7.3 Friedman's H-Statistics
5.8 Example
5.9 Bibliographic Notes and Further Reading
References
6 Other Measures for Model Comparison
6.1 Introduction
6.2 Measures of Association
6.2.1 Context
6.2.2 Probability of Concordance
6.2.3 Kendall's Tau
6.2.4 Spearman's Rho
6.2.5 Numerical Example
6.3 Measuring Lift
6.3.1 Motivation
6.3.2 Predictors Characteristics
6.3.3 Convex Order
6.3.4 Concentration Curve
6.3.5 Assessing the Performances of a Given Predictor
6.3.6 Comparison of the Performances of Two Predictors
6.3.7 Ordered Lorenz Curve
6.3.8 Numerical Illustration
6.3.9 Case Study
6.4 Bibliographic Notes and Further Reading
References