Advance your skills in building predictive models with SAS!
Building Regression Models with SAS: A Guide for Data Scientists teaches data scientists, statisticians, and other analysts who use SAS how to train regression models for prediction with large, complex data. Each chapter focuses on a particular model and includes a high-level overview, followed by basic concepts, essential syntax, and examples using new procedures in both SAS/STAT and SAS Viya. By emphasizing introductory examples and interpretation of output, this book provides readers with a clear understanding of how to build the following types of models:
- general linear models
- quantile regression models
- logistic regression models
- generalized linear models
- generalized additive models
- proportional hazards regression models
- tree models
- models based on multivariate adaptive regression splines
Building Regression Models with SAS is an essential guide to learning about a variety of models that provide interpretability as well as predictive performance.
Author(s): Robert N. Rodriguez
Publisher: SAS Institute
Year: 2023
Language: English
Pages: 463
City: Cary
Contents
Motivation for the Book
Audiences for the Book
Knowledge Prerequisites for the Book
Software Prerequisites for the Book
What the Book Does Not Cover
Acknowledgments
Introduction
Model Building at the Crossroads of Machine Learning and Statistics
Overview of Procedures for Building Regression Models
Practical Benefits
When Does Interpretability Matter?
When Should You Use the Procedures in This Book?
How to Read This Book
General Linear Models
Building General Linear Models: Concepts
Example: Predicting Network Activity
Essential Aspects of Regression Model Building
Notation and Terminology for General Linear Models
Parameter Estimation
The Bias-Variance Tradeoff for Prediction
Model Flexibility and Degrees of Freedom
Assessment and Minimization of Prediction Error
Summary
Building General Linear Models: Issues
Problems with Data-Driven Model Selection
Example: Simulation of Selection Bias
Freedman's Paradox
Summary
Building General Linear Models: Methods
Best-Subset Regression
Sequential Selection Methods
Shrinkage Methods
Summary
Building General Linear Models: Procedures
Introduction to the GLMSELECT Procedure
Specifying the Candidate Effects and the Selection Method
Controlling the Selection Method
Which Selection Methods and Criteria Should You Use?
Comparing Selection Criteria
Forced Inclusion of Model Effects
Example: Predicting the Close Rate of Retail Stores
Example: Building a Model with Forward Selection
Example: Building a Model with the Lasso and SBC
Example: Building a Model with the Lasso and Cross Validation
Example: Building a Model with the Lasso and Validation Data
Example: Building a Model with the Adaptive Lasso
Example: Building a Model with the Group Lasso
Using the REG Procedure for Best-Subset Regression
Example: Finding the Best Model for Close Rate
Example: Best-Subset Regression with Categorical Predictors
Introduction to the REGSELECT Procedure
Example: Defining a CAS Session and Loading Data
Example: Differences from the GLMSELECT Procedure
Example: Building a Model with Forward Swap Selection
Using the Final Model to Score New Data
Summary
Building General Linear Models: Collinearity
Example: Modeling the Effect of Air Pollution on Mortality
Detecting Collinearity
Dimension Reduction Using Variable Clustering
Ridge Regression
The Elastic Net Method
Principal Components Regression
Partial Least Squares Regression
Conclusions for Air Pollution Example
Summary
Building General Linear Models: Model Averaging
Approaches to Model Averaging
Using the GLMSELECT Procedure for Model Averaging
Bootstrap Model Averaging with Stepwise Regression
Refitting to Build a Parsimonious Model
Model Averaging with Akaike Weights
Summary
Specialized Regression Models
Building Quantile Regression Models
What Is a Quantile?
How Does Quantile Regression Compare with Ordinary Least Squares Regression?
Fitting Fully Specified Quantile Regression Models
Example: Predicting Quantiles for Customer Lifetime Value
Example: Fitting a Quantile Process Model for Customer Lifetime Value
Introduction to the QUANTSELECT Procedure
Example: Building Quantile Regression Models for Close Rate
Example: Building a Quantile Process Model for Close Rate
Example: Ranking Store Performance with Conditional Distributions
Introduction to the QTRSELECT Procedure
Example: Building Quantile Regression Models for Close Rate
Summary
Building Logistic Regression Models
Comparison of Procedures for Logistic Regression
Basic Concepts of Binary Logistic Regression
Introduction to the HPLOGISTIC Procedure
Introduction to the LOGSELECT Procedure
Summary of Procedure Features
Building Generalized Linear Models
Procedures for Generalized Linear Models
Basic Concepts of Generalized Linear Models
Introduction to the HPGENSELECT Procedure
Introduction to the GENSELECT Procedure
Summary of Procedure Features
Building Generalized Additive Models
Procedures for Generalized Additive Models
Components of Generalized Additive Models
Introduction to the GAMPL Procedure
Introduction to the GAMMOD Procedure
Introduction to the GAMSELECT Procedure
Summary
Building Proportional Hazards Models
Concepts of Proportional Hazards Models
Introduction to the PHSELECT Procedure
Model Building with Discrete Time
Summary
Building Classification and Regression Trees
Introduction to the HPSPLIT Procedure
Introduction to the TREESPLIT Procedure
Summary
Building Adaptive Regression Models
Introduction to the ADAPTIVEREG Procedure
Summary
Appendices about Algorithms and Computational Methods
Algorithms for Least Squares Estimation
The QR Decomposition
The Singular Value Decomposition
The Sweep Algorithm
The Gram-Schmidt Procedure
Orthogonalization in Univariate Regression without an Intercept
Orthogonalization in Univariate Regression with an Intercept
Orthogonalization in Multiple Regression
Least Squares Geometry
Orthogonality of Predictions and Residuals
The Hat Matrix as a Projection Matrix
Akaike's Information Criterion
Forms of Akaike's Criterion
Motivation for Akaike's Criterion
Maximum Likelihood Estimation for Generalized Linear Models
Computational Algorithms
Existence of Maximum Likelihood Estimates
Approximate Computation of Information Criteria
Distributions for Generalized Linear Models
The Exponential Family
Continuous Distributions in the Exponential Family
Discrete Distributions in the Exponential Family
Distributions Outside of the Exponential Family
Spline Methods
Basic Terminology
The Knot Selection Problem
Types of Splines
Spline Functionality in Procedures
Algorithms for Generalized Additive Models
Additive Models
The Backfitting Algorithm for Additive Models
Local Scoring for Generalized Additive Models
The IWLS Algorithm for Generalized Linear Models
The Local Scoring Algorithm for Generalized Additive Models
Penalized Likelihood for Generalized Additive Models
Algorithms for Penalized Likelihood Estimation
Model Evaluation Criteria
Searching for the Optimal
Effective Degrees of Freedom
Methods for Selecting Generalized Additive Models
The Boosting Method
The Shrinkage Method
Appendices about Common Topics
Methods for Scoring Data
Types of Scoring Methods
Internal Scoring Methods
External Scoring Methods
Summary
Coding Schemes for Categorical Predictors
Why Does Parameterization Matter?
Specifying the Parameterization and Level Order
Specifying Effect Splitting
Useful Parameterizations
GLM Parameterization
Effect Parameterization
Reference Parameterization
Other Parameterizations
Summary
Essentials of ODS Graphics
Managing the Display of Graphs and Tables
Creating SAS Data Sets from Graphs and Tables
Accessing Individual Graphs
Specifying the Size and Resolution of Graphs
Specifying the Style of Graphs and Tables
Distinguishing Groups in Graphs
Modifying Graphs by Editing Graph Templates
Creating Graphs by Writing Graph Templates
Creating Graphs with Statistical Graphics Procedures
Modifying a Procedure Graph
Example: Enhancing a Contour Plot
Capturing the Data in the Procedure Plot
Determining the Template Name
Accessing and Displaying the Template
Modifying the Template Code
Creating an Annotation Data Set with City Information
Summary
Marginal Model Plots
Example: Claim Rates for Mortgages
The %Marginal Macro
Glossary
References
Subject Index
Syntax Index