Modern Applied Regressions creates an intricate and colorful mural with mosaics of categorical and limited response variable (CLRV) models using both Bayesian and frequentist approaches. Written for graduate students, junior researchers, and quantitative analysts in the behavioral, health, and social sciences, this text provides details for doing Bayesian and frequentist data analysis of CLRV models. Each chapter can be read and studied separately, with R code snippets and template interpretations for easy replication. Along with the doing part, the text presents the basic statistical theory behind these models in an accessible way and uses a narrative style to recount their origins and evolution.
This book first scaffolds both Bayesian and frequentist paradigms for regression analysis, and then moves on to different types of categorical and limited response variable models, including binary, ordered, multinomial, count, and survival regression. Each of the middle four chapters discusses a major type of CLRV regression that subsumes an array of important variants and extensions. The discussion of each major type usually begins with the history and evolution of the prototypical model, followed by the formulation of basic statistical properties and an elaboration on the doing part of the model and its extensions. The doing part typically includes R code, results, and their interpretation. The last chapter discusses advanced modeling and predictive techniques (multilevel modeling, causal inference and propensity score analysis, and machine learning) that are largely built with the toolkits designed for the CLRV models previously covered.
The online resources for this book, including R and Stan code and supplementary notes, can be accessed at https://sites.google.com/site/socjunxu/home/statistics/modernapplied-regressions.
Author(s): Jun Xu
Series: Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences
Publisher: CRC Press/Chapman & Hall
Year: 2022
Language: English
Pages: 297
City: Boca Raton
Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
Preface
1. Introduction
1.1. Categorical and Limited Response Variables
1.1.1. A Brief History of CLRV Models
1.1.2. Overview of CLRVs
1.2. Approaches to Regression Analysis
1.2.1. Frequentist Approach to Regression Modeling
1.2.2. Bayesian Approach to Regression Modeling
1.2.2.1. The Example of COVID-19
1.2.3. Priors
1.2.3.1. Conjugate Priors
1.2.3.2. Informative, Non-informative, and Other Priors
1.2.4. Markov Chain Monte Carlo (MCMC)
1.3. Introduction to R
1.3.1. RStudio
1.3.2. Use R as Calculator
1.3.3. Set Up Working Directory
1.3.4. Open Log File
1.3.5. Load Data
1.3.6. Subset Data
1.3.7. Examine Data
1.3.8. Examine Individual Variables
1.3.9. Save Graphs
1.3.10. Add Comments
1.3.11. Create Dummy Variables and Check Transformation
1.3.12. Label Variables
1.3.13. Label Values
1.3.14. Create Ordinal Variables
1.3.15. Check Transformation
1.3.16. Drop Missing Cases
1.3.17. Graph Matrix
1.3.18. Save Data
1.3.19. Close Log
1.3.20. Source Codes
1.4. Review of Linear Regression Models
1.4.1. A Brief History of OLS Regression
1.4.2. Main Results of OLS Regression
1.4.2.1. OLS Estimator and Variance-Covariance Matrix
1.4.3. Major Assumptions of OLS Regression
1.4.3.1. Zero Conditional Mean and Linearity
1.4.3.2. Spherical Disturbance
1.4.3.3. Identifiability
1.4.3.4. Nonstochastic Covariates
1.4.3.5. Normality
1.4.4. Estimation and Interpretation
1.4.5. A Brief Introduction to Stan and Other BUGS-like Software
1.4.6. Bayesian Approach to Linear Regression
2. Binary Regression
2.1. Introduction
2.1.1. A Brief History of Binary Regression
2.1.2. Linear Probability Regression
2.2. Maximum Likelihood Estimation
2.2.1. Simple MLE Examples
2.2.2. MLE for Binary Regression
2.2.3. Numerical Methods for MLE
2.2.4. Normality, Consistency, and Efficiency
2.2.5. Nonlinear Probability
2.3. Hypothesis Testing and Model Comparisons
2.3.1. Wald, Likelihood Ratio, and Score Tests
2.3.1.1. Graphical Comparison of Wald, LR, and Score Tests
2.3.2. Scalar Measures
2.3.3. ROC Curve
2.3.4. Goodness of Fit Measures: The Hosmer-Lemeshow Test
2.3.5. Limitations of NHST
2.4. Interpretation of Results
2.4.1. Precision Estimates
2.4.1.1. End-Point Transformation
2.4.1.2. Delta Method
2.4.1.3. Re-sampling Methods
2.4.2. Interpretation Based on Predictions
2.4.3. Interpretations Based on Effects
2.4.3.1. Odds Ratios
2.4.3.2. Discrete Rates of Change in Prediction
2.4.3.3. Marginal Effects
2.4.4. Group Comparisons
2.5. Bayesian Binary Regression
2.5.1. Priors for Binary Regression
2.5.2. Bayesian Estimation of Binary Regression
2.5.3. Bayesian Post-estimation Analysis
2.5.4. Bayesian Assessment of Null Values
3. Polytomous Regression
3.1. Ordered Regression
3.1.1. Types of Ordinal Measures and Regression Models
3.1.2. A Brief History of Ordered Regression Models
3.1.3. Cumulative Regression
3.1.3.1. Model Setup and Estimation
3.1.3.2. Hypothesis Testing and Model Comparison
3.1.3.3. Interpretation
3.1.4. Testing the Proportional Odds/Parallel Lines Assumption
3.1.5. Partial, Proportional Constraint, and Non-parallel Models
3.1.6. Continuation Ratio Regression
3.1.7. Adjacent Category Regression
3.1.8. Stereotype Logit
3.2. Extensions to Classical Ordered Regression Models
3.2.1. Inflated Ordered Regression
3.2.2. Heterogeneous Choice Models
3.2.3. General Guidelines for Model Selection
3.3. Multinomial Regression
3.3.1. Multinomial Logit Regression
3.3.2. Multinomial Probit Regression
3.4. Bayesian Polytomous Regression
3.4.1. Bayesian Estimation
3.4.1.1. Bayesian Parallel Cumulative Ordered Regression
3.4.1.2. Bayesian Non-Parallel Cumulative Ordered Regression
3.4.1.3. Bayesian Stereotype Logit Model
4. Count Regression
4.1. Poisson Distribution
4.2. Basic Count Regression Models
4.2.1. Explore the Count Response Variable
4.2.2. Plot Observed vs. Predicted Count Proportions
4.2.3. Poisson Regression
4.2.4. Contagion, Heterogeneity, and Over-Dispersion
4.2.5. Quasi-Poisson Regression
4.2.6. Negative Binomial Regression
4.3. Zero-Modified Count Regression
4.3.1. Zero-Truncated Models
4.3.2. Hurdle Models
4.3.3. Zero-Inflated Models
4.4. Bayesian Estimation of Count Regression
4.4.1. Bayesian Estimation of Negative Binomial Regression
4.4.2. Bayesian Estimation of Zero-Inflated Poisson Regression
5. Survival Regression
5.1. Introduction
5.1.1. Censoring and Truncation
5.2. Basic Concepts
5.2.1. Time and Survival Function
5.2.2. Hazard Function
5.3. Descriptive Survival Analysis
5.3.1. The Kaplan-Meier Estimator
5.3.2. The Log-Rank Test
5.4. Accelerated Failure Time Model
5.4.1. Exponential AFT Regression
5.4.2. Weibull AFT Regression
5.5. Parametric Proportional Hazard Regression
5.5.1. Exponential PH Regression
5.5.2. Weibull PH Regression
5.6. Cox Regression
5.7. Testing the PH Assumption
5.8. Bayesian Approaches to Survival Regression
5.8.1. Bayesian Estimation of Weibull PH Model Using rstan
5.8.2. Bayesian Estimation of Survival Models Using spBayesSurv
6. Extensions
6.1. Multilevel Regression
6.1.1. Multilevel Logit Regression
6.1.2. Multilevel Count Regression
6.1.3. Bayesian Multilevel Regression
6.2. Causal Inference
6.2.1. Average Treatment Effects
6.2.1.1. Average Treatment Effects
6.2.1.2. Average Treatment Effects for the Treated
6.2.1.3. (Strong) Ignorability of Treatment Assumption
6.2.2. Propensity Score Analysis
6.2.2.1. Propensity Score Matching
6.2.2.2. Mahalanobis Distance Matching
6.2.2.3. Genetic Matching
6.2.2.4. Coarsened (Exact) Matching
6.3. Machine Learning
6.3.1. Basic Concepts
6.3.1.1. Machine Learning and Statistical Learning
6.3.1.2. Supervised Learning and Unsupervised Learning
6.3.1.3. Regression and Classification
6.3.1.4. Training, Validation, and Test
6.3.2. Supervised Learning
6.3.2.1. Regularization: Ridge and Lasso Regression
6.3.2.2. Penalized Binary Logit
6.3.2.3. Decision/Regression Trees
6.3.2.4. Bagging and Random Forests
6.3.2.5. Classification Trees
Bibliography
Index