An up-to-date, comprehensive treatment of a classic text on missing data in statistics
The topic of missing data has gained considerable attention in recent decades. This new edition by two acknowledged experts on the subject offers an up-to-date account of practical methodology for handling missing data problems. Blending theory and application, authors Roderick Little and Donald Rubin review historical approaches to the subject and describe simple methods for multivariate analysis with missing values. They then develop a coherent theory of analysis based on likelihoods derived from statistical models for the data and the missing data mechanism, and apply that theory to a wide range of important missing data problems.
Statistical Analysis with Missing Data, Third Edition starts by introducing readers to the subject and to approaches for solving it. It examines the patterns and mechanisms that give rise to missing data and presents a taxonomy of missing data methods. It then turns to missing data in experiments before discussing complete-case and available-case analysis, including weighting methods. The new edition expands its coverage to include recent work on nonresponse in sample surveys, causal inference, diagnostic methods, and sensitivity analysis, among many other topics.
• An updated “classic” written by renowned authorities on the subject
• Features over 150 exercises (including many new ones)
• Covers recent work on important methods like multiple imputation, robust alternatives to weighting, and Bayesian methods
• Revises previous topics based on past student feedback and class experience
• Contains an updated and expanded bibliography
The authors were awarded the Karl Pearson Prize in 2017 by the International Statistical Institute for a research contribution that has had a profound influence on statistical theory, methodology, or applications. Their work "has been no less than defining and transforming." (ISI)
Statistical Analysis with Missing Data, Third Edition is an ideal textbook for upper-undergraduate and beginning graduate students of the subject. It is also an excellent source of information for applied statisticians and practitioners in government and industry.
Author(s): Roderick J. A. Little, Donald B. Rubin
Series: Wiley Series in Probability and Statistics
Edition: 3
Publisher: Wiley
Year: 2019
Language: English
Pages: 462
City: Hoboken, NJ
Tags: Data Analysis; Bayesian Inference; Statistics; Maximum Likelihood Estimation; Probability Theory; Analysis of Variance; Missing Data; Data Imputation
Statistical Analysis with Missing Data
Contents
Preface to the Third Edition
Part I Overview and Basic Approaches
1 Introduction
1.1 The Problem of Missing Data
1.2 Missingness Patterns and Mechanisms
1.3 Mechanisms That Lead to Missing Data
1.4 A Taxonomy of Missing Data Methods
Problems
Note
2 Missing Data in Experiments
2.1 Introduction
2.2 The Exact Least Squares Solution with Complete Data
2.3 The Correct Least Squares Analysis with Missing Data
2.4 Filling in Least Squares Estimates
2.4.1 Yates's Method
2.4.2 Using a Formula for the Missing Values
2.4.3 Iterating to Find the Missing Values
2.4.4 ANCOVA with Missing Value Covariates
2.5 Bartlett's ANCOVA Method
2.5.1 Useful Properties of Bartlett's Method
2.5.2 Notation
2.5.3 The ANCOVA Estimates of Parameters and Missing Y-Values
2.5.4 ANCOVA Estimates of the Residual Sums of Squares and the Covariance Matrix of β̂
2.6 Least Squares Estimates of Missing Values by ANCOVA Using Only Complete-Data Methods
2.7 Correct Least Squares Estimates of Standard Errors and One Degree of Freedom Sums of Squares
2.8 Correct Least-Squares Sums of Squares with More Than One Degree of Freedom
Problems
3 Complete-Case and Available-Case Analysis, Including Weighting Methods
3.1 Introduction
3.2 Complete-Case Analysis
3.3 Weighted Complete-Case Analysis
3.3.1 Weighting Adjustments
3.3.2 Poststratification and Raking to Known Margins
3.3.3 Inference from Weighted Data
3.3.4 Summary of Weighting Methods
3.4 Available-Case Analysis
Problems
4 Single Imputation Methods
4.1 Introduction
4.2 Imputing Means from a Predictive Distribution
4.2.1 Unconditional Mean Imputation
4.2.2 Conditional Mean Imputation
4.3 Imputing Draws from a Predictive Distribution
4.3.1 Draws Based on Explicit Models
4.3.2 Draws Based on Implicit Models–Hot Deck Methods
4.4 Conclusion
Problems
5 Accounting for Uncertainty from Missing Data
5.1 Introduction
5.2 Imputation Methods that Provide Valid Standard Errors from a Single Filled-in Data Set
5.3 Standard Errors for Imputed Data by Resampling
5.3.1 Bootstrap Standard Errors
5.3.2 Jackknife Standard Errors
5.4 Introduction to Multiple Imputation
5.5 Comparison of Resampling Methods and Multiple Imputation
Problems
Part II Likelihood-Based Approaches to the Analysis of Data with Missing Values
6 Theory of Inference Based on the Likelihood Function
6.1 Review of Likelihood-Based Estimation for Complete Data
6.1.1 Maximum Likelihood Estimation
6.1.2 Inference Based on the Likelihood
6.1.3 Large Sample Maximum Likelihood and Bayes Inference
6.1.4 Bayes Inference Based on the Full Posterior Distribution
6.1.5 Simulating Posterior Distributions
6.2 Likelihood-Based Inference with Incomplete Data
6.3 A Generally Flawed Alternative to Maximum Likelihood: Maximizing over the Parameters and the Missing Data
6.3.1 The Method
6.3.2 Background
6.3.3 Examples
6.4 Likelihood Theory for Coarsened Data
Problems
Notes
7 Factored Likelihood Methods When the Missingness Mechanism Is Ignorable
7.1 Introduction
7.2 Bivariate Normal Data with One Variable Subject to Missingness: ML Estimation
7.2.1 ML Estimates
7.2.2 Large-Sample Covariance Matrix
7.3 Bivariate Normal Monotone Data: Small-Sample Inference
7.4 Monotone Missingness with More Than Two Variables
7.4.1 Multivariate Data with One Normal Variable Subject to Missingness
7.4.2 The Factored Likelihood for a General Monotone Pattern
7.4.3 ML Computation for Monotone Normal Data via the Sweep Operator
7.4.4 Bayes Computation for Monotone Normal Data via the Sweep Operator
7.5 Factored Likelihoods for Special Nonmonotone Patterns
Problems
8 Maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse
8.1 Alternative Computational Strategies
8.2 Introduction to the EM Algorithm
8.3 The E Step and The M Step of EM
8.4 Theory of the EM Algorithm
8.4.1 Convergence Properties of EM
8.4.2 EM for Exponential Families
8.4.3 Rate of Convergence of EM
8.5 Extensions of EM
8.5.1 The ECM Algorithm
8.5.2 The ECME and AECM Algorithms
8.5.3 The PX-EM Algorithm
8.6 Hybrid Maximization Methods
Problems
9 Large-Sample Inference Based on Maximum Likelihood Estimates
9.1 Standard Errors Based on The Information Matrix
9.2 Standard Errors via Other Methods
9.2.1 The Supplemented EM Algorithm
9.2.2 Bootstrapping the Observed Data
9.2.3 Other Large-Sample Methods
9.2.4 Posterior Standard Errors from Bayesian Methods
Problems
10 Bayes and Multiple Imputation
10.1 Bayesian Iterative Simulation Methods
10.1.1 Data Augmentation
10.1.2 The Gibbs Sampler
10.1.3 Assessing Convergence of Iterative Simulations
10.1.4 Some Other Simulation Methods
10.2 Multiple Imputation
10.2.1 Large-Sample Bayesian Approximations of the Posterior Mean and Variance Based on a Small Number of Draws
10.2.2 Approximations Using Test Statistics or p-Values
10.2.3 Other Methods for Creating Multiple Imputations
10.2.4 Chained-Equation Multiple Imputation
10.2.5 Using Different Models for Imputation and Analysis
Problems
Notes
Part III Likelihood-Based Approaches to the Analysis of Incomplete Data: Some Examples
11 Multivariate Normal Examples, Ignoring the Missingness Mechanism
11.1 Introduction
11.2 Inference for a Mean Vector and Covariance Matrix with Missing Data Under Normality
11.2.1 The EM Algorithm for Incomplete Multivariate Normal Samples
11.2.2 Estimated Asymptotic Covariance Matrix of (θ − θ̂)
11.2.3 Bayes Inference and Multiple Imputation for the Normal Model
11.3 The Normal Model with a Restricted Covariance Matrix
11.4 Multiple Linear Regression
11.4.1 Linear Regression with Missingness Confined to the Dependent Variable
11.4.2 More General Linear Regression Problems with Missing Data
11.5 A General Repeated-Measures Model with Missing Data
11.6 Time Series Models
11.6.1 Introduction
11.6.2 Autoregressive Models for Univariate Time Series with Missing Values
11.6.3 Kalman Filter Models
11.7 Measurement Error Formulated as Missing Data
Problems
12 Models for Robust Estimation
12.1 Introduction
12.2 Reducing the Influence of Outliers by Replacing the Normal Distribution by a Longer-Tailed Distribution
12.2.1 Estimation for a Univariate Sample
12.2.2 Robust Estimation of the Mean and Covariance Matrix with Complete Data
12.2.3 Robust Estimation of the Mean and Covariance Matrix from Data with Missing Values
12.2.4 Adaptive Robust Multivariate Estimation
12.2.5 Bayes Inference for the t Model
12.2.6 Further Extensions of the t Model
12.3 Penalized Spline of Propensity Prediction
Problems
Note
13 Models for Partially Classified Contingency Tables, Ignoring the Missingness Mechanism
13.1 Introduction
13.2 Factored Likelihoods for Monotone Multinomial Data
13.2.1 Introduction
13.2.2 ML and Bayes for Monotone Patterns
13.2.3 Precision of Estimation
13.3 ML and Bayes Estimation for Multinomial Samples with General Patterns of Missingness
13.4 Loglinear Models for Partially Classified Contingency Tables
13.4.1 The Complete-Data Case
13.4.2 Loglinear Models for Partially Classified Tables
13.4.3 Goodness-of-Fit Tests for Partially Classified Data
Problems
14 Mixed Normal and Nonnormal Data with Missing Values, Ignoring the Missingness Mechanism
14.1 Introduction
14.2 The General Location Model
14.2.1 The Complete-Data Model and Parameter Estimates
14.2.2 ML Estimation with Missing Values
14.2.3 Details of the E Step Calculations
14.2.4 Bayes Computation for the Unrestricted General Location Model
14.3 The General Location Model with Parameter Constraints
14.3.1 Introduction
14.3.2 Restricted Models for the Cell Means
14.3.3 Loglinear Models for the Cell Probabilities
14.3.4 Modifications to the Algorithms of Previous Sections to Accommodate Parameter Restrictions
14.3.5 Simplifications When Categorical Variables are More Observed than Continuous Variables
14.4 Regression Problems Involving Mixtures of Continuous and Categorical Variables
14.4.1 Normal Linear Regression with Missing Continuous or Categorical Covariates
14.4.2 Logistic Regression with Missing Continuous or Categorical Covariates
14.5 Further Extensions of the General Location Model
Problems
15 Missing Not at Random Models
15.1 Introduction
15.2 Models with Known MNAR Missingness Mechanisms: Grouped and Rounded Data
15.3 Normal Models for MNAR Missing Data
15.3.1 Normal Selection and Pattern-Mixture Models for Univariate Missingness
15.3.2 Following up a Subsample of Nonrespondents
15.3.3 The Bayesian Approach
15.3.4 Imposing Restrictions on Model Parameters
15.3.5 Sensitivity Analysis
15.3.6 Subsample Ignorable Likelihood for Regression with Missing Data
15.4 Other Models and Methods for MNAR Missing Data
15.4.1 MNAR Models for Repeated-Measures Data
15.4.2 MNAR Models for Categorical Data
15.4.3 Sensitivity Analyses for Chained-Equation Multiple Imputations
15.4.4 Sensitivity Analyses in Pharmaceutical Applications
Problems
References
Author Index
Subject Index