Praise for the First Edition

"This is a superb text from which to teach categorical data analysis, at a variety of levels. . . [t]his book can be very highly recommended."
—Short Book Reviews

"Of great interest to potential readers is the variety of fields that are represented in the examples: health care, financial, government, product marketing, and sports, to name a few."
—Journal of Quality Technology

"Alan Agresti has written another brilliant account of the analysis of categorical data."
—The Statistician

The use of statistical methods for categorical data is ever increasing in today's world. An Introduction to Categorical Data Analysis, Second Edition provides an applied introduction to the most important methods for analyzing categorical data. This new edition summarizes methods that have long played a prominent role in data analysis, such as chi-squared tests. It also places special emphasis on logistic regression and other modeling techniques for univariate and correlated multivariate categorical responses.

This Second Edition features:
- Two new chapters on the methods for clustered data, with an emphasis on generalized estimating equations (GEE) and random effects models
- A unified perspective based on generalized linear models
- An emphasis on logistic regression modeling
- An appendix that demonstrates the use of SAS® for all methods
- An entertaining historical perspective on the development of the methods
- Specialized methods for ordinal data, small samples, multicategory data, and matched pairs
- More than 100 analyses of real data sets and nearly 300 exercises

Written in an applied, nontechnical style, the book illustrates methods using a wide variety of real data. Examples include medical clinical trials, drug use by teenagers, basketball shooting, horseshoe crab mating, environmental opinions, correlates of happiness, and much more.

An Introduction to Categorical Data Analysis, Second Edition is an invaluable tool for social, behavioral, and biomedical scientists. It is equally useful for researchers in public health, marketing, education, biological and agricultural sciences, and industrial quality control.
Author(s): Alan Agresti
Edition: 2
Publisher: Wiley-Interscience
Year: 2007
Language: English
Pages: 400
Tags: Mathematics; Probability Theory and Mathematical Statistics; Mathematical Statistics; Applied Mathematical Statistics
An Introduction to Categorical Data Analysis
Contents
Preface to the Second Edition
1.1 Categorical Response Data
1.1.2 Nominal/Ordinal Scale Distinction
1.2 Probability Distributions for Categorical Data
1.2.1 Binomial Distribution
1.2.2 Multinomial Distribution
1.3.1 Likelihood Function and Maximum Likelihood Estimation
1.3.3 Example: Survey Results on Legalizing Abortion
1.3.4 Confidence Intervals for a Binomial Proportion
1.4.1 Wald, Likelihood-Ratio, and Score Inference
1.4.2 Wald, Score, and Likelihood-Ratio Inference for Binomial Parameter
1.4.3 Small-Sample Binomial Inference
1.4.4 Small-Sample Discrete Inference is Conservative
1.4.5 Inference Based on the Mid P-value
Problems
2.1 Probability Structure for Contingency Tables
2.1.2 Example: Belief in Afterlife
2.1.3 Sensitivity and Specificity in Diagnostic Tests
2.1.4 Independence
2.2 Comparing Proportions in Two-by-Two Tables
2.2.2 Example: Aspirin and Heart Attacks
2.2.3 Relative Risk
2.3 The Odds Ratio
2.3.1 Properties of the Odds Ratio
2.3.3 Inference for Odds Ratios and Log Odds Ratios
2.3.5 The Odds Ratio Applies in Case–Control Studies
2.4 Chi-Squared Tests of Independence
2.4.1 Pearson Statistic and the Chi-Squared Distribution
2.4.3 Tests of Independence
2.4.4 Example: Gender Gap in Political Affiliation
2.4.5 Residuals for Cells in a Contingency Table
2.4.6 Partitioning Chi-Squared
2.4.7 Comments About Chi-Squared Tests
2.5.1 Linear Trend Alternative to Independence
2.5.2 Example: Alcohol Use and Infant Malformation
2.5.4 Choice of Scores
2.5.5 Trend Tests for I × 2 and 2 × J Tables
2.6.1 Fisher’s Exact Test for 2 × 2 Tables
2.6.2 Example: Fisher’s Tea Taster
2.6.3 P-values and Conservatism for Actual P(Type I Error)
2.6.4 Small-Sample Confidence Interval for Odds Ratio
2.7.2 Conditional Versus Marginal Associations: Death Penalty Example
2.7.3 Simpson’s Paradox
2.7.4 Conditional and Marginal Odds Ratios
2.7.5 Conditional Independence Versus Marginal Independence
2.7.6 Homogeneous Association
Problems
3. Generalized Linear Models
3.1.3 Link Function
3.1.4 Normal GLM
3.2.1 Linear Probability Model
3.2.2 Example: Snoring and Heart Disease
3.2.3 Logistic Regression Model
3.2.5 Binary Regression and Cumulative Distribution Functions
3.3 Generalized Linear Models for Count Data
3.3.2 Example: Female Horseshoe Crabs and their Satellites
3.3.3 Overdispersion: Greater Variability than Expected
3.3.4 Negative Binomial Regression
3.3.5 Count Regression for Rate Data
3.3.6 Example: British Train Accidents over Time
3.4.1 Inference about Model Parameters
3.4.3 The Deviance
3.4.5 Residuals Comparing Observations to the Model Fit
3.5.1 The Newton–Raphson Algorithm Fits GLMs
3.5.2 Wald, Likelihood-Ratio, and Score Inference Use the Likelihood Function
Problems
4.1 Interpreting the Logistic Regression Model
4.1.1 Linear Approximation Interpretations
4.1.2 Horseshoe Crabs: Viewing and Smoothing a Binary Outcome
4.1.3 Horseshoe Crabs: Interpreting the Logistic Regression Fit
4.1.4 Odds Ratio Interpretation
4.1.6 Normally Distributed X Implies Logistic Regression for Y
4.2.2 Confidence Intervals for Effects
4.2.3 Significance Testing
4.2.6 Confidence Intervals for Probabilities: Details
4.2.7 Standard Errors of Model Parameter Estimates
4.3.1 Indicator Variables Represent Categories of Predictors
4.3.2 Example: AZT Use and AIDS
4.3.3 ANOVA-Type Model Representation of Factors
4.3.4 The Cochran–Mantel–Haenszel Test for 2 × 2 × K Contingency Tables
4.4 Multiple Logistic Regression
4.4.1 Example: Horseshoe Crabs with Color and Width Predictors
4.4.3 Quantitative Treatment of Ordinal Predictor
4.4.4 Allowing Interaction
4.5.1 Probability-Based Interpretations
Problems
5.1 Strategies in Model Selection
5.1.2 Example: Horseshoe Crabs Revisited
5.1.3 Stepwise Variable Selection Algorithms
5.1.4 Example: Backward Elimination for Horseshoe Crabs
5.1.5 AIC, Model Selection, and the “Correct” Model
5.1.6 Summarizing Predictive Power: Classification Tables
5.1.7 Summarizing Predictive Power: ROC Curves
5.2.1 Likelihood-Ratio Model Comparison Tests
5.2.2 Goodness of Fit and the Deviance
5.2.3 Checking Fit: Grouped Data, Ungrouped Data, and Continuous Predictors
5.2.4 Residuals for Logit Models
5.2.5 Example: Graduate Admissions at University of Florida
5.2.6 Influence Diagnostics for Logistic Regression
5.2.7 Example: Heart Disease and Blood Pressure
5.3.1 Infinite Effect Estimate: Quantitative Predictor
5.3.2 Infinite Effect Estimate: Categorical Predictors
5.3.3 Example: Clinical Trial with Sparse Data
5.3.4 Effect of Small Samples on X² and G² Tests
5.4.1 Conditional Maximum Likelihood Inference
5.4.2 Small-Sample Tests for Contingency Tables
5.4.4 Small-Sample Confidence Intervals for Logistic Parameters and Odds Ratios
5.5 Sample Size and Power for Logistic Regression
5.5.2 Sample Size in Logistic Regression
5.5.3 Sample Size in Multiple Logistic Regression
Problems
6.1.1 Baseline-Category Logits
6.1.2 Example: Alligator Food Choice
6.1.3 Estimating Response Probabilities
6.1.4 Example: Belief in Afterlife
6.1.5 Discrete Choice Models
6.2.1 Cumulative Logit Models with Proportional Odds Property
6.2.2 Example: Political Ideology and Party Affiliation
6.2.4 Checking Model Fit
6.2.5 Example: Modeling Mental Health
6.2.7 Latent Variable Motivation
6.3 Paired-Category Ordinal Logits
6.3.2 Example: Political Ideology Revisited
6.3.4 Example: A Developmental Toxicity Study
6.3.5 Overdispersion in Clustered Data
6.4.1 Example: Job Satisfaction and Income
6.4.2 Generalized Cochran–Mantel–Haenszel Tests
6.4.3 Detecting Nominal–Ordinal Conditional Association
Problems
7.1 Loglinear Models for Two-Way and Three-Way Tables
7.1.2 Interpretation of Parameters in Independence Model
7.1.3 Saturated Model for Two-Way Tables
7.1.4 Loglinear Models for Three-Way Tables
7.1.6 Example: Alcohol, Cigarette, and Marijuana Use
7.2.1 Chi-Squared Goodness-of-Fit Tests
7.2.2 Loglinear Cell Residuals
7.2.4 Confidence Intervals for Conditional Odds Ratios
7.2.6 Example: Automobile Accidents and Seat Belts
7.2.8 Large Samples and Statistical vs Practical Significance
7.3.1 Using Logistic Models to Interpret Loglinear Models
7.3.2 Example: Auto Accident Data Revisited
7.3.4 Strategies in Model Selection
7.4.1 Independence Graphs
7.4.2 Collapsibility Conditions for Three-Way Tables
7.4.4 Collapsibility and Independence Graphs for Multiway Tables
7.4.5 Example: Model Building for Student Drug Use
7.5 Modeling Ordinal Associations
7.5.1 Linear-by-Linear Association Model
7.5.2 Example: Sex Opinions
Problems
8. Models for Matched Pairs
8.1.1 McNemar Test Comparing Marginal Proportions
8.1.2 Estimating Differences of Proportions
8.2.1 Marginal Models for Marginal Proportions
8.2.2 Subject-Specific and Population-Averaged Tables
8.2.3 Conditional Logistic Regression for Matched-Pairs
8.2.4 Logistic Regression for Matched Case–Control Studies
8.3 Comparing Margins of Square Contingency Tables
8.3.2 Example: Coffee Brand Market Share
8.3.3 Marginal Homogeneity and Ordered Categories
8.3.4 Example: Recycle or Drive Less to Help Environment?
8.4 Symmetry and Quasi-Symmetry Models for Square Tables
8.4.3 Example: Coffee Brand Market Share Revisited
8.4.5 An Ordinal Quasi-Symmetry Model
8.4.7 Testing Marginal Homogeneity Using Symmetry and Ordinal Quasi-Symmetry
8.5 Analyzing Rater Agreement
8.5.2 Quasi-independence Model
8.5.3 Odds Ratios Summarizing Agreement
8.5.4 Quasi-Symmetry and Agreement Modeling
8.6 Bradley–Terry Model for Paired Preferences
8.6.2 Example: Ranking Men Tennis Players
Problems
9. Modeling Correlated, Clustered Responses
9.1.2 Example: Longitudinal Study of Treatments for Depression
9.2 Marginal Modeling: The GEE Approach
9.2.2 Generalized Estimating Equation Methodology: Basic Ideas
9.2.3 GEE for Binary Data: Depression Study
9.2.4 Example: Teratology Overdispersion
9.2.5 Limitations of GEE Compared with ML
9.3.2 Example: Insomnia Study
9.3.4 Dealing with Missing Data
9.4.2 Example: Respiratory Illness and Maternal Smoking
9.4.3 Comparisons that Control for Initial Response
Problems
10.1 Random Effects Modeling of Clustered Categorical Data
10.1.1 The Generalized Linear Mixed Model
10.1.2 A Logistic GLMM for Binary Matched Pairs
10.1.4 Differing Effects in Conditional Models and Marginal Models
10.2.1 Small-Area Estimation of Binomial Probabilities
10.2.2 Example: Estimating Basketball Free Throw Success
10.2.3 Example: Teratology Overdispersion Revisited
10.2.4 Example: Repeated Responses on Similar Survey Items
10.2.6 Example: Depression Study Revisited
10.2.7 Choosing Marginal or Conditional Models
10.2.8 Conditional Models: Random Effects Versus Conditional ML
10.3.1 Example: Insomnia Study Revisited
10.3.2 Bivariate Random Effects and Association Heterogeneity
10.4 Multilevel (Hierarchical) Models
10.4.1 Example: Two-Level Model for Student Advancement
10.4.2 Example: Grade Retention
10.5.1 Fitting GLMMs
10.5.2 Inference for Model Parameters and Prediction
Problems
11.1 The Pearson–Yule Association Controversy
11.2 R. A. Fisher’s Contributions
11.3 Logistic Regression
11.4 Multiway Contingency Tables and Loglinear Models
11.5 Final Comments
Appendix A: Software for Categorical Data Analysis
Appendix B: Chi-Squared Distribution Values
Bibliography
Index of Examples
Subject Index
Brief Solutions to Some Odd-Numbered Problems
Corrections