Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data

Aimed at biomedical researchers, the new edition of this standard text guides readers in selecting and using advanced statistical methods and in presenting results to clinical colleagues. It assumes no knowledge of mathematics beyond high school level and is accessible to anyone with an introductory background in statistics. Analyses are performed with the Stata statistical software package, using the intuitive version 10 in this edition. Topics covered include linear, logistic, and Poisson regression, survival analysis, fixed-effects analysis of variance, and repeated-measures analysis of variance. Restricted cubic splines are used to model non-linear relationships. Each method is introduced in its simplest form and then extended to cover more complex situations. An appendix helps readers select the most appropriate statistical methods for their data. The text makes extensive use of real data sets available online through Vanderbilt University.

Author(s): William D. Dupont
Series: Cambridge Medicine
Edition: 2
Publisher: Cambridge University Press
Year: 2009

Language: English
Pages: 544
Tags: Medical disciplines; Social medicine and biomedical statistics

Cover
Half-title
Title
Copyright
Contents
Preface
Changes in the second edition
Acknowledgements
1.1. Algebraic notation
1.2.1. Dot plot
1.2.4. Sample variance
1.2.7. Box plot
1.2.9. Scatter plot
1.3. The Stata Statistical Software Package
1.3.1. Downloading data from my website
1.3.2. Creating histograms with Stata
1.3.3. Stata command syntax
1.3.4. Obtaining interactive help from Stata
1.3.6. Stata graphics and schemes
1.3.8. Stata pulldown menus
1.3.9. Displaying other descriptive statistics with Stata
1.4. Inferential statistics
1.4.1. Probability density function
1.4.3. Normal distribution
1.4.5. Standard error
1.4.6. Null hypothesis, alternative hypothesis, and P-value
1.4.8. Statistical power
1.4.9. The z and Student’s t distributions
1.4.10. Paired t test
1.4.11. Performing paired t tests with Stata
1.4.12. Independent t test using a pooled standard error estimate
1.4.13. Independent t test using separate standard error estimates
1.4.14. Independent t tests using Stata
1.4.15. The chi-squared distribution
1.5. Overview of methods discussed in this text
1.5.1. Models with one response per patient
1.6. Additional reading
1.7. Exercises
2.1. Sample covariance
2.3. Population covariance and correlation coefficient
2.4. Conditional expectation
2.5. Simple linear regression model
2.6. Fitting the linear regression model
2.7. Historical trivia: origin of the term
2.8. Determining the accuracy of linear regression estimates
2.9. Ethylene glycol poisoning example
2.10. 95% confidence interval for y[x] = α + βx evaluated at x
2.11. 95% prediction interval for the response of a new patient
2.12. Simple linear regression with Stata
2.13. Lowess regression
2.14. Plotting a lowess regression curve in Stata
2.15. Residual analyses
2.16. Studentized residual analysis using Stata
2.17.1. Stabilizing the variance
2.17.2. Correcting for non-linearity
2.17.3. Example: research funding and morbidity for 29 diseases
2.18. Analyzing transformed data with Stata
2.19. Testing the equality of regression slopes
2.19.1. Example: the Framingham Heart Study
2.20. Comparing slope estimates with Stata
2.21. Density-distribution sunflower plots
2.22. Creating density-distribution sunflower plots with Stata
2.23. Additional reading
2.24. Exercises
3.1. The model
3.2. Confounding variables
3.5. Expected response in the multiple regression model
3.6. The accuracy of multiple regression parameter estimates
3.8. Leverage
3.11. Example: the Framingham Heart Study
3.11.1. Preliminary univariate analyses
3.12.1. Producing scatter plot matrix graphs with Stata
3.13.1. The Framingham example
3.14. Multiple regression modeling of the Framingham data
3.15.1. The Framingham example
3.17. Multiple linear regression with Stata
3.18. Automatic methods of model selection
3.18.1. Forward selection using Stata
3.18.2. Backward selection
3.18.4. Backward stepwise selection
3.19. Collinearity
3.20. Residual analyses
3.21. Influence
3.21.2. Cook’s distance
3.21.3. The Framingham example
3.22. Residual and influence analyses using Stata
3.23. Using multiple linear regression for non-linear models
3.24. Building non-linear models with restricted cubic splines
3.24.1. Choosing the knots for a restricted cubic spline model
3.25.1. Modeling length-of-stay and MAP using restricted cubic splines
3.25.2. Using Stata for non-linear models with restricted cubic splines
3.26. Additional reading
3.27. Exercises
4.2. Sigmoidal family of logistic regression curves
4.3. The log odds of death given a logistic probability function
4.4. The binomial distribution
4.6. Generalized linear model
4.8. Maximum likelihood estimation
4.8.1. Variance of maximum likelihood parameter estimates
4.9.1. Likelihood ratio tests
4.9.2. Quadratic approximations to the log likelihood ratio function
4.9.4. Wald tests and confidence intervals
4.9.5. Which test should you use?
4.10. Sepsis example
4.11. Logistic regression with Stata
4.12. Odds ratios and the logistic regression model
4.13.1. Calculating this odds ratio with Stata
4.15. 95% confidence interval for π[x]
4.16. Exact 100(1 − α)% confidence intervals for proportions
4.17. Example: the Ibuprofen in Sepsis Study
4.18. Logistic regression with grouped data using Stata
4.19.1. Example: the Ille-et-Vilaine study of esophageal cancer and alcohol
4.19.2. Review of classical case-control theory
4.19.3. 95% confidence interval for the odds ratio: Woolf’s method
4.19.5. Test of the null hypothesis that two proportions are equal
4.20.2. 95% confidence interval for the odds ratio: logistic regression
4.21. Creating a Stata data file
4.22. Analyzing case–control data with Stata
4.23. Regressing disease against exposure
4.24. Additional reading
4.25. Exercises
5.1. Mantel–Haenszel estimate of an age-adjusted odds ratio
5.2. Mantel–Haenszel χ² statistic for multiple 2×2 tables
5.4. Breslow–Day–Tarone test for homogeneity
5.5. Calculating the Mantel–Haenszel odds ratio using Stata
5.6. Multiple logistic regression model
5.7. 95% confidence interval for an adjusted odds ratio
5.8. Logistic regression for multiple 2×2 contingency tables
5.9. Analyzing multiple 2×2 tables with Stata
5.10. Handling categorical variables in Stata
5.11. Effect of dose of alcohol on esophageal cancer risk
5.11.1. Analyzing Model (5.25) with Stata
5.13. Deriving odds ratios from multiple parameters
5.15. Confidence intervals for weighted sums of coefficients
5.17. The estimated variance–covariance matrix
5.18. Multiplicative models of two risk factors
5.19. Multiplicative model of smoking, alcohol, and esophageal cancer
5.20. Fitting a multiplicative model with Stata
5.21. Model of two risk factors with interaction
5.22. Model of alcohol, tobacco, and esophageal cancer with interaction terms
5.23. Fitting a model with interaction using Stata
5.24. Model fitting: nested models and model deviance
5.26. Goodness-of-fit tests
5.26.1. The Pearson χ² goodness-of-fit statistic
5.27.1. An example: the Ille-et-Vilaine cancer data set
5.28. Residual and influence analysis
5.28.2. …influence statistic
5.28.3. Residual plots of the Ille-et-Vilaine data on esophageal cancer
5.29. Using Stata for goodness-of-fit tests and residual analyses
5.32. Analyzing data with missing values
5.32.1. Imputing data that is missing at random
5.32.2. Cardiac output in the Ibuprofen in Sepsis Study
5.32.3. Modeling missing values with Stata
5.33. Logistic regression using restricted cubic splines
5.33.1. Odds ratios from restricted cubic spline models
5.34. Modeling hospital mortality in the SUPPORT Study
5.35. Using Stata for logistic regression with restricted cubic splines
5.36.1. Proportional odds logistic regression
5.36.2. Polytomous logistic regression
5.37. Additional reading
5.38. Exercises
6.1. Survival and cumulative mortality functions
6.2. Right censored data
6.3. Kaplan–Meier survival curves
6.4. An example: genetic risk of recurrent intracerebral hemorrhage
6.5. 95% confidence intervals for survival functions
6.6. Cumulative mortality function
6.8. Log-rank test
6.9. Using Stata to derive survival functions and the log-rank test
6.10. Log-rank test for multiple patient groups
6.12. Proportional hazards
6.13. Relative risks and hazard ratios
6.14. Proportional hazards regression analysis
6.16. Proportional hazards regression analysis with Stata
6.17. Tied failure times
6.19. Exercises
7.2. Relative risks and hazard ratios
7.5.1. Kaplan–Meier survival curves for DBP
7.5.2. Simple hazard regression model for CHD risk and DBP
7.5.3. Restricted cubic spline model of CHD risk and DBP
7.5.4. Categorical hazard regression model of CHD risk and DBP
7.5.5. Simple hazard regression model of CHD risk and gender
7.5.6. Multiplicative model of DBP and gender on risk of CHD
7.5.7. Using interaction terms to model the effects of gender and DBP on CHD
7.5.8. Adjusting for confounding variables
7.5.9. Interpretation
7.5.10. Alternative models
7.6. Proportional hazards regression analysis using Stata
7.7. Stratified proportional hazards models
7.8. Survival analysis with ragged study entry
7.8.2. Age, sex, and CHD in the Framingham Heart Study
7.8.4. Survival analysis with ragged entry using Stata
7.9. Predicted survival, log–log plots and the proportional hazards assumption
7.9.1. Evaluating the proportional hazards assumption with Stata
7.10. Hazard regression models with time-dependent covariates
7.10.1. Testing the proportional hazards assumption
7.10.2. Modeling time-dependent covariates with Stata
7.12. Exercises
8.1. Elementary statistics involving rates
8.2. Calculating relative risks from incidence data using Stata
8.4. Simple Poisson regression for 2×2 tables
8.5. Poisson regression and the generalized linear model
8.7. Simple Poisson regression with Stata
8.8.1. Recoding survival data on patients as patient–year data
8.8.2. Converting survival records to person–years of follow-up using Stata
8.9. Converting the Framingham survival data set to person–time data
8.10. Simple Poisson regression with multiple data records
8.11. Poisson regression with a classification variable
8.12. Applying simple Poisson regression to the Framingham data
8.13. Additional reading
8.14. Exercises
9.1. Multiple Poisson regression model
9.2. An example: the Framingham Heart Study
9.2.1. A multiplicative model of gender, age and coronary heart disease
9.2.2. A model of age, gender and CHD with interaction terms
9.2.3. Adding confounding variables to the model
9.3. Using Stata to perform Poisson regression
9.4.1. Deviance residuals
9.5. Residual analysis of Poisson regression models using Stata
9.7. Exercises
10.1. One-way analysis of variance
10.2. Multiple comparisons
10.3. Reformulating analysis of variance as a linear regression model
10.4. Non-parametric methods
10.6. Example: a polymorphism in the estrogen receptor gene
10.7. User contributed software in Stata
10.8. One-way analyses of variance using Stata
10.9. Two-way analysis of variance, analysis of covariance, and other models
10.11. Exercises
11.1. Example: effect of race and dose of isoproterenol on blood flow
11.2. Exploratory analysis of repeated measures data using Stata
11.3. Response feature analysis
11.4. Example: the isoproterenol data set
11.5. Response feature analysis using Stata
11.6. The area-under-the-curve response feature
11.8. Common correlation structures
11.9. GEE analysis and the Huber–White sandwich estimator
11.10. Example: analyzing the isoproterenol data with GEE
11.11. Using Stata to analyze the isoproterenol data set using GEE
11.13. Additional reading
11.14. Exercises
A. Summary of statistical models discussed in this text
B. Summary of Stata commands used in this text
References
Index