We live in a new age for statistical inference, in which modern scientific technology such as microarrays and fMRI machines routinely produces thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once involves more than the repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing, and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and the pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated with a large number of real examples.
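As a concrete flavor of the false discovery rate methods highlighted above (and developed in Chapter 4 of the contents below), here is a minimal illustrative sketch of the Benjamini–Hochberg step-up procedure in Python. It is not code from the book; the function name, the toy p-values, and the q = 0.10 control level are assumptions chosen purely for illustration.

# Minimal sketch (illustrative, not the book's code) of the
# Benjamini-Hochberg step-up procedure: given m p-values, reject the
# hypotheses with the k smallest p-values, where k is the largest
# index (1-based) satisfying p_(k) <= (k/m) * q.

import numpy as np

def benjamini_hochberg(p_values, q=0.10):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, smallest first
    sorted_p = p[order]
    below = sorted_p <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest index passing the bound
        rejected[order[: k + 1]] = True         # reject the k+1 smallest p-values
    return rejected

# Toy usage with made-up p-values (hypothetical numbers):
if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.86]
    print(benjamini_hochberg(pvals, q=0.10))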
Author(s): Bradley Efron
Series: Institute of Mathematical Statistics Monographs
Edition: 1
Publisher: Cambridge University Press
Year: 2010
Language: English
Pages: 277
Half-title......Page 3
Title......Page 5
Copyright......Page 6
Contents......Page 7
Prologue......Page 11
Acknowledgments......Page 14
1 Empirical Bayes and the James–Stein Estimator......Page 15
1.1 Bayes Rule and Multivariate Normal Estimation......Page 16
1.2 Empirical Bayes Estimation......Page 18
1.3 Estimating the Individual Components......Page 21
1.4 Learning from the Experience of Others......Page 24
1.5 Empirical Bayes Confidence Intervals......Page 26
Notes......Page 28
2 Large-Scale Hypothesis Testing......Page 29
2.1 A Microarray Example......Page 29
2.2 Bayesian Approach......Page 31
2.3 Empirical Bayes Estimates......Page 34
2.4 Fdr(Z) as a Point Estimate......Page 36
2.5 Independence versus Correlation......Page 40
2.6 Learning from the Experience of Others II......Page 41
Notes......Page 42
3 Significance Testing Algorithms......Page 44
3.1 p-Values and z-Values......Page 45
3.2 Adjusted p-Values and the FWER......Page 48
3.3 Stepwise Algorithms......Page 51
3.4 Permutation Algorithms......Page 53
3.5 Other Control Criteria......Page 57
Notes......Page 59
4 False Discovery Rate Control......Page 60
4.1 True and False Discoveries......Page 60
4.2 Benjamini and Hochberg’s FDR Control Algorithm......Page 62
4.3 Empirical Bayes Interpretation......Page 66
4.4 Is FDR Control “Hypothesis Testing”?......Page 72
4.5 Variations on the Benjamini–Hochberg Algorithm......Page 73
4.6 Fdr and Simultaneous Tests of Correlation......Page 78
Notes......Page 83
5 Local False Discovery Rates......Page 84
5.1 Estimating the Local False Discovery Rate......Page 84
5.2 Poisson Regression Estimates for f(z)......Page 88
5.3 Inference and Local False Discovery Rates......Page 91
5.4 Power Diagnostics......Page 97
Notes......Page 102
6 Theoretical, Permutation, and Empirical Null Distributions......Page 103
6.1 Four Examples......Page 104
A. Leukemia study......Page 105
B. Chi-square data......Page 106
C. Police data......Page 109
D. HIV data......Page 110
6.2 Empirical Null Estimation......Page 111
6.3 The MLE Method for Empirical Null Estimation......Page 116
6.4 Why the Theoretical Null May Fail......Page 119
6.5 Permutation Null Distributions......Page 123
Notes......Page 126
7 Estimation Accuracy......Page 127
7.1 Exact Covariance Formulas......Page 129
7.2 Rms Approximations......Page 135
7.3 Accuracy Calculations for General Statistics......Page 140
7.4 The Non-Null Distribution of z-Values......Page 146
7.5 Bootstrap Methods......Page 152
Notes......Page 153
8 Correlation Questions......Page 155
8.1 Row and Column Correlations......Page 155
Standardization......Page 158
8.2 Estimating the Root Mean Square Correlation......Page 159
Simulating correlated z-values......Page 162
8.3 Are a Set of Microarrays Independent of Each Other?......Page 163
8.4 Multivariate Normal Calculations......Page 167
Effective sample size......Page 169
Correlation of t-values......Page 171
8.5 Count Correlations......Page 173
Notes......Page 176
9 Sets of Cases (Enrichment)......Page 177
9.1 Randomization and Permutation......Page 178
9.2 Efficient Choice of a Scoring Function......Page 184
9.3 A Correlation Model......Page 188
9.4 Local Averaging......Page 195
Notes......Page 198
10 Combination, Relevance, and Comparability......Page 199
10.1 The Multi-Class Model......Page 201
10.2 Small Subclasses and Enrichment......Page 206
Enrichment......Page 208
Efficiency......Page 209
10.3 Relevance......Page 210
10.4 Are Separate Analyses Legitimate?......Page 213
10.5 Comparability......Page 220
Notes......Page 223
11 Prediction and Effect Size Estimation......Page 225
11.1 A Simple Model......Page 227
Cross-validation......Page 230
11.2 Bayes and Empirical Bayes Prediction Rules......Page 231
Correlation corrections......Page 235
11.3 Prediction and Local False Discovery Rates......Page 237
11.4 Effect Size Estimation......Page 241
False coverage rate control......Page 244
11.5 The Missing Species Problem......Page 247
Notes......Page 254
Appendix A: Exponential Families......Page 257
A.1 Multiparameter Exponential Families......Page 259
A.2 Lindsey’s Method......Page 261
Appendix B: Data Sets and Programs......Page 263
Data Sets......Page 263
Programs......Page 264
References......Page 265
Index......Page 272