This book provides an essential understanding of the statistical concepts needed to analyze genomic and proteomic data with computational techniques. The author presents both basic and advanced topics, focusing on those relevant to the computational analysis of large data sets in biology. Each chapter begins with a description of a statistical concept and a current example from biomedical research, followed by a more detailed presentation, a discussion of limitations, and problems. The book starts with an introduction to probability and statistics for genome-wide data, then moves on to topics such as clustering, classification, multi-dimensional visualization, experimental design, statistical resampling, and statistical network analysis.
- Clearly explains the use of bioinformatics tools in life sciences research without requiring an advanced background in math/statistics
- Enables biomedical and life sciences researchers to successfully evaluate the validity of their results and make inferences
- Enables statistical and quantitative researchers to rapidly learn novel statistical concepts and techniques appropriate for large biological data analysis
- Carefully revisits frequently used statistical approaches and highlights their limitations in large biological data analysis
- Offers programming examples and datasets
- Includes chapter problem sets, a glossary, a list of statistical notations, and appendices with references to background mathematical and technical material
- Features supplementary materials, including datasets, links, and a statistical package available online
Statistical Bioinformatics is an ideal textbook for students in medicine, life sciences, and bioengineering, and for researchers who use computational tools to analyze genomic, proteomic, and other emerging high-throughput molecular data. It may also serve as a rapid introduction to bioinformatics for students of statistics and computation, and for other readers who have not performed such analyses before.
Author(s): Jae K. Lee
Series: Methods of Biochemical Analysis
Publisher: Wiley
Year: 2010
Language: English
Pages: 386
STATISTICAL BIOINFORMATICS
CONTENTS
PREFACE
CONTRIBUTORS
Challenge 1: Multiple-Comparisons Issue
Challenge 2: High-Dimensional Biological Data
Challenge 5: Integration of Multiple, Heterogeneous Biological Data Information
References
2.1 Introduction
2.2 Basic Concepts
2.3 Conditional Probability and Independence
2.4 Random Variables
2.5 Expected Value and Variance
2.6 Distributions of Random Variables
2.7 Joint and Marginal Distribution
2.8 Multivariate Distribution
2.9 Sampling Distribution
2.10 Summary
3.1 Sources of Error in High-Throughput Biological Experiments
3.2 Statistical Techniques for Quality Control
3.3 Issues Specific to Microarray Gene Expression Experiments
References
4.1 Introduction
4.2 Statistical Testing
4.3 Error Controlling
4.4 Real Data Analysis
Acknowledgments
References
5 CLUSTERING: UNSUPERVISED LEARNING IN LARGE BIOLOGICAL DATA
5.1 Measures of Similarity
5.2 Clustering
5.3 Assessment of Cluster Quality
References
6.1 Introduction
6.2 Classification and Prediction Methods
6.3 Feature Selection and Ranking
6.4 Cross-Validation
6.5 Enhancement of Class Prediction by Ensemble Voting Methods
6.6 Comparison of Classification Methods Using High-Dimensional Data
6.7 Software Examples for Classification Methods
References
7.1 Introduction
7.2 Classical Multidimensional Visualization Techniques
7.3 Two-Dimensional Projections
7.4 Issues and Challenges
7.5 Systematic Exploration of Low-Dimensional Projections
7.6 One-Dimensional Histogram Ordering
7.7 Two-Dimensional Scatterplot Ordering
7.8 Conclusion
References
8.1 Introduction
8.2 Statistical/Probabilistic Models
8.3 Estimation Methods
8.4 Numerical Algorithms
8.5 Examples
8.6 Conclusion
References
9.1 Randomization
9.2 Replication
9.3 Pooling
9.4 Blocking
9.5 Design for Classifications
9.7 Design for eQTL Studies
References
10.1 Introduction
10.2 Resampling Methods for Prediction Error Assessment and Model Selection
10.3 Feature Selection
10.5 Practical Example: Lymphoma
10.6 Resampling Methods
10.7 Bootstrap Methods
10.8 Sample Size Issues
10.9 Loss Functions
10.10 Bootstrap Resampling for Quantifying Uncertainty
10.11 Markov Chain Monte Carlo Methods
10.12 Conclusions
References
11.1 Introduction
11.2 Boolean Network Modeling
11.3 Bayesian Belief Network
11.4 Modeling of Metabolic Networks
References
12.2 Alleles, Linkage Disequilibrium, and Haplotype
12.3 International HapMap Project
12.4 Genotyping Platforms
12.5 Overview of Current GWAS Results
12.6 Statistical Issues in GWAS
12.7 Haplotype Analysis
12.9 Gene × Gene and Gene × Environment Interactions
12.10 Gene and Pathway-Based Analysis
12.12 Meta-Analysis
12.14 Conclusions
References
13.1 Introduction
13.2 Brief Overview of the Bioconductor Project
13.3 Experimental Data
13.4 Annotation
13.5 Models of Biological Systems
13.6 Conclusion
References
INDEX