This ready reference discusses methods for the statistical analysis and validation of data generated by high-throughput technologies. Unlike other titles, this book focuses on systems approaches: the basis of the analysis is not a single gene or protein but a more or less complex biological network. From a methodological point of view, the well-balanced contributions describe a variety of modern supervised and unsupervised statistical methods applied to large-scale datasets from genomics and genetics experiments. Furthermore, because the computing power that has become available in recent years has shifted attention from parametric to nonparametric methods, the methods presented here make use of computer-intensive approaches such as the bootstrap, Markov chain Monte Carlo, and general resampling methods. Finally, given the large amount of information available in public databases, a chapter on Bayesian methods is included, which also provides a systematic means of integrating this information. A welcome guide for mathematicians and for the medical and basic research communities.
Contents:
Chapter 1 Control of Type I Error Rates for Oncology Biomarker Discovery with High-Throughput Platforms (pages 1–26): Jeffrey Miecznikowski, Dan Wang and Song Liu
Chapter 2 Overview of Public Cancer Databases, Resources, and Visualization Tools (pages 27–40): Frank Emmert-Streib, Ricardo de Matos Simoes, Shailesh Tripathi and Matthias Dehmer
Chapter 3 Discovery of Expression Signatures in Chronic Myeloid Leukemia by Bayesian Model Averaging (pages 41–55): Ka Yee Yeung
Chapter 4 Bayesian Ranking and Selection Methods in Microarray Studies (pages 57–74): Hisashi Noma and Shigeyuki Matsui
Chapter 5 Multiclass Classification via Bayesian Variable Selection with Gene Expression Data (pages 75–92): Yang Aijun, Song Xinyuan and Li Yunxian
Chapter 6 Semisupervised Methods for Analyzing High-Dimensional Genomic Data (pages 93–106): Devin C. Koestler
Chapter 7 Colorectal Cancer and Its Molecular Subsystems: Construction, Interpretation, and Validation (pages 107–132): Vishal N. Patel and Mark R. Chance
Chapter 8 Network Medicine: Disease Genes in Molecular Networks (pages 133–151): Sreenivas Chavali and Kartiek Kanduri
Chapter 9 Inference of Gene Regulatory Networks in Breast and Ovarian Cancer by Integrating Different Genomic Data (pages 153–171): Binhua Tang, Fei Gu and Victor X. Jin
Chapter 10 Network-Module-Based Approaches in Cancer Data Analysis (pages 173–192): Guanming Wu and Lincoln Stein
Chapter 11 Discriminant and Network Analysis to Study Origin of Cancer (pages 193–214): Li Chen, Ye Tian, Guoqiang Yu, David J. Miller, Ie-Ming Shih and Yue Wang
Chapter 12 Intervention and Control of Gene Regulatory Networks: Theoretical Framework and Application to Human Melanoma Gene Regulation (pages 215–238): Nidhal Bouaynaya, Roman Shterenberg, Dan Schonfeld and Hassan M. Fathallah-Shaykh
Chapter 13 Identification of Recurrent DNA Copy Number Aberrations in Tumors (pages 239–260): Vonn Walter, Andrew B. Nobel, D. Neil Hayes and Fred A. Wright
Chapter 14 The Cancer Cell, Its Entropy, and High-Dimensional Molecular Data (pages 261–285): Wessel N. van Wieringen and Aad W. van der Vaart
Author(s): M. Dehmer, Frank Emmert-Streib (eds.)
Language: English
Pages: 312
Tags: Medical disciplines; Social medicine and biomedical statistics
Contents......Page 5
Preface......Page 13
List of Contributors......Page 17
Part One: General Overview......Page 21
1.2 Introduction......Page 23
1.3 High-Throughput Platforms......Page 24
1.3.2 RNA-Seq......Page 25
1.3.4 Mass Spectrometry Platforms......Page 26
1.3.6 Preprocessing HT Platforms......Page 27
1.4.1 Linear Regression......Page 28
1.4.1.1 Simple Linear Regression......Page 29
1.4.2 Logistic Regression (Y Discrete)......Page 31
1.4.3.1 Kaplan-Meier Analysis......Page 33
1.5 Multiple Testing Type I Errors......Page 35
1.5.1.2 Holm Procedure......Page 37
1.5.1.4 Generalized Šidák Procedure......Page 38
1.6 Discussion......Page 39
1.7 Perspective......Page 40
References......Page 41
2.2 Introduction......Page 47
2.3 Different Cancer Types are Genetically Related......Page 48
2.4 Incidence and Mortality Rates of Cancer......Page 49
2.5 Cancer and Disorder Databases......Page 50
2.6.2 R-Based Packages......Page 54
2.7 Conclusions......Page 55
References......Page 57
Part Two: Bayesian Methods......Page 61
3.1 Brief Introduction......Page 63
3.3 Variable Selection on Gene Expression Data......Page 64
3.4 Bayesian Model Averaging (BMA)......Page 66
3.4.1 The Iterative BMA Algorithm (iBMA)......Page 67
3.4.2 Computational Assessment......Page 68
3.5 Case Study: CML Progression Data......Page 69
3.6 The Power of iBMA......Page 70
3.7 Laboratory Validation......Page 71
3.8 Conclusions......Page 72
3.9 Perspective......Page 73
References......Page 74
4.2 Introduction......Page 77
4.3 Hierarchical Mixture Modeling and Empirical Bayes Estimation......Page 79
4.4.1 Ranking Based on Effect Sizes......Page 80
4.4.1.2 Rank Posterior Mean (RPM)......Page 81
4.4.1.3 Tail-Area Posterior Probability (TPP)......Page 82
4.4.2.1 Posterior Probability of Differentially Expressed (PPDE)......Page 83
4.4.2.2 Evaluating Selection Accuracy......Page 84
4.5 Simulations......Page 85
4.6 Application......Page 87
4.7 Concluding Remarks......Page 91
4.9 Appendix: The EM Algorithm......Page 92
References......Page 93
5.2 Introduction......Page 95
5.4.1 Model......Page 97
5.4.2 Prior Specification......Page 99
5.4.3 Computation......Page 100
5.4.4 Classification......Page 102
5.5.1 Leukemia Data......Page 103
5.5.2 Lymphoma Data......Page 107
5.7 Perspective......Page 109
References......Page 110
6.2 Motivation......Page 113
6.3 Existing Approaches......Page 115
6.3.2 Fully Supervised Procedures......Page 116
6.3.3 Semisupervised Procedures......Page 117
6.3.3.1 Semisupervised Clustering......Page 119
6.3.3.2 Semisupervised RPMM......Page 120
6.3.3.3 Considerations Regarding Semisupervised Procedures......Page 121
6.4 Data Application: Mesothelioma Cancer Data Set......Page 122
6.4.1 Results: Mesothelioma Cancer Data Set......Page 124
6.5 Perspective......Page 125
References......Page 126
Part Three: Network-Based Approaches......Page 127
7.2 Colon Cancer: Etiology......Page 129
7.3 Colon Cancer: Development......Page 130
7.4 The Pathway Paradigm......Page 131
7.5 Cancer Subtypes and Therapies......Page 132
7.7.1 Measurements......Page 133
7.7.2 Manifolds......Page 134
7.8.1 Examples......Page 137
7.9 Molecular Subsystems: Validation......Page 139
7.10 Worked Example: Label-Free Proteomics......Page 140
7.10.2 Peptide-Level Significance......Page 142
7.10.3 Exon-Level Significance......Page 145
7.10.4 Summarizing the Results......Page 146
7.11 Conclusions......Page 147
7.12 Perspective......Page 148
References......Page 149
8.2 Introduction......Page 153
8.3 Genetic Architecture of Human Diseases......Page 154
8.4.1 Network Measures......Page 156
8.4.2 Disease and Disease-Gene Networks......Page 157
8.4.3 Disease Genes in Protein Interaction Networks......Page 159
8.4.4 Identification of Disease Modules......Page 163
8.5.1 Linkage Methods......Page 165
8.5.2 Disease-Module-Based Methods......Page 166
8.6 Conclusion......Page 167
References......Page 168
9.2 Introduction......Page 173
9.3.1 Basic Theory of Gene Regulatory Network......Page 174
9.3.2.3 Discover the Transfer Rules of Genetic Information During Gene Expression......Page 175
9.4.1 The In Silico Analytical Approach......Page 176
9.4.1.1 Study Case 1: Inference of Static Gene Regulatory Network of Estrogen-Dependent Breast Cancer Cell Line......Page 178
9.4.1.2 Study Case 2: Gene Regulatory Network of Genome-Wide Mapping of TGFβ/SMAD4 Targets in Ovarian Cancer Patients......Page 180
9.4.2 A Bayesian Inference Approach for Genetic Regulatory Analysis......Page 184
9.4.2.1 Study Case: ERα Transcriptional Regulatory Dynamics in Breast Cancer Cell......Page 185
9.5 Conclusions......Page 187
9.6 Perspective......Page 188
References......Page 189
10.2 Introduction......Page 193
10.4 Network Modules Containing Functionally Similar Genes or Proteins......Page 194
10.5.1 Greedy Network Module Search Algorithms......Page 195
10.5.3 Network Clustering Algorithms......Page 196
10.5.4 Community Search Algorithms......Page 197
10.5.6 Weighted Gene Expression Network......Page 198
10.6.2 Cancer Driver Gene Search Based on Network Modules......Page 199
10.7 The Reactome FI Cytoscape Plug-in......Page 200
10.7.3 Cancer Gene Index Data Set......Page 201
10.7.4.1 Loading the Mutation File into Cytoscape and Constructing a FI Subnetwork......Page 202
10.7.4.2 Network Clustering and Network Module Functional Analysis......Page 204
10.7.4.3 Module-Based Survival Analysis......Page 206
10.7.4.4 Cancer Gene Index Data Overlay Analysis......Page 207
10.9 Perspective......Page 209
References......Page 211
11.2 Introduction......Page 213
11.3.1 Fisher’s Discriminant Analysis and ANOVA......Page 214
11.3.2 Hierarchical Clustering......Page 215
11.3.3 One-Versus-All Support Vector Machine and Nearest-Mean Classifier......Page 216
11.3.4 Differential Dependency Network......Page 217
11.4.1 CNA Data Analysis for Testing Existence of Monoclonality......Page 218
11.4.1.2 Assessing Statistical Significance of Monoclonality......Page 220
11.4.2 A Two-Stage Analytical Method for Testing the Origin of Cancer......Page 221
11.4.2.1 Basic Assumptions......Page 222
11.4.2.3 Stage 1: Feature Selection and Classification......Page 223
11.5.1.1 Testing Existence of Monoclonality......Page 224
11.5.1.2 The Significance of Monoclonality......Page 226
11.5.2.1 Stage 1 Results......Page 227
11.5.2.2 Stage 2 Results......Page 228
11.6 Conclusion......Page 231
References......Page 232
12.1 Brief Summary......Page 235
12.2 Gene Regulatory Network Models......Page 236
12.3 Intervention in Gene Regulatory Networks......Page 238
12.3.1 Optimal Stochastic Control......Page 239
12.3.2 Heuristic Control Strategies......Page 241
12.3.3 Structural Intervention Strategies......Page 242
12.4 Optimal Perturbation Control of Gene Regulatory Networks......Page 243
12.4.2.1 Minimal-Energy Perturbation Control......Page 246
12.4.3 Trade-offs Between Minimal-Energy and Fastest Convergence Rate Perturbation Control......Page 248
12.5 Human Melanoma Gene Regulatory Network......Page 251
12.6 Perspective......Page 255
References......Page 256
Part Four: Phenotype Influence of DNA Copy Number Aberrations......Page 259
13.1 Introduction......Page 261
13.2.1 Definitions......Page 262
13.2.2 Mechanisms of DNA Copy Number Change: An Overview......Page 263
13.2.3 CNAs and Cancer......Page 264
13.2.5 Measuring DNA Copy Number......Page 265
13.3 Analyzing DNA Copy Number: Single Sample Methods......Page 266
13.3.3 Thresholding......Page 267
13.3.5 Methods Based on Hidden Markov Models......Page 268
13.4.1 Additional Preprocessing and Summary Statistics......Page 269
13.4.3 Assessing Statistical Significance: An Overview......Page 270
13.5.1 Cyclic Shifts......Page 271
13.5.2 Assessing Statistical Significance with DiNAMIC......Page 272
13.5.3 Peeling......Page 273
13.5.4 Confidence Intervals for Recurrent CNAs......Page 276
13.5.5 Bootstrap Test-Based Confidence Intervals in Real Datasets......Page 277
13.6 Open Questions......Page 278
References......Page 279
14.2 Introduction......Page 281
14.3.1 Molecular Biology......Page 282
14.3.3 Measurement Devices......Page 283
14.4 Entropy Increase......Page 284
14.5 Statistical Arguments......Page 286
14.6 Statistical Methodology......Page 288
14.6.2 Entropy......Page 289
14.6.3 Mutual Information......Page 292
14.8 Application to Cancer Data......Page 295
14.8.1 Analyses of Type II Experiments......Page 296
14.8.2 Analyses of Type I Experiments......Page 299
14.8.3 Potential......Page 300
14.8.4 Discussion......Page 302
14.10 Perspective......Page 303
References......Page 304
Index......Page 307