DNA Microarrays: Databases and Statistics Part B

Modern DNA microarray technologies have evolved over the past 25 years to the point where it is now possible to take many millions of measurements from a single experiment. These two volumes, Parts A & B in the Methods in Enzymology series, provide methods that will shepherd any molecular biologist through the process of planning, performing, and publishing microarray results.

Part A starts with an overview of a number of microarray platforms, both commercially and academically produced, and includes wet-bench protocols for performing traditional expression analysis and derivative techniques such as detection of transcription factor occupancy and chromatin status. Wet-bench protocols and troubleshooting techniques continue into Part B. These techniques are well rooted in traditional molecular biology, and while they require traditional care, a researcher who can reproducibly generate beautiful Northern or Southern blots should have no difficulty generating beautiful array hybridizations.

Data management is a more recent problem for most biologists. The bulk of Part B provides a range of techniques for data handling. These cover critical issues, from normalization within and between arrays, to uploading your results to the public repositories for array data, to integrating data from multiple sources. There are chapters in Part B for both the debutant and the expert bioinformatician.

Provides an overview of platforms
Includes experimental design and wet-bench protocols
Presents statistical and data analysis methods, array databases, data visualization, and meta-analysis

Author(s): Kimmel A.R., Oliver B. (eds.)
Series: Methods in Enzymology 411
Publisher: Elsevier, Academic Press
Year: 2006

Language: English
Pages: 486

Introduction
Blood as a Biological Specimen
Collection and Preservation of Blood Samples
Methodologies for Globin Transcript Removal from Whole Blood
Depletion of Globin Transcripts from Whole Blood
Fractionation of Blood
LeukoLOCK Filter Processing and Cell Lysis
Use of Formalin-Fixed, Paraffin-Embedded Sections for Gene Expression Analysis
Deparaffinization
Use of Solid Tissue Clinical Specimens for Gene Expression Analysis
RNA Quality Measurements for Microarray Analysis
References
Introduction
Micro-RNAs Are Important Factors in Human Cancer
Focus of This Chapter
One-Color vs Two-Color Arrays
Purification of miRNA
Differences between miRNA and mRNA Expression Profiling
Methods of Normalization for miRNA Microarray Experiments
Case Study: miRNA Microarray Expression Analysis of Human Lung and Placental Tissues
Sample Size Calculation
Statistical Differential Analysis
Hierarchical Clustering
Conclusion
References
Introduction
General Considerations
Printing
Sample Preparation and Labeling
Background Fluorescence
Hybridization Quality Assessment
Discussion Groups
References
Introduction
Description and Availability of External Controls
Assessment of Array Performance Using External RNA Controls
Methods for Synthesis and Utilization of External RNA Controls
Evaluation of Data Analysis Methodology Using Spike-In Data Sets
References
Introduction
Variability
A Digression: Traceability, Validation, and Uncertainty
Standards and Validation
Data Exchange Standards
The Molecular Biology Segment: RNA Isolation, mRNA Amplification, Target Preparation, and Hybridization
Array Content
Expression Measure Estimation and Biostatistical Analysis
References
Introduction
Overview of the Scanning Process
Array Handling
Spatial Resolution, Signal Averaging, and Pixel Dwell Time
PMT and Laser Settings
Image Display Characteristics
Signal Contamination
Scanner Bias
Alternative Scanning Technologies Provide Advantages
Specific Considerations for Multiple Slide, Multiple Scanner, and/or Multiple Laboratory Experiments
References
Introduction
Demonstration of BASE
Installing BASE
The Basics of BASE
Filtering
Ownership and Access Rights
User Administration
Protocols, Uploads, and File Formats
Array LIMS
Arrays
Biomaterials
Hybridizing and Scanning
Uploading Raw Data
Data Analysis: Experiments
Creating a Root BioAssaySet
Filtering Data
Normalizing and Analyzing Data
MAGE-ML Export
References
Bioconductor: An Open Source Framework for Bioinformatics and Computational Biology
Software Distribution
Preprocessed Microarray Data
Processed Microarray Data
General Biological Metadata
Workflows
Spotted Array Quality Control and Preprocessing
Preprocessing of Affymetrix Data
Addressing Multiple Comparisons
Conclusions: Data Analysis for High-Throughput Biology and Bioconductor
References
Introduction
MADAM
MAD: The Microarray Database
Data Entry and Editing Pages
Report Generation
MAGE-ML Writing
Related Tools
Spotfinder
Image Analysis Goals
Spotfinder 3
Grid Expansion and Shrinking
Spot Detection
Spot Digitizing
Quality Control Parameters
Visualization of Quality Controls in Spotfinder Views
Program Settings
Grid Construction
Grid Alignment Examination
Annotation Import
MIDAS
Building a Pipeline
Lowess Normalization
Iterative Log-Mean Centering Normalization
Ratio Statistics Normalization
Background Filtering
Slice Analysis
MeV
Data Representations and Distance Metrics
Data Mining in MeV: A Brief Algorithm Overview
K-Means/K-Medians Clustering (KMC)
Assessing Confidence in Clustering Results: Support Trees, Figures of Merit, and K-Means Support
Self-Organizing Maps
One-Way and Two-Factor ANOVA
Principal Components Analysis
Expression Analysis Systematic Explorer (EASE)
Interface Orientation and Selected Features
Cluster Viewers
Analysis Scripting
Normalization Using Midas
Modify the Parameters
Assessing the Results: The Investigation Panel
Launching MeV, File Loading, and Adjusting the Display
Statistical Analysis
Storing Clusters and Cluster Operations
References
Introduction
Correlation Metrics
Variations on the Pearson Correlation
Euclidean Distance
Agglomerative Hierarchical Clustering
Single Linkage (aka Nearest Neighbor)
Drawbacks of Clustering
Self-Organizing Maps
How Many Partitions to Make?
Computational Considerations
Freely Available Clustering/Analysis and Visualization Software
Cluster
Cluster 3.0
Other Analysis Packages
References
Introduction
Fixed versus Random Effects
Types of Microarrays
Biological versus Technical Replication
Data Extraction and Normalization
Gene-Specific ANOVA
Significance Thresholds
Software
References
Further Reading
Introduction
Pixel Statistics
Pixel Statistics Methods
Smooth Patterns and Block Effects
Models of Array Patterns
Conceptual Models for Probe Signals
Population Models for Probe Signals
Signal Processing Including Control Probes
SIAM: General Method and Simplified Equations
Multichannel Methods
Two-Channel Error Propagation
Metrics from Array Patterns and Reference Channel
Method
Enhancement
References
Description of the Experiment
Interpreting the PCA Plot from Fig. 2
Multidimensional Scaling (MDS)
Hierarchical Clustering
Finding Differentially Expressed Genes Using Analysis of Variance (ANOVA)
Hierarchical Designs and Nested/Nesting Relationships
Multiple Test Correction
Poststatistical Analysis
Visualizing Locations of Significant Genes on the Genome
Summary
References
Statistics for ChIP-chip and DNase Hypersensitivity Experiments on NimbleGen Arrays
Introduction
ChIP-chip: An Overview
Properties of ChIP-chip and DNase-chip Data
Previously Developed Methods for Analysis of ChIP-chip Data
ACME
Optimizing Probe Resolution
Recommendations for Assessing Data Quality
Additional Features of ACME
References
Extrapolating Traditional DNA Microarray Statistics to Tiling and Protein Microarray Technologies
Introduction
Summary Statistics
Statistical Significance
Multiple Testing
Data for Traditional, Gene-Centric DNA Microarrays
Data for Tiling Microarrays
Data for Protein Microarrays
Motivation
Background Correction
Normalization via Total Intensity
Normalization via Spiked Controls
Correcting Signal Intensity Bias
Correcting Array Location Bias
Scoring for Significance
t Test
Cyber T
Wilcoxon Signed Rank Test
Analysis of Variance (ANOVA)
Extensions to Protein Microarrays
References
Random Permutations
Tests of Genetic Association
Gene Clustering
Supervised Classification
Separating Training from Testing
n-fold Cross-Validation (n-fold CV)
Bootstrap Error Estimates
Permutation Tests
References
Using Ontologies to Annotate Microarray Experiments
What Is an Ontology?
Gene Ontology
How Was the MO Built?
Where Can I Get the MO?
MO in Detail
Who Uses the MO?
The Enterprise Vocabulary Service
Releases and Management of the MO
References
Introduction
Find Statistically Overrepresented GO Terms within a Group of Genes
Description
Using GOstat
GOstat Output
Discussion
References
Further Reading
Gene Expression Omnibus: Microarray Data Storage, Submission, Retrieval, and Analysis
Purpose and Scope of the Gene Expression Omnibus (GEO)
Structure
GEO-Constructed Data Sets
Interpreting GEO Profiles Charts
Submission
Query and Analysis
Entrez GEO Profiles
DataSet Clusters
Links
Sorting and Limit Options Using Subset Effects Flags
Conclusion
References
Introduction
How to Query and Retrieve Data from the ArrayExpress Repository
How to Query Data in the ArrayExpress Data Warehouse
Data Analysis with Expression Profiler
Data Selection, Normalization, and Transformations
Clustering Analysis
Comparative Group Analysis
How to Submit Data to ArrayExpress
Acknowledgments
References
Introduction
Selection of Data Sets
Detection of Statistically Significant Variations by the Rank Difference Analysis of Microarray Method (RDAM)
Reproducibility of Replicates
Relationships between Experimental Points
Evolution of Total Variation across Ordered Comparisons in Each Experiment
Similarity of Comparison Results
Combinatorial Clustering
Boolean Clustering
Gene Clustering of Transcriptional Networks
References
Introduction
Osprey
An Example
Reducing the Data Set
Cytoscape
Viewing and Filtering a Network
Combining Interaction and Expression Data
Visualizing Gene Ontology Annotations
Summary
References
Introduction
Classification
Classification Trees
Error Rate Estimates
Proximities
Unsupervised Learning and Clustering
Case Study: Prostate Cancer Data Set
References
Further Reading