Data Analysis Tools for DNA Microarrays

Technology today allows the collection of biological information at an unprecedented level of detail and in increasingly vast quantities. Reaping real knowledge from the mountains of data produced, however, requires interdisciplinary skills: a background not only in biology but also in computer science and the tools and techniques of data analysis.

To help meet the challenges of DNA research, Data Analysis Tools for DNA Microarrays builds the foundation in the statistics and data analysis tools needed by biologists and provides the overview of microarrays needed by computer scientists. It first presents the basics of microarray technology and, more importantly, the specific problems the technology poses from the data analysis perspective. It then introduces the fundamentals of statistics and the details of the techniques most commonly used to analyze microarray data. The final chapter focuses on commercial applications, with sections exploring software packages from BioDiscovery, Insightful, SAS, and Spotfire. The book is richly illustrated with more than 230 figures in full color and comes with a CD-ROM containing full-feature trial versions of software for image analysis (ImaGene, BioDiscovery Inc.) and data analysis (GeneSight, BioDiscovery Inc. and S-Plus Array Analyzer, Insightful Inc.).

Written in simple language and illustrated in full color, Data Analysis Tools for DNA Microarrays lowers the communication barrier between life scientists and analytical scientists. It prepares those charged with analyzing microarray data to make informed choices about the techniques to use in a given situation and contribute to further advances in the field.

Author(s): Sorin Draghici
Publisher: Chapman and Hall/CRC
Year: 2003

Language: English
Pages: 512
Tags: Biological disciplines; Mathematical methods and modeling in biology; Bioinformatics

DATA ANALYSIS TOOLS FOR DNA MICROARRAYS......Page 1
Preface......Page 4
Audience and prerequisites......Page 5
Aims and contents......Page 6
Road map......Page 8
Acknowledgments......Page 10
Contents......Page 14
List of Tables......Page 21
List of Figures......Page 22
1.1 Bioinformatics - an emerging discipline......Page 28
1.2 The building blocks of genomic information......Page 30
1.3 Expression of genetic information......Page 34
1.4 The need for microarrays......Page 38
1.5 Summary......Page 39
2.1 Microarrays - tools for gene expression analysis......Page 40
2.2 Fabrication of microarrays......Page 41
2.2.2 In situ synthesis......Page 42
2.3 Applications of microarrays......Page 47
2.4 Challenges in using microarrays in gene expression studies......Page 48
2.5 Sources of variability......Page 53
2.6 Summary......Page 57
3.2 Basic elements of digital imaging......Page 58
3.3 Microarray image processing......Page 63
3.4.1 Spot finding......Page 67
3.4.2 Image segmentation......Page 68
3.4.3 Quantification......Page 75
3.4.4 Spot quality assessment......Page 78
3.5 Image processing of Affymetrix arrays......Page 80
3.6 Summary......Page 83
4.1 Introduction......Page 85
4.2 Some basic terms......Page 86
4.3.1.1 Mean......Page 88
4.3.1.4 Characteristics of the mean, mode and median......Page 90
4.3.2.2 Variance......Page 92
4.3.3 Some interesting data manipulations......Page 94
4.3.4 Covariance and correlation......Page 95
4.4 Probabilities......Page 101
4.4.1.1 Addition rule......Page 104
4.4.1.2 Conditional probabilities......Page 105
4.4.1.3 General multiplication rule......Page 107
4.5 Bayes’ theorem......Page 108
4.6 Probability distributions......Page 110
4.6.1 Discrete random variables......Page 111
4.6.2 Binomial distribution......Page 113
4.6.3 Continuous random variables......Page 118
4.6.4 The normal distribution......Page 120
4.6.5 Using a distribution......Page 123
4.7 Central limit theorem......Page 126
4.8 Are replicates useful?......Page 128
4.10 Solved problems......Page 130
4.11 Exercises......Page 131
5.2 The framework......Page 133
5.3 Hypothesis testing and significance......Page 136
5.3.1 One-tail testing......Page 137
5.3.2 Two-tail testing......Page 142
5.4 “I do not believe God does not exist”......Page 144
5.5 An algorithm for hypothesis testing......Page 145
5.6 Errors in hypothesis testing......Page 146
5.8 Solved problems......Page 150
6.1 Introduction......Page 153
6.2.1 Tests involving the mean. The t distribution.......Page 154
6.2.2 Choosing the number of replicates......Page 158
6.2.3 Tests involving the variance σ². The chi-square distribution......Page 160
6.2.4 Confidence intervals for standard deviation......Page 163
6.3.1 Comparing variances. The F distribution.......Page 164
6.3.2 Comparing means......Page 168
6.3.2.1 Equal variances......Page 170
6.3.2.2 Unequal variances......Page 172
6.3.3 Confidence intervals for the difference of means µ1 - µ2......Page 173
6.4 Summary......Page 174
6.5 Exercises......Page 177
7.1.1 Problem definition and model assumptions......Page 178
7.1.2 The “dot” notation......Page 181
7.2.1 One-way Model I ANOVA......Page 182
7.2.1.1 Partitioning the Sum of Squares......Page 183
7.2.1.3 Testing the hypotheses......Page 185
7.2.2 One-way Model II ANOVA......Page 189
7.3 Two-way ANOVA......Page 192
7.3.1 Randomized complete block design ANOVA......Page 193
7.3.2 Comparison between one-way ANOVA and randomized block design ANOVA......Page 195
7.3.3 Some examples......Page 197
7.3.4 Factorial design two-way ANOVA......Page 201
7.3.5 Data analysis plan for factorial design ANOVA......Page 205
7.4 Quality control......Page 206
7.5 Summary......Page 209
7.6 Exercises......Page 210
8.1 The concept of experiment design......Page 211
8.2 Comparing varieties......Page 212
8.3 Improving the production process......Page 214
8.4 Principles of experimental design......Page 215
8.4.1 Replication......Page 216
8.4.2 Randomization......Page 218
8.4.3 Blocking......Page 219
8.5 Guidelines for experimental design......Page 220
8.6.1 The fixed effect design......Page 222
8.6.3 Balanced incomplete block design......Page 223
8.6.4 Latin square design......Page 224
8.6.5 Factorial design......Page 225
8.6.6 Confounding in the factorial design......Page 226
8.7 Some microarray specific experiment designs......Page 227
8.7.1 The Jackson Lab approach......Page 228
8.7.2 Ratios and flip-dye experiments......Page 230
8.7.3 Reference design vs. loop design......Page 232
8.8 Summary......Page 235
9.2 The problem of multiple comparisons......Page 237
9.3 A more precise argument......Page 242
9.4.1 The Sidak correction......Page 244
9.4.2 The Bonferroni correction......Page 245
9.4.3 Holm’s step-wise correction......Page 246
9.4.5 Permutation correction......Page 247
9.4.6 Significance analysis of microarrays (SAM)......Page 249
9.4.7 On permutations based methods......Page 250
9.5 Summary......Page 251
10.2 Box plots......Page 252
10.3 Gene pies......Page 253
10.4 Scatter plots......Page 254
10.4.1 Scatter plot limitations......Page 258
10.4.2 Scatter plot summary......Page 259
10.5 Histograms......Page 260
10.5.1 Histograms summary......Page 265
10.6 Time series......Page 266
10.7 Principal component analysis (PCA)......Page 267
10.7.2 PCA summary......Page 278
10.8 Independent component analysis (ICA)......Page 280
10.9 Summary......Page 281
11.1 Introduction......Page 283
11.2 Distance metric......Page 284
11.2.1 Euclidean distance......Page 285
11.2.2 Manhattan distance......Page 286
11.2.4 Angle between vectors......Page 288
11.2.5 Correlation distance......Page 289
11.2.7 Standardized Euclidean distance......Page 290
11.2.8 Mahalanobis distance......Page 292
11.2.10 When to use what distance......Page 293
11.2.11 A comparison of various distances......Page 295
11.3 Clustering algorithms......Page 296
11.3.1 k-means clustering......Page 301
11.3.1.2 Cluster quality assessment......Page 303
11.3.2 Hierarchical clustering......Page 308
11.3.2.1 Inter-cluster distances and algorithm complexity......Page 310
11.3.2.2 Top-down vs. bottom-up......Page 311
11.3.2.4 An illustrative example......Page 313
11.3.3 Kohonen maps or self-organizing feature maps (SOFM)......Page 317
11.4 Summary......Page 325
12.2.1 The log transform......Page 328
12.2.2 Combining replicates and eliminating outliers......Page 330
12.2.3 Array normalization......Page 332
12.2.3.2 Subtracting the mean......Page 334
12.2.3.4 Iterative linear regression......Page 336
12.3.1 Background correction......Page 337
12.3.1.5 Background correction using control spots......Page 338
12.3.3 Color normalization......Page 339
12.3.3.2 LOWESS/LOESS normalization......Page 341
12.3.3.3 Piece-wise normalization......Page 346
12.4.1 Background correction......Page 348
12.4.2 Signal calculation......Page 349
12.4.2.1 Ideal mismatch......Page 351
12.4.2.3 Scaled probe values......Page 352
12.4.3 Detection calls......Page 353
12.4.4 Relative expression values......Page 354
12.6 Useful pre-processing and normalization sequences......Page 355
12.7 Summary......Page 357
12.8.1 A short primer on logarithms......Page 358
13.1 Introduction......Page 360
13.2 Criteria......Page 361
13.3.1 Description......Page 362
13.3.2 Characteristics......Page 364
13.4.1 Description......Page 366
13.4.2 Characteristics......Page 367
13.5.1 Description......Page 368
13.5.2 Characteristics......Page 369
13.6.2 Characteristics......Page 370
13.7.1 Description......Page 371
13.7.2 Characteristics......Page 372
13.8.1 Description......Page 373
13.8.2 Characteristics......Page 376
13.9 Affymetrix comparison calls......Page 377
13.10 Other methods......Page 378
13.11 Summary......Page 379
13.12.1 A comparison of the noise sampling method with the full blown ANOVA approach......Page 380
14.1 Introduction......Page 382
14.2.2 What is the Gene Ontology (GO)?......Page 383
14.2.3.1 GO data representation......Page 384
14.2.4 Access to GO......Page 385
14.4 Translating lists of differentially regulated genes into biological knowledge......Page 386
14.4.1 Statistical approaches......Page 388
14.5.1 Implementation......Page 391
14.5.2 Graphical input interface description......Page 392
14.5.3 Some real data analyses......Page 395
14.5.4 Interpretation of the functional analysis results......Page 400
14.6 Summary......Page 401
15.1 Introduction......Page 402
15.3 Onto-Compare......Page 404
15.4 Some comparisons......Page 406
15.5 Summary......Page 410
16.1 Introduction......Page 412
16.2.1 Problem description......Page 414
16.2.3.1 Creating the data set......Page 415
16.2.3.2 Data preprocessing and normalization......Page 416
16.2.3.4 Creating Hedenfalk’s gene list......Page 417
16.2.3.6 Exporting the gene list......Page 419
16.2.3.9 Comparison using Principal Component Analysis......Page 421
16.2.4 Conclusion......Page 426
16.3 Statistical analysis of microarray data using S-PLUS and Insightful ArrayAnalyzer......Page 428
16.3.3 Differential expression analysis......Page 429
16.3.5 Analysis summaries, visualization and annotation of results......Page 430
16.3.6 S+ArrayAnalyzer example: Swirl Zebrafish experiment......Page 431
16.3.7 Summary......Page 434
16.4.1 SAS research data management......Page 435
16.4.1.3 Security model......Page 436
16.4.2.1 Input engines......Page 437
16.4.2.2 Analytical processes......Page 438
16.5.2 Experiment description......Page 440
16.5.3 Microarray data access......Page 441
16.5.4 Data transformation......Page 442
16.5.5 Filtering and visualizing gene expression data......Page 443
16.5.6 Finding gene expression patterns......Page 446
16.5.7 Using clustering and data reduction techniques to isolate groups of genes......Page 447
16.5.8 Comparing sample groups......Page 450
16.5.9 Using Portfolio Lists to isolate significant genes......Page 451
16.5.10 Summary......Page 453
16.6 Summary......Page 455
17.2 Molecular diagnosis......Page 456
17.3 Gene regulatory networks......Page 458
17.4 Conclusions......Page 460
References......Page 462