This introductory text offers a clear exposition of the algorithmic principles driving advances in bioinformatics. Accessible to students in both biology and computer science, it strikes a unique balance between rigorous mathematics and practical techniques, emphasizing the ideas underlying algorithms rather than offering a collection of apparently unrelated problems.The book introduces biological and algorithmic ideas together, linking issues in computer science to biology and thus capturing the interest of students in both subjects. It demonstrates that relatively few design techniques can be used to solve a large number of practical problems in biology, and presents this material intuitively.An Introduction to Bioinformatics Algorithms is one of the first books on bioinformatics that can be used by students at an undergraduate level. It includes a dual table of contents, organized by algorithmic idea and biological idea; discussions of biologically relevant problems, including a detailed problem formulation and one or more solutions for each; and brief biographical sketches of leading figures in the field. These interesting vignettes offer students a glimpse of the inspirations and motivations for real work in bioinformatics, making the concepts presented in the text more concrete and the techniques more approachable.PowerPoint presentations, practical bioinformatics problems, sample code, diagrams, demonstrations, and other materials can be found at the Author's website.
Author(s): Neil C. Jones Pavel A. Pevzner
Edition: 1
Year: 2004
Language: English
Pages: 448
Preface......Page 16
1 Introduction......Page 20
2.1 What Is an Algorithm?......Page 26
2.2 Biological Algorithms versus Computer Algorithms......Page 33
2.3 The Change Problem......Page 36
2.4 Correct versus Incorrect Algorithms......Page 39
2.5 Recursive Algorithms......Page 43
2.6 Iterative versus Recursive Algorithms......Page 47
2.7 Fast versus Slow Algorithms......Page 52
2.8 Big-O Notation......Page 56
2.9 Algorithm Design Techniques......Page 59
2.9.1 Exhaustive Search......Page 60
2.9.2 Branch-and-Bound Algorithms......Page 61
2.9.4 Dynamic Programming......Page 62
2.9.7 Randomized Algorithms......Page 67
2.10 Tractable versus Intractable Problems......Page 68
2.11 Notes......Page 70
Richard Karp......Page 71
2.12 Problems......Page 73
3.1 What Is Life Made Of?......Page 76
3.2 What Is the GeneticMaterial?......Page 78
3.3 What Do Genes Do?......Page 79
3.5 What Is the Structure of DNA?......Page 80
3.6 What Carries Information between DNA and Proteins?......Page 82
3.7 How Are ProteinsMade?......Page 84
3.8.1 Copying DNA......Page 86
3.8.2 Cutting and Pasting DNA......Page 90
3.8.4 Probing DNA......Page 91
3.9 How Do Individuals of a Species Differ?......Page 92
3.10 How Do Different Species Differ?......Page 93
3.11 Why Bioinformatics?......Page 94
Russell F.Doolittle......Page 98
4.1 Restriction Mapping......Page 102
4.2 Impractical Restriction Mapping Algorithms......Page 106
4.3 A Practical Restriction Mapping Algorithm......Page 108
4.4 RegulatoryMotifs in DNA Sequences......Page 110
4.5 Profiles......Page 112
4.6 The Motif Finding Problem......Page 116
4.7 Search Trees......Page 119
4.8 FindingMotifs......Page 127
4.9 Finding a Median String......Page 130
4.10 Notes......Page 133
Gary Stormo,......Page 135
4.11 Problems......Page 138
5.1 Genome Rearrangements......Page 144
5.2 Sorting by Reversals......Page 146
5.3 Approximation Algorithms......Page 150
5.4 Breakpoints: A Different Face of Greed......Page 151
5.5 A Greedy Approach toMotif Finding......Page 155
5.6 Notes......Page 156
David Sankoff......Page 158
5.7 Problems......Page 162
6.1 The Power of DNA Sequence Comparison......Page 166
6.2 The Change Problem Revisited......Page 167
6.3 The Manhattan Tourist Problem......Page 172
6.4 Edit Distance and Alignments......Page 186
6.5 Longest Common Subsequences......Page 191
6.6 Global Sequence Alignment......Page 196
6.7 Scoring Alignments......Page 197
6.8 Local Sequence Alignment......Page 199
6.9 Alignment with Gap Penalties......Page 203
6.10 Multiple Alignment......Page 204
6.11 Gene Prediction......Page 212
6.12 Statistical Approaches to Gene Prediction......Page 216
6.13 Similarity-Based Approaches to Gene Prediction......Page 219
6.14 Spliced Alignment......Page 222
6.15 Notes......Page 226
Michael Waterman......Page 228
6.16 Problems......Page 230
7.1 Divide-and-Conquer Approach to Sorting......Page 246
7.2 Space-Efficient Sequence Alignment......Page 249
7.3 Block Alignment and the Four-Russians Speedup......Page 253
7.4 Constructing Alignments in Subquadratic Time......Page 257
7.5 Notes......Page 259
Webb Miller......Page 260
7.6 Problems......Page 263
8.1 Graphs......Page 266
8.2 Graphs and Genetics......Page 279
8.3 DNA Sequencing......Page 281
8.4 Shortest Superstring Problem......Page 283
8.5 DNA Arrays as an Alternative Sequencing Technique......Page 284
8.6 Sequencing by Hybridization......Page 287
8.7 SBH as a Hamiltonian Path Problem......Page 290
8.8 SBH as an Eulerian Path Problem......Page 291
8.9 Fragment Assembly in DNA Sequencing......Page 294
8.10 Protein Sequencing and Identification......Page 299
8.11 The Peptide Sequencing Problem......Page 303
8.12 Spectrum Graphs......Page 306
8.13 Protein Identification via Database Search......Page 309
8.14 Spectral Convolution......Page 311
8.15 Spectral Alignment......Page 312
8.16 Notes......Page 318
8.17 Problems......Page 321
9.1 Repeat Finding......Page 330
9.2 Hash Tables......Page 332
9.3 Exact Pattern Matching......Page 335
9.4 Keyword Trees......Page 337
9.5 Suffix Trees......Page 339
9.6 Heuristic Similarity Search Algorithms......Page 343
9.7 Approximate Pattern Matching......Page 345
9.8 BLAST: Comparing a Sequence against a Database......Page 349
9.9 Notes......Page 350
Gene Myers......Page 352
9.10 Problems......Page 356
10.1 Gene Expression Analysis......Page 358
10.2 Hierarchical Clustering......Page 362
10.3 k-Means Clustering......Page 365
10.4 Clustering and Corrupted Cliques......Page 367
10.5 Evolutionary Trees......Page 373
10.6 Distance-Based Tree Reconstruction......Page 377
10.7 Reconstructing Trees from Additive Matrices......Page 380
10.8 Evolutionary Trees and Hierarchical Clustering......Page 385
10.9 Character-Based Tree Reconstruction......Page 387
10.10 Small Parsimony Problem......Page 389
10.11 Large Parsimony Problem......Page 393
10.12 Notes......Page 398
Ron Shamir......Page 399
10.13 Problems......Page 403
11.1 CG-Islands and the “Fair Bet Casino”......Page 406
11.2 The Fair Bet Casino and HiddenMarkovModels......Page 409
11.3 Decoding Algorithm......Page 412
11.4 HMMParameter Estimation......Page 416
11.5 Profile HMMAlignment......Page 417
11.6 Notes......Page 419
David Haussler......Page 422
11.7 Problems......Page 426
12.1 The Sorting Problem Revisited......Page 428
12.2 Gibbs Sampling......Page 431
12.3 Random Projections......Page 433
12.4 Notes......Page 435
12.5 Problems......Page 436
Using Bioinformatics Tools......Page 438
Bibliography......Page 440
Index......Page 448