The computational education of biologists is changing to prepare students for facing the complex datasets of today's life science research. In this concise textbook, the authors' fresh pedagogical approaches lead biology students from first principles towards computational thinking. A team of renowned bioinformaticians take innovative routes to introduce computational ideas in the context of real biological problems. Intuitive explanations promote deep understanding, using little mathematical formalism. Self-contained chapters show how computational procedures are developed and applied to central topics in bioinformatics and genomics, such as the genetic basis of disease, genome evolution or the tree of life concept. Using bioinformatic resources requires a basic understanding of what bioinformatics is and what it can do. Rather than just presenting tools, the authors - each a leading scientist - engage the students' problem-solving skills, preparing them to meet the computational challenges of their life science careers.
Author(s): Pavel Pevzner, Ron Shamir
Publisher: Cambridge University Press
Year: 2011
Language: English
Pages: 392
Tags: Биологические дисциплины;Матметоды и моделирование в биологии;Биоинформатика;
Title......Page 5
Copyright......Page 6
Dedication......Page 7
Contents......Page 9
Extended Contents......Page 11
Who is the audience for the book?......Page 17
Why this book?......Page 18
What is in the book?......Page 20
How will this book develop?......Page 21
Contributors......Page 22
Acknowledgments......Page 23
Contributors......Page 26
Algorithm......Page 28
Polynomial and exponential complexity......Page 29
Tackling hard problems......Page 30
PART I Genomes......Page 33
1 Background......Page 35
2 Genetic variation: mutation, recombination, and coalescence......Page 38
3 Statistical tests......Page 41
4.1 Continuous phenotypes......Page 44
4.2 Genotypes and extensions......Page 46
4.3 Linkage versus association......Page 47
5.1 Sampling issues: power, etc.......Page 48
5.2 Population substructure......Page 49
5.3 Epistasis......Page 50
5.4 Rare variants......Page 51
Questions......Page 52
References......Page 53
1 Introduction......Page 55
2 The tag SNP selection problem......Page 57
3 A reduction to the set-covering problem......Page 58
4 A reduction to the integer-programming problem......Page 62
Questions......Page 65
References......Page 66
1.1 DNA sequencing and the overlap puzzle......Page 68
1.2 Complications of fragment assembly......Page 70
2.1 Historical motivation......Page 72
2.3 Eulerian and Hamiltonian cycles......Page 75
2.4 Euler's Theorem......Page 76
2.5 Euler's Theorem for directed graphs......Page 77
2.6 Tractable vs. intractable problems......Page 80
3.1 Genome assembly as a Hamiltonian cycle problem......Page 81
3.2 Fragment assembly as an Eulerian cycle problem......Page 82
3.3 De Bruijn graphs......Page 84
3.4 Read multiplicities and further complications......Page 86
4.1 The tale of three biologists: DNA chips......Page 87
5 Proof of Euler's Theorem......Page 90
Notes......Page 95
References......Page 96
1 Introduction......Page 98
2 Graphs......Page 101
3 Dynamic programming......Page 102
4 Alignment......Page 109
5 Gene recognition......Page 113
6 Dynamic programming in a general situation. Physics of polymers......Page 115
References......Page 123
1 Welcome to the Maury Povich Show!......Page 125
1.1 What makes you you......Page 126
1.2 SNPs, forensics, Jacques, and you......Page 128
2.1 The foundation: thinking about probability ``conditionally''......Page 129
2.3 Estimating disease risk......Page 132
2.4 A recipe for inference......Page 134
3 Paternity inference......Page 135
Questions......Page 140
PART 2 Gene Transcription and Regulation......Page 141
1 Introduction......Page 143
2 Cumulative skew diagrams......Page 144
3 Different properties of two DNA strands......Page 148
4 Replication, transcription, and genome rearrangements......Page 152
Discussion......Page 156
References......Page 157
1 Introduction......Page 158
2 Experimental determination of binding sites......Page 161
3 Consensus......Page 162
4 Position Weight Matrices......Page 164
5 Higher-order PWM......Page 166
6 Maximum dependence decomposition......Page 167
7 Modeling and detecting arbitrary dependencies......Page 170
8 Searching for novel binding sites......Page 171
8.2 A graph-based approach to binding site prediction......Page 172
9 Additional hallmarks of functional TF binding sites......Page 173
9.2 Modular interactions between TFs......Page 174
Discussion......Page 175
References......Page 176
1 Introduction......Page 180
2 Host switch of influenza: molecular mechanisms......Page 183
2.1 Diversity of glycan structures......Page 184
2.2 Molecular basis of the host specificity of influenza viruses......Page 187
2.3 Profiling of hemagglutinin--glycan interaction by using glycan arrays......Page 188
3 The glycan motif finding problem......Page 189
Questions......Page 193
Further reading......Page 195
References......Page 196
PART 3 Evolution......Page 197
1 Review of basic biology......Page 199
2 Distance metrics and the genome rearrangement problem......Page 203
3 Unsigned reversals......Page 207
4 Signed reversals......Page 210
5 DCJ operations and algorithms for multiple chromosomes......Page 212
Discussion......Page 218
References......Page 219
1 The crisis of the Tree of Life in the age of genomics......Page 221
2 The bioinformatic pipeline for analysis of the Forest of Life......Page 225
3.1 The NUTs contain a consistent phylogenetic signal, with independent HGT events......Page 227
3.2 The NUTs versus the FOL......Page 230
Discussion: the tree of life concept is changing, but is not dead......Page 231
References......Page 232
11 Reconstructing the history of large-scale genomic changes......Page 233
1.2 Comparative genomics......Page 234
1.3 Genome reconstruction provides an additional dimension for comparative genomics......Page 237
1.4 Base-level ancestral reconstruction......Page 238
2.1 Genome rearrangements......Page 239
2.2 Synteny blocks......Page 241
3.1 Ancestral karyotype reconstruction......Page 243
3.2 Rearrangement-based ancestral reconstruction......Page 244
3.3 Adjacency-based ancestral reconstruction......Page 245
3.4 Challenges and future directions......Page 249
4 Chromosomal aberrations in human disease genomes......Page 251
Questions......Page 253
References......Page 254
PART 4 Phylogeny......Page 257
12 Figs, wasps, gophers, and lice: a computational exploration of coevolution......Page 259
1 Introduction......Page 260
2 The cophylogeny problem......Page 261
3 Finding minimum cost reconstructions......Page 265
4 Genetic algorithms......Page 267
5 How Jane works......Page 269
6 See Jane run......Page 273
Questions......Page 277
References......Page 278
13 Big cat phylogenies, consensus trees, and computational thinking......Page 280
1 Introduction......Page 281
2 Evolutionary trees and the big cats......Page 282
2.1 Evolutionary hypotheses for the pantherine lineage......Page 283
2.2.1 Tree T1: Johnson, Dratch, Martenson, and O’Brien......Page 284
2.2.4 Tree T4: Davis, Li, and Murphy......Page 285
3 Consensus trees and bipartitions......Page 286
3.1 Phylogenetic trees and their bipartitions......Page 287
4.1 Step 1: collecting bipartitions from a set of trees......Page 288
4.2.1 Our first selection algorithm: sorting bitstrings......Page 290
4.2.2 Our second selection algorithm: using hash tables......Page 291
4.3 Step 3: constructing consensus trees from consensus bipartitions......Page 293
Questions......Page 296
References......Page 297
14 Phylogenetic estimation: optimization problems, heuristics, and performance analysis......Page 299
1 Introduction......Page 300
2 Computational problems......Page 301
2.1 The 2-colorability problem......Page 303
2.2 Maximum independent set......Page 306
3 NP-hardness, and lessons learned......Page 307
4.1 Maximum parsimony......Page 309
References......Page 318
PART 5 Regulatory Networks......Page 321
15 Biological networks uncover evolution, disease, and gene functions......Page 323
1 Interaction network data sets......Page 325
2 Network comparisons......Page 327
3 Network models......Page 332
4 Using network topology to discover biological function......Page 335
5 Network alignment......Page 338
References......Page 344
1 Introduction......Page 347
1.1 The biology of transcriptional regulation......Page 349
2.1 Abstracting the problem statement......Page 352
2.2 An intuition for network inference......Page 354
2.3 Formalizing the intuition for an inference objective function......Page 355
2.4 Generalizing to arbitrary numbers of genes......Page 364
3 Finding the best model......Page 365
4 Extending the model with prior knowledge......Page 367
5 Regulatory network inference in practice......Page 369
5.1 Real-valued data......Page 370
5.2 Combining data sources......Page 371
Discussion......Page 373
Questions......Page 374
References......Page 375
Glossary......Page 376
Index......Page 382