Bioinformatics: A Swiss Perspective

Biological research and recent technological advances have resulted in an enormous increase in research data that require large storage capacities, powerful computing resources, and accurate data analysis algorithms. Bioinformatics is the field that provides these resources to life science researchers.

The Swiss Institute of Bioinformatics (SIB), which celebrated its 10th anniversary in 2008, is an institution of national importance, recognized worldwide for its state-of-the-art work. Organized as a federation of bioinformatics research groups from Swiss universities and research institutes, the SIB provides services to the life science community that are highly appreciated worldwide, and coordinates research and education in bioinformatics nationwide. The SIB plays a central role in life science research both in Switzerland and abroad by developing extensive and high-quality bioinformatics resources that are essential for all life scientists. Knowledge developed by SIB members in areas such as genomics, proteomics, and systems biology is directly transformed by academia and industry into innovative solutions to improve global health. Such a concentration of talent in a single field is unique in Switzerland.

This book provides an insight into some of the key areas of activity in bioinformatics in Switzerland. With contributions from SIB members, it covers both research work and major infrastructure efforts in genome and gene expression analysis, investigations on proteins and proteomes, evolutionary bioinformatics, and modeling of biological systems.

Author(s): Ron D. Appel, Ernest Feytmans
Edition: 1
Publisher: World Scientific Publishing Company
Year: 2009

Language: English
Pages: 464

Contents......Page 10
Foreword......Page 6
Preface......Page 8
List of Contributors......Page 14
SECTION I GENES AND GENOMES......Page 22
1. Introduction......Page 24
1.1. Motif Discovery from a Biological Perspective......Page 26
1.2. DNA Motifs from a Physical Perspective......Page 27
2. Motif Discovery in a Nutshell......Page 29
3.1. Objective Functions......Page 31
3.2.1. Finding the best consensus sequence......Page 34
3.2.2. Optimizing a base probability matrix......Page 35
3.2.3. Optimizing the motif annotation......Page 36
3.2.5. Estimating the significance of a newly discovered motif......Page 38
4.1. Benchmarking Procedures for Motif Discovery......Page 39
4.2. Why is Protein Domain Discovery Easier?......Page 41
4.3. Reasons for the Limited Success of DNA Motif Discovery......Page 42
5.1. Modification of the Problem Statements......Page 43
5.2. Search Algorithms for Locally Overrepresented Sequence Motifs......Page 46
6. Conclusions and Perspectives......Page 48
References......Page 50
1. Introduction......Page 54
2. Gene Annotation......Page 55
3. Protein Families......Page 58
4. Orthologs and Paralogs......Page 61
5. Genome Architecture......Page 67
6. RNA Genes and Conserved Noncoding Sequences......Page 71
References......Page 74
1. Introduction......Page 80
1.1. The Modular Concept......Page 82
1.2. Regulatory Patterns are Context-Specific......Page 83
1.3. Coclassification of Genes and Conditions......Page 84
1.5. From Modules to Models......Page 85
1.6. Data Integration......Page 86
2.1. Signature Algorithm......Page 87
2.2. Comparative Analysis......Page 89
2.3. Differential Clustering Algorithm......Page 91
2.4. Ping-Pong Algorithm......Page 94
3.1. Module Annotation......Page 95
3.2. Module Visualization......Page 96
4. Outlook......Page 99
References......Page 100
1. Introduction......Page 106
2. Combining Information......Page 107
4.1. Difficulties with Public Data Sources......Page 108
4.3. SwissBrod Data Curation......Page 110
5.1. Pooling Raw Data......Page 111
5.3. Combining Parameter Estimates......Page 112
5.4. Combining Test Statistics......Page 113
5.6. Combining Statistic Ranks......Page 114
5.7. Combining Decisions......Page 115
6.2. Data Cleaning......Page 116
6.4. Outcome Modeling......Page 117
6.5. Z-Transform for Combining Test Statistics......Page 118
6.6. Multiple Testing......Page 119
7. Breast Cancer Examples......Page 120
7.1. Example I: Breast Cancer Survival......Page 122
7.2. Example II: Breast Cancer Gene Signatures......Page 124
7.2.2. Model for identifying coexpression modules......Page 126
7.2.3. Coexpression patterns......Page 127
8. Conclusion......Page 130
References......Page 131
1. Introduction......Page 136
2. Identification of Small Regulatory RNAs......Page 137
3.1. miRNAs......Page 138
3.2. Piwi-Interacting RNAs......Page 140
4.2. Biogenesis of piRNAs......Page 141
5. Function of Small Regulatory RNAs......Page 142
6.1.1. Stable hairpin precursors......Page 143
6.1.2. Structural constraints......Page 144
6.1.5. miRNA gene prediction methods......Page 145
7.2. The miRNA Seed Region......Page 151
7.3. Structural Determinants......Page 152
7.4. Other Determinants......Page 153
7.5. miRNA Target Prediction Methods......Page 154
8. Conclusions......Page 157
References......Page 158
SECTION II PROTEINS AND PROTEOMES......Page 168
1.1. What is UniProtKB?......Page 170
1.2. What is Dictyostelium discoideum?......Page 171
1.4. The Dictyostelium Annotation Project at UniProt......Page 172
2.1. Creating a Complete Proteome Set Across Swiss-Prot and TrEMBL......Page 174
2.2. UniProtKB/Swiss-Prot Annotation......Page 175
2.2.1. Sequence annotation......Page 176
2.2.2. Sequence feature annotation......Page 177
2.2.3. Nomenclature annotation......Page 182
2.2.4. Functional annotation......Page 184
2.3. Why are DictyBase and UniProtKB Complementary?......Page 186
3.2. Future of Dictyostelium Annotation in Swiss-Prot......Page 187
Acknowledgments......Page 188
References......Page 189
1. Introduction......Page 190
2. Proteome Imaging......Page 192
2.1. 2-DE Gel Imaging......Page 193
2.2. LC/MS Imaging for Label-Free Quantitation......Page 196
3.1. PMF......Page 199
3.2. PFF......Page 200
3.3. Identification Platforms — SwissPIT......Page 202
4.1. Standards for High-Throughput Data......Page 204
4.2. Integrative Proteomics Data......Page 205
4.2.1. Proteomics servers......Page 206
4.2.2. Proteomics repositories......Page 208
4.2.3. Proteomics integrated resources......Page 209
5. Conclusion......Page 211
Acknowledgments......Page 212
References......Page 213
1. Introduction......Page 218
2. Experimental Protein Interaction Data......Page 219
3. Networks That Include Indirect Associations......Page 224
4. Clustering, Modules, and Motifs......Page 226
5. Interpreting Network Topology......Page 230
6. Online Resources......Page 232
References......Page 234
1. Introduction......Page 240
2. Protein Structure Prediction with SWISS-MODEL — Methods and Tools......Page 241
2.2. SWISS-MODEL Template Library......Page 242
2.4. SWISS-MODEL Repository......Page 243
2.5. SWISS-MODEL and DeepView — Swiss-PdbViewer......Page 244
3. Large-Scale Protein Structure Prediction and Structural Genomics......Page 245
4.1.1. Model correctness......Page 246
4.1.2. Model accuracy......Page 247
4.2.1. Template availability and structural diversity......Page 248
4.2.3. Membrane proteins......Page 249
4.3. Model Quality Evaluation......Page 250
5.1. Functional Analysis of Proteins......Page 252
5.1.1. Studying the impact of mutations and SNPs on protein function......Page 253
5.2. Molecular Replacement......Page 254
5.4. Docking......Page 256
6. Protein Model Portal......Page 257
7. Future Outlook......Page 258
References......Page 259
1. Introduction......Page 268
2. Molecular Force Fields......Page 269
2.2. The CHARMM Force Field......Page 270
3.1. Integration of the Equation of Motion......Page 273
3.2. Thermodynamic Ensembles......Page 276
4. Free Energy Calculations......Page 279
4.1. Exact Statistical Mechanics Methods for Free Energy Differences......Page 281
4.1.1. Free energy perturbation......Page 282
4.1.2. Thermodynamic integration......Page 283
4.2. Relative Free Energy Differences from Thermodynamic Cycles......Page 284
4.3. Endpoint Methods......Page 285
5.1. Protein Design......Page 291
5.2. Drug Design......Page 293
References......Page 297
SECTION III PHYLOGENETICS AND EVOLUTIONARY BIOINFORMATICS......Page 304
1. Introduction......Page 306
2.1. Characters and Their States......Page 309
2.2. Homology — A Phylogenetic Hypothesis......Page 311
2.3. Ancestral or Derived — Qualifying the State of a Character......Page 312
2.4. Homoplasy — Pitfall in Phylogenetics......Page 313
3.1. Gene Duplication vs. Speciation, Paralogy vs. Orthology......Page 316
3.2. Sequence Alignment — A Homology Hypothesis......Page 319
3.3.1. Evolutionary distance and the course of time......Page 323
3.3.3. Micromutations and the molecular clock......Page 324
4.1. The Tree Graph Model — Transmission of Phylogenetic Information......Page 326
4.2. Numerical Taxonomic Phenetics (NTP)......Page 327
4.2.2. A common NTP artefact......Page 328
4.3.1. Symplesiomorphy, synapomorphy, and autapomorphy......Page 329
4.3.3. CMP common artefacts......Page 331
4.4. Probabilistic Methods......Page 333
4.5.1. The number of possible phylogenetic trees......Page 334
4.5.2. The branch-and-bound algorithm......Page 335
4.5.4. A rapid maximum likelihood method: RAxML......Page 337
4.5.5. Creating consensus trees......Page 338
4.5.6. Estimating tree robustness......Page 339
4.6. Recommended, Acceptable, and Unacceptable Groupings of Taxons......Page 341
4.6.1. Paraphylon — Acceptable with caveat......Page 342
4.6.2. Convergence and reversion polyphylons — Unacceptable......Page 343
5.1. Prediction of Gene Function......Page 344
5.2. Advanced Phylogenetic Analyses and New Directions......Page 346
References......Page 347
1. Introduction......Page 350
2. Types of Tree Building Methods......Page 351
3. Tree Building with an Objective Function......Page 359
3.1. Constructors......Page 360
3.2.1. n-optim......Page 361
3.2.3. Tree bisection reconnection (TBR)......Page 362
3.2.5. Tree fusing (TF)......Page 363
4.1. Least Squares Tree Function in Darwin......Page 364
4.2.2. Results......Page 367
4.3.1. Subtree index......Page 369
5. Outlook......Page 371
References......Page 372
1. Introduction......Page 376
2.1. Rapid Overview of Bioinformatics Involved......Page 378
2.2.1. Why don’t flies have retinoic acid receptors?......Page 379
2.2.2. Why do humans have three retinoic acid receptors?......Page 381
2.2.3. Biased gene loss after whole-genome duplication......Page 383
3.1. Defining Homology for Bioinformatics......Page 387
3.2. Modeling Homology Relationships......Page 389
3.3. Bgee, a Database for Gene Expression Evolution......Page 391
4. Conclusion......Page 393
References......Page 394
SECTION IV MODELING OF BIOLOGICAL SYSTEMS......Page 400
1. Introduction......Page 402
2. Properties of Biological Systems......Page 404
2.1. Dimensionality and Degrees of Freedom......Page 405
2.2. Regulation......Page 406
2.4. Nonlinearity......Page 407
2.5. Coupling Across Scales......Page 408
2.7. Nonequilibrium......Page 409
3.2. Discrete vs. Continuous Models......Page 410
3.3. Stochastic vs. Deterministic Models......Page 413
4.1. Methods for Discrete Stochastic Models......Page 414
4.3. Methods for Continuous Stochastic Models......Page 415
4.4. Methods for Continuous Deterministic Models......Page 416
4.5. Representing Complex Geometries in the Computer......Page 417
5. Introduction to Continuum Particle Methods......Page 418
5.1. Function Approximation by Particles......Page 421
5.2.1. Pure particle methods......Page 423
5.2.2. Hybrid particle-mesh (PM) methods......Page 424
6. Efficient Algorithms for Particle Methods......Page 427
6.1. Fast Algorithms for Short-Range Interactions......Page 428
6.2. Fast Algorithms for Long-Range Interactions......Page 430
7. Particle Methods for the Simulation of Diffusion Processes......Page 431
7.1. The Method of Random Walk (RW)......Page 433
7.2.1. PSE for isotropic diffusion......Page 434
7.2.2. Boundary conditions in PSE......Page 437
7.3. Comparison of PSE and RW......Page 438
8. Reaction-Diffusion Processes......Page 440
8.1. Previous Approaches and Applications in Biology......Page 441
8.2.1. An example with moving reaction fronts......Page 442
9. Conclusions......Page 443
References......Page 445
Index......Page 454