The Pacific Symposium on Biocomputing (PSB) 2008 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2008 will be held on January 4-8, 2008 at the Fairmont Orchid, Big Island of Hawaii. Tutorials will be offered prior to the start of the conference.

PSB 2008 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's "hot topics." In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.
Author(s): Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Tiffany Murray, Teri E. Klein
Publisher: World Scientific Publishing Company
Year: 2008
Language: English
Pages: 684
CONTENTS......Page 13
Preface......Page 7
Session Introduction Michael Brudno, Bernard Moret, Randy Linder, and Tandy Warnow......Page 21
1. Introduction......Page 23
2.1. Previous work......Page 24
2.2. Rectangle Scoring Schemes......Page 26
3.1. Basic FRESCO Algorithm......Page 27
3.1.1. Running Time and Resources......Page 28
3.2. FRESCO Speed Ups......Page 29
4.2. Example function & performance......Page 31
5. Discussion......Page 33
References......Page 34
1. Introduction......Page 35
2.1. Construction of the co-optimality MSA set......Page 37
2.2. Local reliability measures for MSA......Page 38
3. Results......Page 40
4. Discussion......Page 42
References......Page 43
1. Introduction......Page 45
2. Basics......Page 47
3.1. Experimental design.......Page 49
3.2. Results.......Page 52
3.3. Conclusions.......Page 54
References......Page 55
1. Introduction......Page 57
2. Methods......Page 59
2.2. Forming Synteny Blocks......Page 60
2.4. Using Multiple Genomes......Page 61
3.1.2. Reversal Distance......Page 62
3.3. Correlation of Reversal Distance and Breakpoint Reuse Rate......Page 64
3.4. Relative Divergence......Page 65
3.5. Using Multiple Genomes......Page 66
4. Conclusions......Page 67
References......Page 68
1. Introduction......Page 69
2. Session papers......Page 70
References......Page 71
1. Introduction......Page 72
2.1. A Bayesian model for gene and microRNA expression......Page 73
2.2. Incorporating sequence features......Page 75
2.3. Learning the model of gene and microRNA expression......Page 76
2.4. Setting the sequence-based priors using the posteriors from the gene and microRNA expression model......Page 77
3.1. Evaluating sequence features using cross-validation......Page 78
3.2. Evaluating sequence features using functional enrichment analysis......Page 80
4. Discussion and conclusion......Page 81
References......Page 82
Analysis of MicroRNA-Target Interactions by a Target Structure Based Hybridization Model Dang Long, Chi Yu Chan, and Ye Ding......Page 84
1. Introduction......Page 85
2.1 mRNA Secondary Structure Prediction......Page 86
2.2 Two-step Hybridization Model......Page 87
3.1 Analysis of Interaction between Mammalian miRNAs and Viral Genomes......Page 88
3.2 Analysis of Other MicroRNA-Target Interactions......Page 90
4. Conclusion......Page 92
References......Page 93
1. Introduction......Page 95
2. The 454 Pyrosequencing Method......Page 97
3. Flowgram Matching......Page 99
4. Enhanced Suffix Arrays......Page 100
5. Lookahead Scoring......Page 101
6. Statistical Significance of Scores......Page 102
8. Experiments......Page 103
References......Page 106
Session Introduction Francisco M. De La Vega, Gabor T. Marth, and Granger Sutton......Page 107
1. Introduction......Page 110
2. Preliminary Concepts......Page 111
3. The Basic Trellis+ Approach......Page 112
4.1. Larger Segment Size......Page 113
4.2.1. Edge Index Shifting......Page 114
4.2.4. Leaf Edge Label Encoding......Page 116
5. Experiments......Page 117
5.1. Effect of Larger Segment Size......Page 118
5.2. Effect of String Buffer......Page 119
5.3. Query Times......Page 120
References......Page 121
1. Introduction......Page 122
2.1. The seed-and-extend paradigm......Page 123
2.2. Positional Hashing specifically addresses the anchoring problem......Page 124
3.1. Multidiagonal collation......Page 126
4.1. UD-CSD benchmark......Page 127
4.2. Simulated Anchoring of WGS reads......Page 130
4.3. Anchoring of mate pairs......Page 131
5. Conclusions......Page 132
References......Page 133
1. Introduction......Page 134
2. A statistical model of short sequence readouts from multiple related strains......Page 136
3. Strain reconstruction as probabilistic inference and learning......Page 138
4. Computational cost and local minima issues......Page 141
5. Experimental validation......Page 142
References......Page 145
1. Introduction......Page 146
2.1. Sequence Processing......Page 147
2.1.3. Mapping to the genome......Page 148
2.3. Genome View......Page 149
2.4. Nucleotide bias......Page 151
2.6. Clusters on the genome......Page 152
References......Page 153
Introduction......Page 157
References......Page 160
1. Introduction......Page 161
2. Integrative data structure design in Bioconductor......Page 163
3. Sample annotation; ontoElicitor......Page 165
5. The integrated interface; use cases......Page 168
6. Deployment; conclusions......Page 170
References......Page 172
1. Introduction......Page 173
2. Models......Page 175
3. Methods and Features......Page 178
4. Results and Discussion......Page 181
References......Page 184
1. Introduction......Page 186
2. Data sources......Page 188
3.1. Kernel methods and LS-SVMs......Page 189
3.2. Data fusion......Page 190
3.3. Model building......Page 191
4. Results......Page 193
5. Discussion......Page 195
References......Page 197
1.1. Challenges in Data Integration......Page 198
1.3. Objective......Page 199
2.1. Molecular Characteristics of Developing Mouse Prostate......Page 200
2.4. Gene-Ontology and KEGG pathways data......Page 201
2.5. Overall data summary......Page 202
3.2. Characterizing gene-expression behavior within an experiment:......Page 203
3.5. Overall analysis setup......Page 204
4.1. Analysis of time course data......Page 205
4.2. Integrated analyses of multiple experimental data sources......Page 206
5. Discussion......Page 208
References......Page 209
1. Introduction......Page 210
2.1. Univariate Screening......Page 211
2.2. Random Handfuls......Page 212
3. Results......Page 216
4. Discussion......Page 218
References......Page 219
1. Introduction......Page 221
2. Related work......Page 223
3.1.2. Correlation heatmaps......Page 224
3.1.3. Statistical significance......Page 225
3.2. Uncorrelated data: H3K4me2 vs. H3K27me3......Page 227
4. Discussion and conclusions......Page 229
5. Methods......Page 230
References......Page 231
1. Introduction......Page 236
2.1. Sample collection......Page 238
2.3. Global peak selection......Page 239
2.4. Identification of subgroup-specific peaks......Page 241
3. Results......Page 242
References......Page 246
Session Introduction Yves A. Lussier, Younghee Lee, Predrag Radivojac, Yanay Ofran, Marco Punta, Atul Butte, and Maricel Kann......Page 248
Acknowledgements......Page 250
System-Wide Peripheral Biomarker Discovery Using Information Theory Gil Alterovitz, Michael Xiang, Jonathan Liu, Amelia Chang, and Marco F. Ramoni......Page 251
1. Introduction......Page 252
2. Methods......Page 253
3.1 Significant Tissue-Biofluid Channels......Page 255
3.2 Identification of Candidate Biomarkers......Page 257
3.3 Establishing Biomarker Quality......Page 259
4. Discussion and Conclusion......Page 260
References......Page 261
1. Introduction......Page 263
2.1. Data Collection and Processing......Page 265
2.2. Finding Biomarkers for Maturation Using Analysis of Variance......Page 266
2.3. Using Diseases to Model Maturation......Page 267
3.1. Clinical Biomarkers for Maturation......Page 268
3.2. Finding Maturation and Aging Related Genes......Page 269
4. Discussion......Page 272
Acknowledgements......Page 273
References......Page 274
1. Introduction......Page 275
2.1. Microarray data sources......Page 277
2.3. Pathway/gene set activity level and coordination network......Page 278
2.5. Disease-relevant pathways and linking pathways......Page 279
3.1. Identified obesity-relevant biological pathway/gene sets......Page 280
3.3. Identification of association between obesity and NIDDM by networking pathways......Page 281
4. Discussion and Conclusion......Page 284
References......Page 285
1 Introduction and motivation......Page 287
2.1 The pancreatic ductal adenocarcinoma dataset......Page 288
3 Simultaneous Nonnegative Matrix Factorization with offset......Page 289
4 Nonnegative Matrix Factorizations with offset......Page 291
5 Simultaneous factorization of the PDAC and colon adenocarcinoma datasets......Page 293
6 Conclusions and related work......Page 296
References......Page 298
1. Introduction......Page 299
2. Bayesian networks......Page 300
2.1. Model building......Page 301
2.2. Structure prior......Page 302
3.1. Gene prior......Page 303
3.2. Class variable prior......Page 304
4. Data......Page 305
6.1. Veer data......Page 306
7. Conclusions......Page 308
References......Page 310
1. Introduction......Page 311
2. Problem definition......Page 312
3. Proposed methods......Page 313
3.1. State space and basic search strategy of OPMET......Page 314
3.2. Improving the OPMET algorithm by enzyme prioritization......Page 315
3.2.2. Ordering of enzymes in OPMET......Page 316
4. Experimental results......Page 317
5. Related Work......Page 320
References......Page 321
1. Introduction......Page 323
2. Problem Formulation......Page 325
3. Algorithm......Page 327
4. Results......Page 330
4.1. Functional Coherence: Evaluating Orthology Predictions......Page 331
5. Conclusion......Page 333
References......Page 334
Predicting DNA Methylation Susceptibility Using CpG Flanking Sequences S. Kim, M. Li, H. Paik, K. Nephew, H. Shi, R. Kramer, D. Xu, and T-H. Huang......Page 335
1. Introduction......Page 336
2. Related work and Motivation......Page 337
4. Data......Page 338
5.1. Estimating methylation level of a CpG site......Page 340
5.3. Character composition analysis......Page 341
6.2. Analysis of character composition......Page 342
6.4. Is this cancer specific?......Page 343
7. Discussion......Page 344
References......Page 345
1. Session Background and Motivation......Page 347
2. Session Summary......Page 348
References......Page 351
1. Introduction:......Page 352
2.1 Structures:......Page 353
2.3 FEATURE Scanning:......Page 354
3.2 FEATURE Scanning:......Page 355
4. Discussion:......Page 356
6. References:......Page 362
1. Introduction......Page 364
2.2 Prediction of the 3D structure:......Page 365
2.5 HierDock Method: Scan the entire receptor for binding sites......Page 366
3.1 Predicted human DP structure and binding modes with PGD2 and antagonists......Page 367
1. Introduction......Page 374
2. Formulation of Model......Page 376
2.1. Instantaneous Coupling of Two Ca2+-Regulated Ca2+ Channels......Page 377
3. Stationary Distribution Calculations......Page 378
4. Results......Page 379
4.2. Problem Size and Method Performance......Page 380
4.3. Comparison of Iterative Methods and Monte Carlo Simulation......Page 382
5. Conclusions......Page 383
Acknowledgments......Page 384
References......Page 385
Spatially-Compressed Cardiac Myofilament Models Generate Hysteresis that Is Not Found in Real Muscle John Jeremy Rice, Yuhai Tu, Corrado Poggesi, and Pieter P. De Tombe......Page 386
2. Method......Page 387
3.1. Pseudo-steady-state solution......Page 389
3.2. True steady-state solution......Page 390
3.3. Comparison to other models......Page 391
4.1. Implications of modeling results......Page 392
4.2. Experimental evidence of hysteresis......Page 393
References......Page 396
1. Introduction......Page 398
2.1. Model design......Page 400
2.2. Simulation methods......Page 402
3. Results......Page 403
4. Discussion......Page 406
Acknowledgments......Page 408
References......Page 409
1. Introduction......Page 410
2.1. Preparation and Imaging of Cardiomyocytes......Page 412
2.2. Image Processing......Page 413
2.3. Surface Mesh Generation and Visualization......Page 414
3. Results......Page 415
4. Discussion and Conclusions......Page 419
References......Page 420
1. Introduction......Page 422
2.1. Circadian rhythms......Page 424
3. Oscillators and PPV phase macromodels......Page 425
3.3. Parameterized PPV macromodels......Page 426
4.1. Time-course simulations using full ODE models......Page 427
4.2. Circadian PPV macromodels......Page 428
4.3.1. Mammalian clock model......Page 429
4.5. Parameter variation simulations......Page 430
4.6. Synchronization of coupled oscillators......Page 431
References......Page 432
1. Semantics for biosimulation modeling......Page 434
1.1 Motivating use-case: Arteriolar calcium uptake & heart rate......Page 435
2. Semantic annotation via ontologies......Page 437
2.1 Reference ontologies: FMA and OPB......Page 438
2.2 The application model ontology......Page 439
3. Comparing and merging models......Page 441
4. Status and Results......Page 442
5. Discussion and future work......Page 444
References......Page 445
1. Introduction......Page 446
2.1. Protein Family Selection......Page 447
2.2. GNM......Page 448
2.3. Quantitative Mode Comparisons......Page 450
3.1. Classification of Protein Families by Dynamics......Page 451
3.2. Identification of Dynamically Similar Proteins......Page 454
3.3. Intrafamily Distinctions......Page 455
4. Conclusions......Page 456
References......Page 457
Session Introduction Martha L. Bulyk, Alexander J. Hartemink, Ernest Fraenkel, and Yael Mandel-Gutfreund......Page 458
Acknowledgments......Page 460
1. Introduction......Page 461
2.1. Data Sets Used in This Study......Page 462
2.2. Statistical Approaches......Page 463
3.1 Functional Enrichment by TF Structural Class......Page 465
3.3 Regulatory Bottlenecks......Page 468
3.5 General vs. Specific Regulation......Page 469
4. Conclusions and Future Directions......Page 470
References......Page 472
1. Introduction......Page 473
2.1. TF binding data......Page 474
2.2. DNA duplex stability data......Page 475
2.3. Average destabilization energy at TF binding sites vs. random sites......Page 476
2.4. The PRIORITY framework......Page 477
2.5. Building an energy-based positional prior......Page 478
2.6. Building a discriminative energy-based positional prior......Page 479
3. Results......Page 481
3.1. Energy-based priors perform better than uniform prior......Page 482
4. Discussion......Page 483
References......Page 484
1. Introduction......Page 485
2.2. Estimation via EM......Page 487
2.3. Other Models......Page 488
3.1. E. coli data......Page 489
3.2. Analysis......Page 491
3.3. Simulation......Page 493
4. Discussion......Page 494
References......Page 495
1. Introduction......Page 497
2.2. Features......Page 498
2.4. Corrections for Small Sample Size......Page 501
3.1. Single Features......Page 502
3.2. Joint Features......Page 503
4. Discussion......Page 505
References......Page 507
1. Abstract......Page 509
2. Introduction......Page 510
3.1. Definition and training of the affinity-threshold model......Page 511
3.2. The affinity-threshold model well describes binding site substitution rates......Page 512
3.3. The affinity-threshold model predicts extant score distributions for most factors......Page 513
3.4. Stringent- and lenient-threshold binding sites have distinct patterns of local evolution......Page 514
4. Conclusion......Page 517
5.2. Simulation of the affinity-threshold model......Page 518
References......Page 519
1. Introduction......Page 521
2.1 Datasets......Page 523
2.2 Algorithms for predicting interfacial residues......Page 524
2.4 Experimental identification of RNA and DNA binding residues......Page 525
3.2 Sequence-based prediction of RNA and DNA binding sites in human and Tetrahymena TERT......Page 526
3.3 Structural modeling of N-terminal domain of TERT from human and yeast......Page 528
3.4 Analysis of RNA and DNA binding surfaces in human and Tetrahymena TEN domains......Page 529
5. Acknowledgements......Page 531
References......Page 532
Session Introduction Srinka Ghosh and Antonio Piccolboni......Page 533
1. Background......Page 535
2. Methods......Page 536
3. Simulation studies......Page 538
3.1. Results of simulations I and II......Page 539
4. Case study: ZNF217 ChIP-chip data......Page 540
5. Discussion......Page 541
References......Page 542
1. Introduction......Page 547
2.2. Sequence Quantile Normalization (SQN)......Page 549
2.3. Transcript normalization techniques......Page 550
3. Transcript Identification......Page 552
4.1. Probe Normalization......Page 553
4.2. Exon Probe Identification......Page 555
4.3. Identification of Transcripts......Page 556
References......Page 557
1.1. Large-Scale Data Storage in Bioinformatics......Page 559
1.2. A Desktop Analysis Client and a Networked Database Server......Page 560
2.2. Coordinate Independent ChIP-chip Representation......Page 561
2.3. Discovering Binding Events from ChIP-chip Data......Page 562
2.4. Prior Work and Performance......Page 563
3.1. Ease of Integration and Extensibility......Page 565
4. Representing and Storing ChIP-chip Binding Hypotheses......Page 566
5. Conclusion......Page 568
References......Page 569
1. Introduction......Page 571
2.2. Usability......Page 572
2.5. Other topics......Page 573
References......Page 574
1. Introduction......Page 576
3. Assisted Curation......Page 577
4. TXM Pipeline......Page 579
5.1. Manual versus Assisted Curation......Page 581
5.2. NLP Consistency......Page 583
5.3. Optimizing for Precision or Recall......Page 584
6. Discussion and Conclusions......Page 585
Acknowledgements......Page 586
References......Page 587
1. Introduction......Page 588
2. The User-Centered Design Process......Page 589
3. Research on Term Suggestions Usability......Page 590
5. First Questionnaire: Biological Information Preferences......Page 591
6.1. The Evaluated Designs......Page 594
6.2. Results......Page 596
7. Conclusions and Future Work......Page 598
References......Page 599
1.1. The Role of Text Mining in Translational Bioinformatics......Page 600
1.2. Objective and Approach......Page 601
2.1. Identifying Disease-Related Experiments......Page 604
2.3. Handling Negation......Page 605
2.4. Handling Lexical Variations......Page 606
2.6. Evaluating Clinical Impact from Mortality Data......Page 607
4. Discussion......Page 608
Conclusion......Page 609
References......Page 610
1. Introduction......Page 612
2.1 Coding Drug Company Research Requests by Subject and Resource......Page 613
2.2 Query Analysis......Page 616
2.3 Where Does Text Mining Fit In?......Page 618
3.1. Impact of Assistance on Research Requests......Page 619
3.2. Research Request Subjects and Resources: Why Are Questions Asked?......Page 620
3.4. Requests in the Future......Page 621
References......Page 622
1. Introduction......Page 624
2. Data and Methods......Page 625
3.1. The Z-Test Method......Page 628
3.2. Feature Selection Comparison......Page 629
4. Experimental Setting......Page 630
5. Results and Discussion......Page 632
6. Conclusion and Future Directions......Page 634
References......Page 635
1. Introduction......Page 636
2.1. Goal Oriented Evaluation, Module Selection and Inter-operability......Page 638
2.2.2. Components and Component Descriptors......Page 639
3.1. Shared Type System......Page 640
3.2. General Combinatorial Comparison Generator......Page 641
4.1. Combinatorial Comparison......Page 644
5. Conclusion and Future Work......Page 645
References......Page 646
1. Introduction......Page 648
2. Related Work......Page 649
3. Matching Techniques......Page 650
3.2. String Similarity Measures......Page 651
4.2. Experimental Setup......Page 652
4.3. Results and Discussion......Page 653
5.1. Experimental Setup......Page 654
5.2. Results and Discussion......Page 655
6. Conclusions and Future Work......Page 657
References......Page 658
1. Introduction......Page 660
2.1. Evaluation of manual annotations......Page 662
2.1.3. Accuracy of manually deposited mutation annotations......Page 663
2.1.5. Results of manual annotation evaluation......Page 664
2.2.2. Alignment-based mutation annotation......Page 665
2.3. MutationFinder applied to abstracts versus full-text......Page 666
2.3.3. Abstract versus full text results......Page 667
3.2. MutationFinder: intrinsic versus extrinsic evaluations......Page 668
3.3. Alignment-based mutation annotation: extrinsic evaluation......Page 669
4. Discussion......Page 670
References......Page 671
1. Introduction......Page 672
2. Background......Page 673
3. Architecture......Page 675
4. Analysis......Page 678
5. Comparison......Page 679
6. Conclusion & Future Work......Page 681
References......Page 682