New sequencing technologies have broken many experimental barriers to genome scale sequencing, leading to the extraction of huge quantities of sequence data. This expansion of biological databases established the need for new ways to harness and apply the astounding amount of available genomic information and convert it into substantive biological understanding. A complilation of recent approaches from prominent researchers, Bioinformatics: High Performance Parallel Computer Architectures discusses how to take advantage of bioinformatics applications and algorithms on a variety of modern parallel architectures. Two factors continue to drive the increasing use of modern parallel computer architectures to address problems in computational biology and bioinformatics: high-throughput techniques for DNA sequencing and gene expression analysis—which have led to an exponential growth in the amount of digital biological data—and the multi- and many-core revolution within computer architecture. Presenting key information about how to make optimal use of parallel architectures, this book: Describes algorithms and tools including pairwise sequence alignment, multiple sequence alignment, BLAST, motif finding, pattern matching, sequence assembly, hidden Markov models, proteomics, and evolutionary tree reconstruction Addresses GPGPU technology and the associated massively threaded CUDA programming model Reviews FPGA architecture and programming Presents several parallel algorithms for computing alignments on the Cell/BE architecture, including linear-space pairwise alignment, syntenic alignment, and spliced alignment Assesses underlying concepts and advances in orchestrating the phylogenetic likelihood function on parallel computer architectures (ranging from FPGAs upto the IBM BlueGene/L supercomputer) Covers several effective techniques to fully exploit the computing capability of many-core CUDA-enabled GPUs to accelerate protein sequence database searching, multiple sequence alignment, and motif finding Explains a parallel CUDA-based method for correcting sequencing base-pair errors in HTSR data Because the amount of publicly available sequence data is growing faster than single processor core performance speed, modern bioinformatics tools need to take advantage of parallel computer architectures. Now that the era of the many-core processor has begun, it is expected that future mainstream processors will be parallel systems. Beneficial to anyone actively involved in research and applications, this book helps you to get the most out of these tools and create optimal HPC solutions for bioinformatics.
Author(s): Bertil Schmidt
Edition: 1
Publisher: CRC Press
Year: 2010
Language: English
Pages: 370
Tags: Биологические дисциплины;Матметоды и моделирование в биологии;Биоинформатика;
Cover Page......Page 1
Embedded Multi-Core Systems......Page 3
Title Page......Page 4
ISBN 9781439814888......Page 5
Contents......Page 6
Preface......Page 8
Editor......Page 11
Contributors......Page 12
1.1 Introduction......Page 14
1.2.1 Definitions and Notations......Page 15
1.2.2 DP for Optimal Pairwise Alignment with Linear Gap Penalty Function......Page 19
1.2.3 DP for Optimal Pairwise Alignment with Affine Gap Penalty Function......Page 23
1.2.4 Computing Alignments in Linear Space Using Divide and Conquer......Page 25
1.3.1 Background......Page 27
1.3.2 Progressive Alignment......Page 31
1.4.1 Filtration......Page 35
1.4.2 Suffix Trees and Suffix Arrays......Page 37
1.5 References......Page 39
2.1 Introduction......Page 42
2.2 Massive Multithreading Is the Key......Page 44
2.3 CUDA Simplifies the Creation of Massively Threaded Software......Page 48
2.3.1 Step 1: Getting (and Keeping) the Data on the GPU......Page 51
2.3.2 Step 2: Maximizing the Amount of Work Performed per Call to the GPU......Page 52
2.3.3 Step 3: Exploiting Internal Resources on the GPU......Page 54
2.4 Visualization......Page 58
2.5 Conclusion......Page 59
2.6 References......Page 60
3.1 Introduction......Page 62
3.2 The Need for FPGA Computing......Page 63
3.3 FPGA Computing Architectures......Page 65
3.4 FPGA Development Tools......Page 67
3.6 References......Page 69
4. Parallel Algorithms for Alignments on the Cell BE......Page 72
4.1 Computing Alignments......Page 74
4.3 A Parallel Communication Scheme......Page 76
4.3.1 Tiling Scheme for Aligning Longer Sequences......Page 78
4.3.2 Computing the Optimal Alignment Score Using Tiling......Page 79
4.3.3 Computing an Optimal Alignment Using Tiling......Page 80
4.4.1 Parallel Alignment Scheme Using Prefix Computations......Page 81
4.4.2 Problem Decomposition Using Wavefront Scheme......Page 83
4.4.3 Subproblem Alignment Phase Using Hirschberg's Technique......Page 85
4.4.4 Further Optimizations: Vectorization and Memory Management......Page 86
4.4.6 Performance of the Hybrid Algorithm......Page 87
4.5.1 Spliced Alignments......Page 89
4.5.2 Performance of Parallel Spliced Alignment Algorithm......Page 91
4.5.3 Syntenic Alignments......Page 92
4.5.4 Performance of Parallel Syntenic Alignment Algorithm......Page 93
4.7 References......Page 95
5. Orchestrating the Phylogenetic Likelihood Function on Emerging Parallel Architectures......Page 98
5.1 Phylogenetic Inference......Page 99
5.2 The Phylogenetic Likelihood Function......Page 101
5.2.1 Avoiding Numerical Underflow......Page 105
5.2.2 Memory Requirements......Page 107
5.2.3 Single or Double Precision?......Page 108
5.3 Parallelization Strategies......Page 110
5.3.2 General Fine-Grain Parallelization......Page 111
5.3.3 The Real World: Load Balance Issues......Page 116
5.4 Adaptations to Emerging Parallel Architectures......Page 120
5.5 Future Directions......Page 123
5.6 References......Page 125
6.1 Introduction......Page 130
6.2.1 Hybrid Computing Framework......Page 131
6.2.2 Intertask and Intratask Parallelization......Page 132
6.2.4 Coalesced Global Memory Access......Page 133
6.3 SW Database Search......Page 134
6.4 Multiple Sequence Alignment......Page 136
6.5 Motif Discovery......Page 143
6.6 Conclusion......Page 148
6.7 References......Page 149
7.1 Introduction......Page 152
7.2 Spectral Alignment Approach to Error Correction......Page 154
7.3.1 Bloom Filter Data Structure and Spectrum Computation......Page 156
7.3.2 Parallel Error Correction Using CUDA......Page 158
7.3.3 Execution Example......Page 160
7.4 Performance Evaluation......Page 162
7.5 Conclusion and Future Work......Page 167
7.6 References......Page 168
8. FPGA Acceleration of Seeded Similarity Searching......Page 170
8.1 The BLAST Algorithm......Page 174
8.1.1 Seed Generation......Page 175
8.1.2 Ungapped Extension......Page 176
8.1.4 Execution Profile of the BLAST Algorithm......Page 177
8.2 A Streaming Hardware Architecture for BLAST......Page 178
8.2.1 Seed Generation......Page 179
8.2.2 Ungapped Extension......Page 183
8.2.3 Gapped Extension......Page 185
8.3 Results......Page 187
8.3.1 BLASTP......Page 188
8.3.2 BLASTN......Page 189
8.4 Conclusions......Page 190
8.5 References......Page 192
9.1 Introduction......Page 194
9.2.1 Overview......Page 197
9.2.2 Bank Indexing......Page 199
9.2.4 Gap Extension......Page 201
9.2.5 Generic Hardware Implementation......Page 202
9.3 Parallelization......Page 203
9.3.2 UNGAP Parallelization on FPGA......Page 204
9.3.3 SMALL GAP Parallelization on GPU......Page 206
9.4.2 FPGA Platform......Page 207
9.4.4 Comparison of the Execution Times......Page 208
9.4.5 GPU Implementation......Page 209
9.5 Conclusion......Page 210
9.6 References......Page 213
10.1 Introduction......Page 216
10.2 Background......Page 217
10.3.1 System Design......Page 222
10.3.2 Performance Evaluation......Page 226
10.4 GPU Parallelization and Results......Page 228
10.4.2 Results......Page 229
10.5 Discussion......Page 233
10.6 References......Page 234
11. COPACOBANA: A Massively Parallel FPGA-Based Computer Architecture......Page 236
11.1.1 History of Complexity......Page 237
11.1.2 Basic Idea of the COPACOBANA Series......Page 238
11.2 COPACOBANA 1000......Page 239
11.2.1 FPGA Module......Page 240
11.2.2 Backplane......Page 241
11.2.3 Interface Controller......Page 242
11.2.4 Application Development......Page 243
11.3 Cryptanalysis with COPACOBANA 1000......Page 244
11.3.1 Previous Work on DES Breaking......Page 245
11.3.2 Exhaustive Key Search on DES......Page 246
11.3.3 Breaking DES-Based Crypto Tokens......Page 248
11.4.2 Requirements......Page 255
11.4.3 Architecture of COPACOBANA 5000......Page 256
11.5 Applications in Bioinformatics......Page 261
11.5.1 Sequence Alignment......Page 262
11.5.2 Motif Finding......Page 265
11.6 References......Page 272
12.1 Introduction......Page 276
12.1.2 Use of the ACA in Computational Biology......Page 277
12.1.4 Use of String Set Matching in FPGAs in Other Domains......Page 278
12.2.1 The Aho–Corasick Preprocessing Phase......Page 279
12.3 FPGA Implementation of the String Set Matching DFA......Page 282
12.3.1 Bit-Split DFA Architecture......Page 283
12.3.2 Implementing Bit-Split DFA Tables in FPGAs......Page 286
12.3.3 Analysis of DFA Storage Utilization Efficiency......Page 289
12.4.1 Storage Utilization......Page 290
12.4.2 Implementation Performance......Page 291
12.5 Conclusions......Page 293
12.6 References......Page 294
13.1 Introduction......Page 298
13.2.1 Numerical Representation of DNA Sequences......Page 300
13.2.2 The FFNN......Page 301
13.2.3 The HNN......Page 302
13.2.4 Adaptation of the HNN......Page 304
13.3 Reconfigurable DP-HNN......Page 310
13.3.2 Control and Matching Units......Page 311
13.3.3 Neuron and Memory Units......Page 313
13.3.4 Operation of DP-HNN......Page 314
13.4.1 The Biological Problem......Page 315
13.4.2 Dimeric Structure of HREs......Page 316
13.4.3 Two-Phase Neural System for HRE Prediction......Page 318
13.4.4 Performance of the Hardware-Accelerated System......Page 319
13.5 Discussions......Page 320
13.6 References......Page 322
14.1 Introduction......Page 326
14.2 The Reconfigurable Computing Paradigm......Page 328
14.3.1 Overview of the Approach......Page 330
14.3.3 Cleavage Rules......Page 331
14.3.4 Protein Identification by Spectral Matching......Page 334
14.4 Reconfigurable Computing Platform......Page 335
14.5.1 Database Encoding......Page 338
14.5.2 Database Search Processor......Page 339
14.6 Performance Evaluation......Page 344
14.7 References......Page 346
B......Page 350
C......Page 352
D......Page 355
F......Page 356
G......Page 357
H......Page 359
M......Page 360
O......Page 362
P......Page 363
R......Page 365
S......Page 366
T......Page 368
Z......Page 369
Back Page......Page 370