High-Performance Algorithms for Mass Spectrometry-Based Omics

To date, processing of high-throughput mass spectrometry (MS) data is accomplished using serial algorithms. Developing new methods to process MS data is an active area of research, but there is no single strategy that focuses on the scalability of MS-based methods.

Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules, and metabolites in complex biological mixtures. In recent years the technology has evolved rapidly and is now capable of generating increasingly large (multiple terabytes per experiment) and complex (multiple species/microbiome/high-dimensional) data sets. This rapid advance in MS instrumentation must be matched by an equally rapid evolution of scalable methods for analyzing these complex data sets. Ideally, the new methods should leverage the rich heterogeneous computational resources now ubiquitously available in the form of multicore, manycore, CPU-GPU, CPU-FPGA, and Intel Xeon Phi architectures.

The absence of these high-performance computing algorithms now hinders scientific advancement in mass spectrometry research. In this book we illustrate the need for high-performance computing algorithms for MS-based proteomics and proteogenomics, and showcase our progress in developing these algorithms.

Author(s): Fahad Saeed, Muhammad Haseeb
Series: Computational Biology, 34
Publisher: Springer
Year: 2022

Language: English
Pages: 145
City: Cham

Preface
References
Acknowledgements
Contents
1 Need for High-Performance Computing for MS-Based Omics Data Analysis
References
2 Introduction to Mass Spectrometry Data
2.1 Proteomics
2.1.1 Mass Spectrometry-Based Proteomics
2.1.2 MS/MS Data Pre-processing
2.1.3 Peptide Identification
2.2 Proteogenomics
References
3 Existing HPC Methods and the Communication Lower Bounds for Distributed-Memory Computations for Mass Spectrometry-Based Omics Data
3.1 Introduction
3.2 Communication Model
3.2.1 Sequential Computer
3.2.2 Parallel Computer
3.3 MS Database Proteomics, Proteogenomics, and Meta-Proteomics Search
3.3.1 Generalized Parallel Computing Strategy
3.4 Communication Lower Bounds
3.5 Meta-Analysis of Results of Current HPC Methods
3.6 Discussions
3.7 Conclusions
References
4 High-Performance Computing Strategy Using Distributed-Memory Supercomputers
4.1 Introduction
4.1.1 Background
4.1.2 Problem Statement
4.2 The HiCOPS Framework
4.2.1 Database Indexing
4.2.2 Experimental Data Pre-processing
4.2.3 Parallel Database Peptide Search
4.2.4 Assembling the Local Results
4.3 Optimizations
4.3.1 Task Scheduling
4.3.2 Communication Optimization
4.4 Results
4.4.1 Experimental Settings
4.4.2 Correctness Analysis
4.4.3 Speed Comparison
4.4.4 Performance Evaluation
4.5 Discussion
References
5 Fast Spectral Pre-processing for Big MS Data
5.1 A Review of Spectral Pre-processing Methods
5.1.1 Spectral Denoising Algorithms
5.1.2 Spectral Quality Assessment Algorithms
5.1.3 Separation of b-y Ions
5.2 MS-REDUCE: An Ultra-Fast Data Reduction Algorithm for Big MS Data
5.2.1 Spectral Classification
5.2.2 Spectral Quantization
5.2.3 Weighted Random Sampling
5.3 Performance Evaluation of MS-REDUCE
5.3.1 Time Complexity
5.3.2 Experimental Verification of the Complexity Analysis
5.3.3 Speed Comparison
5.3.4 Comparing MS-REDUCE with Other Denoising Methods
5.3.5 Quality Assessment
5.3.6 Comparison with Random Sampling of Peaks
5.3.7 Comparison with Conventional Algorithms
References
6 An Easy-to-Use Generalized Template to Support Development of GPU Algorithms
6.1 GPU Architecture and CUDA
6.1.1 CUDA Overview
6.1.2 CPU-GPU Computing
6.2 Challenges in GPU Algorithm Design
6.2.1 Need for Data Parallel Design
6.2.2 Data Transfer Bottlenecks
6.2.3 Non-coalesced Memory Accesses
6.2.4 Warp Divergence
6.2.5 Exploiting Coarse Grained and Fine Grained Parallelism
6.3 Basic Principles of GPU-DAEMON
6.3.1 Simplifying Complex Data Structures
6.3.2 Simplifying Complex Computations
6.3.3 Efficient Array Management in GPU
6.3.4 Exploiting Shared Memory
6.3.5 In-Warp Optimizations
6.3.6 Result Sifting
6.3.7 Post Processing Results
6.3.8 Time Complexity Model for GPU-DAEMON
References
7 Computational CPU-GPU Template for Pre-processing of Floating-Point MS Data
7.1 Simplifying Complex Data Structures
7.2 Efficient Array Management
7.2.1 Splitter Selection
7.2.2 Bucketing
7.3 In-Warp Optimizations and Exploiting Shared Memory
7.4 Time Complexity Model
7.5 Performance Evaluation
7.5.1 Sorting Using Tagged Approach (STA)
7.5.2 Runtime Analysis and Comparisons
7.5.3 Data Handling Efficiency
References
8 G-MSR: A GPU-Based Dimensionality Reduction Algorithm
8.1 G-MSR Algorithm
8.1.1 Simplifying Complex Data Structures
8.1.2 Simplifying Complex Computations
8.1.3 Efficient Array Management
8.1.4 Exploiting Shared Memory
8.1.5 In-Warp Optimizations
8.1.6 Result Sifting
8.1.7 Post Processing Results
8.2 Results and Experiments
8.2.1 Time Complexity Model
8.2.2 Experiment Setup
8.2.3 Scalability and Time Analysis
8.2.4 Quality Assessment
8.2.5 Reductive Proteomics for High-Resolution Instruments
8.2.6 Comparison with Unified Memory
References
9 Re-configurable Hardware for Computational Proteomics
9.1 Introduction
9.1.1 Construction of a Field-Programmable Gate Array
9.2 Popular Architectural Configurations Using FPGAs
9.2.1 Systolic Array Configuration
9.2.2 Parallel Asynchronous PEs Connected to the System Bus
9.2.3 Parallel Processors with Communication Interconnect
9.3 FPGA Design for Computational Proteomics
9.3.1 Architecture Overview
9.3.2 Processing Element (PE)
9.3.3 Bus-Arbitration Module
9.3.4 Binary Search Module
9.3.5 Ion-Matching Circuit
9.3.6 Experiments and Results
9.4 Conclusion
10 Machine-Learning and the Future of HPC for MS-Based Omics
10.1 Why HPC is Essential for Machine-Learning Models
10.2 Preliminary Data and Findings
References
Appendix Glossary
Index