Handbook of Statistical Bioinformatics

Now in its second edition, this handbook collects authoritative contributions on modern methods and tools in statistical bioinformatics with a focus on the interface between computational statistics and cutting-edge developments in computational biology. The three parts of the book cover statistical methods for single-cell analysis, network analysis, and systems biology, with contributions by leading experts addressing key topics in probabilistic and statistical modeling and the analysis of massive data sets generated by modern biotechnology. This handbook will serve as a useful reference source for students, researchers and practitioners in statistics, computer science and biological and biomedical research, who are interested in the latest developments in computational statistics as applied to computational biology.

Author(s): Henry Horng-Shing Lu, Bernhard Schölkopf, Martin T. Wells, Hongyu Zhao
Series: Springer Handbooks of Computational Statistics
Edition: 2
Publisher: Springer
Year: 2022

Language: English
Pages: 405
City: Berlin

Preface
Contents
Part I Single-Cell Analysis
Computational and Statistical Methods for Single-Cell RNA Sequencing Data
1 Introduction
2 Data Preprocessing
2.1 Reads Mapping
2.2 Cell Barcodes Demultiplexing
2.3 UMI Collapsing
2.4 Cell Barcodes Selection
2.5 Summary
3 Data Normalization and Visualization
3.1 Background
3.2 Global Scaling Normalization for UMI Data
3.3 Probabilistic Model-Based Normalization for UMI Data
3.4 Dimension Reduction and Cell Clustering
4 Dropout Imputation
4.1 Background
4.2 Cell-Cell Similarity-Based Imputation
4.3 Gene-Gene Similarity-Based Imputation
4.4 Gene-Gene and Cell-Cell Similarity-Based Imputation
4.5 Deep Neural Network-Based Imputation
4.6 G2S3
4.7 Methods Evaluation and Comparison
5 Differential Expression Analysis
5.1 Background
5.2 DE Methods Ignoring Subject Effects
5.3 DE Methods Considering Subject Effects
5.4 iDESC
5.5 DE Methods Evaluation and Comparison
5.5.1 Type I Error Comparison
5.5.2 Statistical Power Comparison
6 Concluding Remarks
References
Pre-processing, Dimension Reduction, and Clustering for Single-Cell RNA-seq Data
1 Introduction
2 Pre-processing of scRNA-seq Data
2.1 Removal of Batch Effects
2.2 Quality Control and Feature Selection
3 Dimension Reduction and Clustering
3.1 Dimension Reduction
3.2 Clustering
4 Conclusion
References
Integrative Analyses of Single-Cell Multi-Omics Data: A Review from a Statistical Perspective
1 Multi-Omics Data Profiled on Different Cells
2 Multi-Omics Data Profiled on the Same Single Cells
3 Challenges and Future Perspectives
References
Approaches to Marker Gene Identification from Single-Cell RNA-Sequencing Data
1 Introduction
2 Marker Gene Selection Relies on Identifying Differentially Expressed Genes
3 Methods for Marker Gene Selection
3.1 Highest Expressed, Highest Variable
4 Supervised Methods
4.1 COMET
4.2 scGeneFit
5 Unsupervised Methods
5.1 Seurat
5.2 SC3
5.3 SCMarker
5.4 scTIM
5.5 RankCorr
6 Discussion
References
Model-Based Clustering of Single-Cell Omics Data
1 Introduction
2 Single-Cell Transcriptomic Data Clustering
2.1 Single-Cell Transcriptomic Data Structure
2.2 DIMM-SC
2.3 Real Data Example
3 Population-Scale Single-Cell Transcriptomic Data Clustering
3.1 Population-Scale Single-Cell Transcriptomic Data Structure
3.2 BAMM-SC
3.3 Real Data Example
4 Single-Cell Multi-omics Data Clustering
4.1 CITE-seq Data Structure
4.2 BREM-SC
4.3 Real Data Example
5 Concluding Remarks
References
Deep Learning Methods for Single-Cell Omics Data
1 Introduction
2 Factor-Model-Based Deep Learning Approaches
2.1 Regularization and Priors on the Latent Factors
2.1.1 Gaussian Prior and Variational Inference
2.1.2 Adjust for Batch Effects and Confounding Covariates: Identifiability
2.1.3 Adjust for Batch Effects and Confounding Covariates: Implementation
2.1.4 Model Cell Population Structure in the Latent Space
2.2 Distributional Assumptions on Observed Data
2.2.1 Model Observed Data from scRNA-seq
2.2.2 Model Observed Data from scATAC-seq
2.2.3 Model Observed Data from Single-Cell Multiomics Technologies
2.3 Post-training Statistical Analyses
2.3.1 Denoising
2.3.2 Visualization, Clustering, and Trajectory Analysis
2.3.3 Prediction
3 Deep Learning Methods for Dimension Reduction
3.1 Construct the Loss Function
3.2 Extra Penalties and Regularization
4 Discussion
References
Part II Network Analysis
Probabilistic Graphical Models for Gene Regulatory Networks
1 Introduction
2 Probabilistic Graphical Models
2.1 Graphical Model Basics
2.2 Markov Networks
2.3 Bayesian Networks
3 Classic Graphical Models for Reconstructing GRNs
3.1 Frequentist Approach
3.2 Bayesian Approach
3.3 Graphical Models Incorporating Prior Knowledge
4 Testing in Graphical Models
4.1 Parametric Test
4.2 Non-parametric Test for Global Graph Structure
5 Conclusion
References
Additive Conditional Independence for Large and Complex Biological Structures
1 Additive Conditional Independence (ACI)
1.1 Additive Reproducing Kernel Hilbert Spaces and Relevant Linear Operators
2 Variable Selection via ACI
2.1 Nonparametric Variable Selection
2.2 Penalized Least-Square Estimation with RKHS Operators
2.3 Matrix Representation of Operators and Algorithm
2.4 Data Example
3 Graphical Modeling Through ACI
3.1 Nonparametric Graphical Models
3.2 The Additive Conditional Covariance and Partial Correlation Operators
3.3 Operator-Level Estimation and the Algorithm
3.4 Data Examples
References
Integration of Boolean and Bayesian Networks
1 Introduction
2 Methods
2.1 s-p-scores Associated with Networks, SPAN
2.2 Network Learning
3 Results
3.1 An Example
3.2 Real Example
3.3 Complex Example
4 Discussion
References
Computational Methods for Identifying MicroRNA-Gene Regulatory Modules
1 Introduction
2 Identifying MiRNA-Gene Modules by Integrating Heterogeneous Data Sources
2.1 Bipartite Graph-Based Methods
2.2 Nonnegative Matrix Factorization Methods
2.3 Statistical Modeling Approaches
3 Evaluating the Performance of MiRNA-Gene Module Identification Methods
4 Discussion
5 Conclusions
References
Causal Inference in Biostatistics
1 Introduction
1.1 Causation and Association
1.2 Two Conceptual Frameworks: Causal Effect and Causal Discovery
2 Causal Effect
2.1 Approaches to Causal Inference
2.2 Randomized Clinical Trials
2.2.1 Perfect Randomized Trials
2.2.2 Randomized Trials with Missing Data
2.2.3 Randomized Trials with Post-treatment Variables
2.3 Observational Studies
2.3.1 Unconfounded Treatment Assignment Conditional on Measured Covariates
2.3.2 Unmeasured Confounding
3 Some Current Research Topics
3.1 Heterogeneous Treatment Effect and Precision Medicine
3.2 Integrating Data from Randomized Controlled Trials and Observational Studies
3.3 Multiple Treatments
4 Software Appendix
References
Bayesian Balance Mediation Analysis in Microbiome Studies
1 Introduction
2 Bayesian Balance Mediation Model
2.1 Bayesian Balance Mediation Model with a Binary Treatment
2.2 Direct and Mediation Effect and Estimation Based on Predictive Posterior Distribution
3 MCMC Sampling
3.1 MCMC Sampling
3.2 Conditional Distributions
4 Applications to Real Data
4.1 Mediation Analysis at the Phylum Level
4.2 Analysis at the Order Level
5 Simulation Studies
5.1 Data Generation
5.2 Simulation Result
6 Discussion
References
Part III Systems Biology
Identifying Genetic Loci Associated with Complex Trait Variability
1 Introduction
2 The Concept of vQTL
3 Statistical Methods for vQTL Mapping
3.1 Classical Nonparametric Tests
3.2 Regression-Based Methods
3.3 Two-Stage Methods
3.4 Quantile Integral Linear Model (QUAIL)
3.5 Dispersion Effects
4 Applications of vQTL
4.1 Examples of vQTL
4.2 Screening vQTL for Candidate Loci Involved in GxE Interaction
4.3 Variance Polygenic Score
4.4 Other Applications and Future Directions
References
Cell Type-Specific Analysis for High-throughput Data
1 Introduction
2 Cell Type Composition Estimation
3 Cell Type-Specific Differential Analysis
4 Step-by-step Tutorial
References
Recent Development of Computational Methods in the Field of Epitranscriptomics
1 Introduction
2 MeRIP-seq and Other Technologies for RNA Modification Profiling
3 Methods to Analyze MeRIP-seq Data
3.1 Count-Based Methods for Simple Study Designs
3.2 Methods Compatible with Confounding Factors
3.3 A Guide for RNA Differential Methylation Analysis Using RADAR
4 Web Resources on m6A Epitranscriptome
4.1 Web Servers with m6A Site Prediction
4.2 m6A Epitranscriptome Database
5 Discussion
References
Estimation of Tumor Immune Signatures from Transcriptomics Data
1 Introduction
2 Regression-Based Deconvolution Algorithms
2.1 Linear Least Squares Regression
2.2 Support Vector Regression
2.3 Other Deconvolution Methods
3 Gene Set Enrichment-Based Methods and Other Gene-Based Algorithms
3.1 Gene Set Enrichment Analysis (GSEA)
3.2 Single-Sample GSEA (ssGSEA)
4 Benchmark Studies
5 Discussions
References
Cross-Linking Mass Spectrometry Data Analysis
1 Introduction
1.1 Peptide Identification Based on Mass Spectrometry
1.2 Cross-Linked Peptides Identification
1.2.1 Cross-Linker Selection
1.2.2 Chemical Reaction
1.2.3 Enzyme Digestion
1.2.4 Enrichment of Cross-Linked Peptides
1.2.5 LC-MS and MS2 Acquisition
1.2.6 Data Interpretation
1.2.7 Quality Control
1.2.8 Downstream Applications
2 Non-cleavable Cross-Linking Methods
3 Cleavable Cross-Linking Methods
4 Time-Complexity Comparison Between Non-cleavable Methods and Cleavable Methods
5 False Discovery Rate in CL-MS
5.1 Target–Decoy Approach in Linear Peptides Identification
5.2 TDA in Cross-Linked Peptides Identification
6 Downstream Applications
6.1 Protein Structure Analysis
6.2 Protein–Protein Interactions
7 Conclusion and Perspective
7.1 Wet Lab Experiments
7.2 Dry Lab Analysis
References
Cis-regulatory Element Frequency Modules and their Phase Transition across Hominidae
1 Introduction
2 Dual Eigen-Analysis of CREF Matrices
2.1 Cis-regulatory Element Frequency (CREF) Matrices
2.2 Dual Eigen-Analysis
2.2.1 Low-Rank Matrix Approximation
2.2.2 Robust SVD
2.2.3 Dual CREF Eigen-Modules
2.2.4 Cross-Species Correlations and Conservation of Motif Eigen-Modules
2.2.5 Enrichment Analysis of Gene Eigen-Modules
2.3 Stability of CREF Modules
2.3.1 Sensitivity of Eigenvalues and Eigenvectors
2.3.2 Degenerate 2-D Eigen-Space and Its Stability
3 Dual Eigen-Analysis Unravels the Regulatory Evolution from Apes to Human
3.1 Evolutionary Conservation and Divergence of CREF Eigen-Modules
3.2 Conservation of Top Three CREF Modules
3.3 Dual Eigen-Analysis Identifies the Regulators Involved in Major Biological Processes
3.4 Phase Transition of CREF Eigen-Modules
3.5 Human-Specific Phenotypes Corresponding to Its Unique Fourth Gene Eigenvector
3.6 Motifs Present on Alu Elements Underlying the CREF Module Reorganization
3.7 Cis-trans Regulation of the Human-Specific Gene Eigen-Module
4 Discussion
4.1 Definition of cis-regulatory Profiles
4.2 Dual Eigen-Analysis and the Classical PCA
4.3 The Phase Transition in cis-regulatory Evolution
4.4 Conclusion
References
Improved Method for Rooting and Tip-Dating a Viral Phylogeny
1 Introduction
1.1 The Need for Answering the “When” Question Without a Good Outgroup
1.2 Conventional TRAD Methods Assume a Constant Evolutionary Rate and Need to Be Extended to Allow the Rate to Change over Time
2 TRAD: Tip Rooting and Ancestor Dating
2.1 Estimating the Root
2.2 Dating the Common Ancestor (TA) and Estimating the Evolutionary Rate
3 Increasing Evolutionary Rate During SARS-CoV-2 Evolution over Time
3.1 Results from the Viral Phylogeny Released by NCBI on April 3, 2021
3.2 Results from the Viral Phylogeny Released by NCBI on September 4, 2021
4 The Difficulty with the “Where” Question
5 Conclusions
References