Bioinformatics Methods: From Omics to Next Generation Sequencing

The past three decades have witnessed an explosion of what is now referred to as high-dimensional "omics" data. Bioinformatics Methods: From Omics to Next Generation Sequencing describes the statistical methods and analytic frameworks that are best equipped to interpret these complex data and how they apply to health-related research. Covering the technologies that generate data, subtleties of various data types, and statistical underpinnings of methods, this book identifies a suite of potential analytic tools and highlights commonalities among statistical methods that have been developed. It is an ideal reference for biostatisticians and data analysts who work in collaboration with scientists and clinical investigators looking to ensure rigorous application of available methodologies.

Key Features:
Survey of a variety of omics data types and their unique features
Summary of statistical underpinnings for widely used omics data analysis methods
Description of software resources for performing omics data analyses

Author(s): Shili Lin, Denise Scholtens, Sujay Datta
Series: Computational Biology
Publisher: CRC Press/Chapman & Hall
Year: 2022

Language: English
Pages: 350
City: Boca Raton

Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Preface
1. The Biology of a Living Organism
1.1. Cells
1.2. Genes, DNA and RNA
1.3. Proteins
1.4. The epigenome
1.5. Metabolism
1.6. Biological regulation and cancer
1.7. Data generating technologies
2. Protein-Protein Interactions
2.1. Data sets
2.2. Technologies and data types
2.3. Graph representations of protein-protein interaction data
2.4. Sampling issues in protein-protein interaction data
2.5. Systematic and stochastic measurement errors
3. Protein-Protein Interaction Network Analyses
3.1. Node summaries in protein interaction graphs
3.1.1. Node degree
3.1.2. Clustering coefficient
3.1.3. Connectivity
3.1.4. Betweenness
3.1.5. Applications to protein-protein interaction networks
3.2. Graph models of protein interaction data
3.2.1. Erdős-Rényi random graphs
3.2.2. Scale-free graphs
3.2.3. Hierarchical graphs
3.2.4. Modularity
3.3. Module detection
3.3.1. Community detection algorithms
3.3.2. Protein complex estimation
3.4. Software
3.5. Integration of protein interactions with other data types
4. Detection of Imprinting and Maternal Effects
4.1. Imprinting and maternal genotype effects – Two epigenetic factors
4.1.1. Imprinting effects on complex diseases
4.1.2. Maternal genotype effects on complex diseases
4.2. Confounding between imprinting and maternal effects
4.3. Evolving study designs
4.4. Methods for detecting imprinting and maternal effects using data from prospective studies
4.5. Methods for detecting imprinting and maternal effects using data from retrospective studies
4.5.1. Joint detection of imprinting and maternal genotype effects
4.5.2. Detection of imprinting assuming no maternal effect
4.5.3. Detection of maternal-child genotype interacting effect assuming no imprinting
4.6. Case studies
4.6.1. Case study 1 – Framingham Heart Study
4.6.2. Case study 2 – UK rheumatoid arthritis data
4.7. Software
4.8. Concluding remarks
5. Modeling and Analysis of Next-Generation Sequencing Data
5.1. Isolation, quality control and library preparation
5.2. Validation, pooling and normalization
5.3. Sequencing
5.3.1. Single-end vs. paired-end
5.3.2. Generations of sequencing technology
5.3.3. Various next-generation sequencing platforms
5.3.3.1. Illumina
5.3.3.2. SOLiD
5.3.3.3. Ion Torrent semiconductor sequencing
5.3.3.4. Pacific Biosciences single molecule real-time sequencing
5.3.3.5. Nanopore technologies
5.3.3.6. Choosing a platform
5.4. Factors affecting NGS data accuracy
5.4.1. At the library preparation stage
5.4.2. At the sequencing stage
5.5. Applications of RNA-Seq
5.6. RNA-Seq data preprocessing and analysis
5.6.1. Base calling
5.6.2. Quality control and preprocessing of reads
5.6.2.1. Quality control
5.6.2.2. Preprocessing
5.6.3. Read alignment
5.6.4. Genome-guided transcriptome assembly and isoform finding
5.6.5. Quantification and comparison of expression levels
5.6.6. Normalization methods
5.6.7. Differential expression analysis
5.6.7.1. Binomial and Poisson-based approaches
5.6.7.2. Empirical Bayes approaches
5.6.7.3. Negative binomial-based approaches
5.6.8. Classification
5.6.8.1. Linear discriminant analysis
5.6.8.2. Support vector machine classifier
5.6.9. Further downstream analysis
6. Sequencing-Based DNA Methylation Data
6.1. DNA methylation
6.2. Evolving technologies for measuring DNA methylation
6.3. Methods for detection of DMCs using BS-seq data
6.3.1. BS-seq data
6.3.2. Fisher's exact test
6.3.3. Logistic regression
6.3.4. Beta-binomial formulations
6.3.4.1. Parameter estimation
6.3.4.2. Statistical inference – hypothesis testing
6.3.5. Smoothing
6.3.5.1. Smoothing as part of the beta-binomial procedures
6.3.5.2. BSmooth
6.4. Methods for detection of DMRs using BS-seq data
6.4.1. Rule-based procedure – follow-up on DMCs
6.4.2. Credible band procedure – a single step approach
6.4.3. Summary of methods for BS-seq data ‒ Which methods to choose?
6.5. Methods for detection of DMRs using Cap-seq data
6.5.1. Cap-seq data
6.5.2. Direct methods
6.5.2.1. Quantification of methylation signals
6.5.2.2. Detection of DMRs
6.5.3. Two-step methods
6.5.3.1. Derivation of nucleotide-level data
6.5.3.2. Detection of DMRs
6.6. Case studies
6.6.1. Case study 1 ‒ Detection of DMRs using BS-seq data
6.6.2. Case study 2 ‒ Detection of DMRs using Cap-seq data
6.7. Software
6.8. Concluding remarks and statistical challenges
7. Modeling and Analysis of Spatial Chromatin Interactions
7.1. 3D chromosome organization and spatial regulation
7.2. Evolving technologies for measuring long-range interaction
7.3. Methods for recapitulating 3D structures using Hi-C type data
7.3.1. Hi-C data
7.3.2. Non-parametric optimization-based methods
7.3.3. Model-based methods
7.4. Methods for detecting long-range interactions using ChIA-PET type data
7.4.1. ChIA-PET data
7.4.2. Detections of true chromatin interactions ‒ individual pair analysis
7.4.3. Detections of true chromatin interactions ‒ joint analysis of all pairs
7.4.4. Detections of pairs with differential chromatin interactions
7.5. Case studies
7.5.1. Case study 1 ‒ Reconstruction of 3D structure from Hi-C data
7.5.2. Case study 2 ‒ Detection of true chromatin interactions from ChIA-PET data
7.5.3. Case study 3 ‒ Detection of differential chromatin interaction intensities from ChIA-PET data
7.6. Software
7.7. Concluding remarks, and statistical and computational challenges
8. Digital Improvement of Single Cell Hi-C Data
8.1. Sparsity of single cell Hi-C data
8.1.1. Digital improvement of data quality
8.1.2. Separating structural zeros from dropouts
8.2. Adaptation of scRNA imputation methods for improving scHi-C data quality
8.2.1. Preparation of scHi-C data for using scRNA methods
8.2.2. Machine-learning algorithms for imputation
8.2.3. A block algorithm to account for spatial correlations
8.3. Methods designed for improving Hi-C data quality
8.4. Self representation smoothing for imputation and structural zeros inference
8.4.1. SRS model and estimation
8.4.2. Inference for structural zeros
8.5. Bayesian modeling for identifying structural zeros and imputing dropouts
8.5.1. The HiCImpute model
8.5.2. Statistical inference
8.6. Case studies
8.6.1. Case study 1 ‒ improved cell clustering with enhanced scHi-C data from scRNA imputation method
8.6.2. Case study 2 ‒ cell clustering with enhanced data from methods proposed specifically for scHi-C analysis
8.6.3. Case study 3 ‒ discovery of subtypes based on improved data
8.7. Software
8.8. Concluding remarks, and statistical and computational challenges
9. Metabolomics Data Preprocessing
9.1. Introduction
9.2. Data sets
9.2.1. HAPO metabolomics study
9.3. Technology platforms
9.4. Data formats
9.5. Peak identification and metabolite quantification for non-targeted data
9.5.1. Peak detection
9.5.2. Peak alignment
9.5.3. Peak deconvolution
9.5.4. Metabolite identification
9.6. Normalization
9.7. Experimental design
10. Metabolomics Data Analysis
10.1. Per-metabolite analyses
10.1.1. Redundant metabolite measurements
10.1.2. Descriptive statistics
10.1.3. Individual tests of metabolite association
10.1.4. Missing data
10.1.5. Multiple comparisons adjustment
10.2. Multivariate approaches
10.3. Pathway enrichment analyses
10.4. Network analyses
10.4.1. Constructing and describing networks
10.4.2. Metabolic subnetwork optimization
10.4.3. Local community detection
10.4.4. Differential metabolic networks
10.5. Case studies on data integration
10.5.1. Metabolomics and genetic variants
10.5.2. Metabolomics and transcriptomics
11. Appendix
11.1. Basics of probability
11.2. Random variables and probability distributions
11.3. Basics of stochastic processes
11.4. Hidden Markov models
11.5. Frequentist statistical inference
11.6. Bayesian inference
Bibliography
Index