Next-Generation Sequencing Data Analysis

Next-generation DNA and RNA sequencing has revolutionized biology and medicine. With sequencing costs continuously dropping and our ability to generate large datasets rising, data analysis becomes more important than ever. Next-Generation Sequencing Data Analysis walks readers through next-generation sequencing (NGS) data analysis step by step for a wide range of NGS applications.

Author(s): Xinkun Wang
Edition: 2
Publisher: CRC Press
Year: 2023

Language: English
Pages: 434
City: Boca Raton

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface to the Second Edition
Author
Part I Introduction to Cellular and Molecular Biology
1 The Cellular System and the Code of Life
1.1 The Cellular Challenge
1.2 How Cells Meet the Challenge
1.3 Molecules in Cells
1.4 Intracellular Structures or Spaces
1.4.1 Nucleus
1.4.2 Cell Membrane
1.4.3 Cytoplasm
1.4.4 Endosome, Lysosome, and Peroxisome
1.4.5 Ribosome
1.4.6 Endoplasmic Reticulum
1.4.7 Golgi Apparatus
1.4.8 Cytoskeleton
1.4.9 Mitochondrion
1.4.10 Chloroplast
1.5 The Cell as a System
1.5.1 The Cellular System
1.5.2 Systems Biology of the Cell
1.5.3 How to Study the Cellular System
References
2 DNA Sequence: The Genome Base
2.1 The DNA Double Helix and Base Sequence
2.2 How DNA Molecules Replicate and Maintain Fidelity
2.3 How the Genetic Information Stored in DNA Is Transferred to Protein
2.4 The Genomic Landscape
2.4.1 The Minimal Genome
2.4.2 Genome Sizes
2.4.3 Protein-Coding Regions of the Genome
2.4.4 Non-Coding Genomic Elements
2.5 DNA Packaging, Sequence Access, and DNA-Protein Interactions
2.5.1 DNA Packaging
2.5.2 Sequence Access
2.5.3 DNA-Protein Interactions
2.6 DNA Sequence Mutation and Polymorphism
2.7 Genome Evolution
2.8 Epigenome and DNA Methylation
2.9 Genome Sequencing and Disease Risk
2.9.1 Mendelian (Single-Gene) Diseases
2.9.2 Complex Diseases That Involve Multiple Genes
2.9.3 Diseases Caused by Genome Instability
2.9.4 Epigenomic/Epigenetic Diseases
References
3 RNA: The Transcribed Sequence
3.1 RNA as the Messenger
3.2 The Molecular Structure of RNA
3.3 Generation, Processing, and Turnover of RNA as a Messenger
3.3.1 DNA Template
3.3.2 Transcription of Prokaryotic Genes
3.3.3 Pre-mRNA Transcription of Eukaryotic Genes
3.3.4 Maturation of mRNA
3.3.5 Transport and Localization
3.3.6 Stability and Decay
3.3.7 Major Steps of mRNA Transcript Level Regulation
3.4 RNA Is More Than a Messenger
3.4.1 Ribozyme
3.4.2 snRNA and snoRNA
3.4.3 RNA for Telomere Replication
3.4.4 RNAi and Small Non-Coding RNAs
3.4.4.1 miRNA
3.4.4.2 siRNA
3.4.4.3 piRNA
3.4.5 Long Non-Coding RNAs
3.4.6 Other Non-Coding RNAs
3.5 The Cellular Transcriptional Landscape
References
Part II Introduction to Next-Generation Sequencing (NGS) and NGS Data Analysis
4 Next-Generation Sequencing (NGS) Technologies: Ins and Outs
4.1 How to Sequence DNA: From First Generation to the Next
4.2 Ins and Outs of Different NGS Platforms
4.2.1 Illumina Reversible Terminator Short-Read Sequencing
4.2.1.1 Sequencing Principle
4.2.1.2 Implementation
4.2.1.3 Error Rate, Read Length, Data Output, and Cost
4.2.1.4 Sequence Data Generation
4.2.2 Pacific Biosciences Single-Molecule Real-Time (SMRT) Long-Read Sequencing
4.2.2.1 Sequencing Principle
4.2.2.2 Implementation
4.2.2.3 Error Rate, Read Length, Data Output, and Cost
4.2.2.4 Sequence Data Generation
4.2.3 Oxford Nanopore Technologies (ONT) Long-Read Sequencing
4.2.3.1 Sequencing Principle
4.2.3.2 Implementation
4.2.3.3 Error Rate, Read Length, Data Output, and Cost
4.2.3.4 Sequence Data Generation
4.2.4 Ion Torrent Semiconductor Sequencing
4.2.4.1 Sequencing Principle
4.2.4.2 Implementation
4.2.4.3 Error Rate, Read Length, Data Output, and Cost
4.2.4.4 Sequence Data Generation
4.3 A Typical NGS Workflow
4.4 Biases and Other Adverse Factors That May Affect NGS Data Accuracy
4.4.1 Biases in Library Construction
4.4.2 Biases and Other Factors in Sequencing
4.5 Major Applications of NGS
4.5.1 Transcriptomic Profiling (Bulk and Single-Cell RNA-Seq)
4.5.2 Genetic Mutation and Variation Identification
4.5.3 De Novo Genome Assembly
4.5.4 Protein-DNA Interaction Analysis (ChIP-Seq)
4.5.5 Epigenomics and DNA Methylation Study (Methyl-Seq)
4.5.6 Metagenomics
References
5 Early-Stage Next-Generation Sequencing (NGS) Data Analysis: Common Steps
5.1 Basecalling, FASTQ File Format, and Base Quality Score
5.2 NGS Data Quality Control and Preprocessing
5.3 Read Mapping
5.3.1 Mapping Approaches and Algorithms
5.3.2 Selection of Mapping Algorithms and Reference Genome Sequences
5.3.3 SAM/BAM as the Standard Mapping File Format
5.3.4 Mapping File Examination and Operation
5.4 Tertiary Analysis
References
6 Computing Needs for Next-Generation Sequencing (NGS) Data Management and Analysis
6.1 NGS Data Storage, Transfer, and Sharing
6.2 Computing Power Required for NGS Data Analysis
6.3 Cloud Computing
6.4 Software Needs for NGS Data Analysis
6.4.1 Parallel Computing
6.5 Bioinformatics Skills Required for NGS Data Analysis
References
Part III Application-Specific NGS Data Analysis
7 Transcriptomics by Bulk RNA-Seq
7.1 Principle of RNA-Seq
7.2 Experimental Design
7.2.1 Factorial Design
7.2.2 Replication and Randomization
7.2.3 Sample Preparation and Sequencing Library Preparation
7.2.4 Sequencing Strategy
7.3 RNA-Seq Data Analysis
7.3.1 Read Mapping
7.3.2 Quantification of Reads
7.3.3 Normalization
7.3.4 Batch Effect Removal
7.3.5 Identification of Differentially Expressed Genes
7.3.6 Multiple Testing Correction
7.3.7 Gene Clustering
7.3.8 Functional Analysis of Identified Genes
7.3.9 Differential Splicing Analysis
7.4 Visualization of RNA-Seq Data
7.5 RNA-Seq as a Discovery Tool
References
8 Transcriptomics by Single-Cell RNA-Seq
8.1 Experimental Design
8.1.1 Single-Cell RNA-Seq General Approaches
8.1.2 Cell Number and Sequencing Depth
8.1.3 Batch Effects Minimization and Sample Replication
8.2 Single-Cell Preparation, Library Construction, and Sequencing
8.2.1 Single-Cell Preparation
8.2.2 Single Nuclei Preparation
8.2.3 Library Construction and Sequencing
8.3 Preprocessing of scRNA-Seq Data
8.3.1 Initial Data Preprocessing and Quality Control
8.3.2 Alignment and Transcript Counting
8.3.3 Data Cleanup Post Alignment
8.3.4 Normalization
8.3.5 Batch Effects Correction
8.3.6 Signal Imputation
8.4 Feature Selection, Dimension Reduction, and Visualization
8.4.1 Feature Selection
8.4.2 Dimension Reduction
8.4.3 Visualization
8.5 Cell Clustering, Cell Identity Annotation, and Compositional Analysis
8.5.1 Cell Clustering
8.5.2 Cell Identity Annotation
8.5.3 Compositional Analysis
8.6 Differential Expression Analysis
8.7 Trajectory Inference
8.8 Advanced Analyses
8.8.1 SNV/CNV Detection and Allele-Specific Expression Analysis
8.8.2 Alternative Splicing Analysis
8.8.3 Gene Regulatory Network Inference
References
9 Small RNA Sequencing
9.1 Small RNA NGS Data Generation and Upstream Processing
9.1.1 Data Generation
9.1.2 Preprocessing
9.1.3 Mapping
9.1.4 Identification of Known and Putative Small RNA Species
9.1.5 Normalization
9.2 Identification of Differentially Expressed Small RNAs
9.3 Functional Analysis of Identified Known Small RNAs
References
10 Genotyping and Variation Discovery by Whole Genome/Exome Sequencing
10.1 Data Preprocessing, Mapping, Realignment, and Recalibration
10.2 Single Nucleotide Variant (SNV) and Short Indel Calling
10.2.1 Germline SNV and Indel Calling
10.2.2 Somatic Mutation Detection
10.2.3 Variant Calling From RNA Sequencing Data
10.2.4 Variant Call Format (VCF)
10.2.5 Evaluating VCF Results
10.3 Structural Variant (SV) Calling
10.3.1 Short-Read-Based SV Calling
10.3.2 Long-Read-Based SV Calling
10.3.3 CNV Detection
10.3.4 Integrated SV Analysis
10.4 Annotation of Called Variants
References
11 Clinical Sequencing and Detection of Actionable Variants
11.1 Clinical Sequencing Data Generation
11.1.1 Patient Sample Collection
11.1.2 Library Preparation and Sequencing Approaches
11.2 Read Mapping and Variant Calling
11.3 Variant Filtering
11.3.1 Frequency of Occurrence
11.3.2 Functional Consequence
11.3.3 Existing Evidence of Relationship to Human Disease
11.3.4 Clinical Phenotype Match
11.3.5 Mode of Inheritance
11.4 Variant Ranking and Prioritization
11.5 Classification of Variants Based on Pathogenicity
11.5.1 Classification of Germline Variants
11.5.2 Classification of Somatic Variants
11.6 Clinical Review and Reporting
11.6.1 Use of Artificial Intelligence in Variant Reporting
11.6.2 Expert Review
11.6.3 Generation of Testing Report
11.6.4 Variant Validation
11.6.5 Incorporation Into a Patient’s Electronic Health Record
11.6.6 Reporting of Secondary Findings
11.6.7 Patient Counseling and Periodic Report Updates
11.7 Bioinformatics Pipeline Validation
References
12 De Novo Genome Assembly With Long and/or Short Reads
12.1 Genomic Factors and Sequencing Strategies for De Novo Assembly
12.1.1 Genomic Factors That Affect De Novo Assembly
12.1.2 Sequencing Strategies for De Novo Assembly
12.2 Assembly of Contigs
12.2.1 Sequence Data Preprocessing, Error Correction, and Assessment of Genome Characteristics
12.2.2 Contig Assembly Algorithms
12.2.3 Polishing
12.3 Scaffolding and Gap Closure
12.4 Assembly Quality Evaluation
12.5 Limitations and Future Development
References
13 Mapping Protein-DNA Interactions With ChIP-Seq
13.1 Principle of ChIP-Seq
13.2 Experimental Design
13.2.1 Experimental Control
13.2.2 Library Preparation
13.2.3 Sequencing Length and Depth
13.2.4 Replication
13.3 Read Mapping, Normalization, and Peak Calling
13.3.1 Data Quality Control and Read Mapping
13.3.2 Peak Calling
13.3.3 Post-Peak Calling Quality Control
13.3.4 Peak Visualization
13.4 Differential Binding Analysis
13.5 Functional Analysis
13.6 Motif Analysis
13.7 Integrated ChIP-Seq Data Analysis
References
14 Epigenomics by DNA Methylation Sequencing
14.1 DNA Methylation Sequencing Strategies
14.1.1 Bisulfite Conversion Methyl-Seq
14.1.1.1 Whole-Genome Bisulfite Sequencing (WGBS)
14.1.1.2 Reduced Representation Bisulfite Sequencing (RRBS)
14.1.2 Enzymatic Conversion Methyl-Seq
14.1.3 Enrichment-Based Methyl-Seq
14.1.4 Differentiation of Cytosine Methylation From Demethylation Products
14.2 DNA Methylation Sequencing Data Analysis
14.2.1 Quality Control and Preprocessing
14.2.2 Read Mapping
14.2.3 Quantification of DNA Methylation/Demethylation Products
14.2.4 Visualization
14.3 Detection of Differentially Methylated Cytosines and Regions
14.4 Data Verification, Validation, and Interpretation
References
15 Whole Metagenome Sequencing for Microbial Community Analysis
15.1 Experimental Design and Sample Preparation
15.1.1 Metagenome Sample Collection
15.1.2 Metagenome Sample Processing
15.2 Sequencing Approaches
15.3 Overview of Shotgun Metagenome Sequencing Data Analysis
15.4 Sequencing Data Quality Control and Preprocessing
15.5 Taxonomic Characterization of a Microbial Community
15.5.1 Metagenome Assembly
15.5.2 Sequence Binning
15.5.3 Calling of Genes and Other Genomic Elements From Metagenomic Sequences
15.5.4 Taxonomic Profiling
15.6 Functional Characterization of a Microbial Community
15.6.1 Gene Function Annotation
15.6.2 Gene Function Profiling and Metabolic Pathway Reconstruction
15.7 Comparative Metagenomic Analysis
15.7.1 Metagenome Sequencing Data Normalization
15.7.2 Identification of Differentially Abundant Species or OTUs
15.8 Integrated Metagenomics Data Analysis Pipelines
15.9 Metagenomics Data Repositories
References
Part IV The Changing Landscape of NGS Technologies and Data Analysis
16 What’s Next for Next-Generation Sequencing (NGS)?
16.1 The Changing Landscape of Next-Generation Sequencing (NGS)
16.2 Newer Sequencing Technologies
16.3 Continued Evolution and Growth of Bioinformatics Tools for NGS Data Analysis
16.4 Efficient Management of NGS Analytic Workflows
16.5 Deepening Applications of NGS to Single-Cell and Spatial Sequencing
16.6 Increasing Use of Machine Learning in NGS Data Analytics
References
Appendix I Common File Types Used in NGS Data Analysis
Appendix II Glossary
Index