This book contains the latest material on the subject, covering next-generation sequencing (NGS) applications and meeting the requirements of a complete semester course. It digs deep into analysis, providing both concepts and practice to meet the needs of researchers seeking to understand and use NGS data preprocessing, genome assembly, variant discovery, gene profiling, epigenetics, and metagenomics. Rather than presenting the analysis pipelines as black boxes, the book walks through detailed analysis steps, giving readers the scientific and technical background they need to conduct analyses with confidence and understanding. It is primarily designed as a companion for researchers and graduate students performing sequencing data analysis, but it will also serve as a textbook for teachers and students in biology and bioscience.
Author(s): Hamid D. Ismail
Series: Chapman & Hall/CRC Computational Biology Series
Publisher: CRC Press/Chapman & Hall
Year: 2023
Language: English
Pages: 348
City: Boca Raton
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Preface
Author
Chapter 1 ◾ Sequencing and Raw Sequence Data Quality Control
1.1 Nucleic Acids
1.2 Sequencing
1.2.1 First-Generation Sequencing
1.2.2 Next-Generation Sequencing
1.2.2.1 Roche 454 Technology
1.2.2.2 Ion Torrent Technology
1.2.2.3 AB SOLiD Technology
1.2.2.4 Illumina Technology
1.2.3 Third-Generation Sequencing
1.2.3.1 PacBio Technology
1.2.3.2 Oxford Nanopore Technology
1.3 Sequencing Depth and Read Quality
1.3.1 Sequencing Depth
1.3.2 Base Call Quality
1.4 Fastq Files
1.5 Fastq Read Quality Assessment
1.5.1 Basic Statistics
1.5.2 Per Base Sequence Quality
1.5.3 Per Tile Sequence Quality
1.5.4 Per Sequence Quality Scores
1.5.5 Per Base Sequence Content
1.5.6 Per Sequence GC Content
1.5.7 Per Base N Content
1.5.8 Sequence Length Distribution
1.5.9 Sequence Duplication Levels
1.5.10 Overrepresented Sequences
1.5.11 Adapter Content
1.5.12 K-Mer Content
1.6 Preprocessing of the Fastq Reads
1.7 Summary
References
Chapter 2 ◾ Mapping of Sequence Reads to the Reference Genomes
2.1 Introduction to Sequence Mapping
2.2 Read Mapping
2.2.1 Trie
2.2.2 Suffix Tree
2.2.3 Suffix Arrays
2.2.4 Burrows–Wheeler Transform
2.2.5 FM-Index
2.3 Read Sequence Alignment and Aligners
2.3.1 SAM and BAM File Formats
2.3.2 Read Aligners
2.3.2.1 Burrows–Wheeler Aligner
2.3.2.2 Bowtie2
2.3.2.3 STAR
2.4 Manipulating Alignments in SAM/BAM Files
2.4.1 Samtools
2.4.1.1 SAM/BAM Format Conversion
2.4.1.2 Sorting Alignment
2.4.1.3 Indexing BAM File
2.4.1.4 Extracting Alignments of a Chromosome
2.4.1.5 Filtering and Counting Alignment in SAM/BAM Files
2.4.1.6 Removing Duplicate Reads
2.4.1.7 Descriptive Statistics
2.5 Reference-Guided Genome Assembly
2.6 Summary
References
Chapter 3 ◾ De Novo Genome Assembly
3.1 Introduction to De Novo Genome Assembly
3.1.1 Greedy Algorithm
3.1.2 Overlap-Consensus Graphs
3.1.3 De Bruijn Graphs
3.2 Examples of De Novo Assemblers
3.2.1 ABySS
3.2.2 SPAdes
3.3 Genome Assembly Quality Assessment
3.3.1 Statistical Assessment for Genome Assembly
3.3.2 Evolutionary Assessment for De Novo Genome Assembly
3.4 Summary
References
Chapter 4 ◾ Variant Discovery
4.1 Introduction to Genetic Variations
4.1.1 VCF File Format
4.1.2 Variant Calling and Analysis
4.2 Variant Calling Programs
4.2.1 Consensus-Based Variant Callers
4.2.1.1 BCFtools Variant Calling Pipeline
4.2.2 Haplotype-Based Variant Callers
4.2.2.1 FreeBayes Variant Calling Pipeline
4.2.2.2 GATK Variant Calling Pipeline
4.3 Visualizing Variants
4.4 Variant Annotation and Prioritization
4.4.1 SIFT
4.4.2 SnpEff
4.4.3 ANNOVAR
4.4.3.1 Annotation Databases
4.4.3.2 ANNOVAR Input Files
4.5 Summary
References
Chapter 5 ◾ RNA-Seq Data Analysis
5.1 Introduction to RNA-Seq
5.2 RNA-Seq Applications
5.3 RNA-Seq Data Analysis Workflow
5.3.1 Acquiring RNA-Seq Data
5.3.2 Read Mapping
5.3.3 Alignment Quality Assessment
5.3.4 Quantification
5.3.5 Normalization
5.3.5.1 RPKM and FPKM
5.3.5.2 Transcripts Per Million
5.3.5.3 Counts Per Million Mapped Reads
5.3.5.4 Trimmed Mean of M-Values
5.3.5.5 Relative Expression
5.3.5.6 Upper Quartile
5.3.6 Differential Expression Analysis
5.3.7 Using EdgeR for Differential Analysis
5.3.7.1 Data Preparation
5.3.7.2 Annotation
5.3.7.3 Design Matrix
5.3.7.4 Filtering Low-Expressed Genes
5.3.7.5 Normalization
5.3.7.6 Estimating Dispersions
5.3.7.7 Exploring the Data
5.3.7.8 Model Fitting
5.3.7.9 Ontology and Pathways
5.3.8 Visualizing RNA-Seq Data
5.3.8.1 Visualizing Distribution with Boxplots
5.3.8.2 Scatter Plot
5.3.8.3 Mean-Average Plot (MA Plot)
5.3.8.4 Volcano Plots
5.4 Summary
References
Chapter 6 ◾ Chromatin Immunoprecipitation Sequencing
6.1 Introduction to Chromatin Immunoprecipitation
6.2 ChIP Sequencing
6.3 ChIP-Seq Analysis Workflow
6.3.1 Downloading the Raw Data
6.3.2 Quality Control
6.3.3 ChIP-Seq and Input Read Mapping
6.3.4 ChIP-Seq Peak Calling with MACS3
6.3.5 Visualizing ChIP-Seq Enrichment in Genome Browser
6.3.6 Visualizing Peaks Distribution
6.3.6.1 ChIP-Seq Peaks' Coverage Plot
6.3.6.2 Distribution of Peaks in Transcription Start Site (TSS) Regions
6.3.6.3 Profile of Peaks Along Gene Regions
6.3.7 Peak Annotation
6.3.7.1 Writing Annotations to Files
6.3.8 ChIP-Seq Functional Analysis
6.3.9 Motif Discovery
6.4 Summary
References
Chapter 7 ◾ Targeted Gene Metagenomic Data Analysis
7.1 Introduction to Metagenomics
7.2 Analysis Workflow
7.2.1 Raw Data Preprocessing
7.2.2 Metagenomic Features
7.2.2.1 Clustering
7.2.2.2 Denoising
7.2.3 Taxonomy Assignment
7.2.3.1 Basic Local Alignment Search Tool
7.2.3.2 VSEARCH
7.2.3.3 Ribosomal Database Project
7.2.4 Construction of Phylogenetic Trees
7.2.5 Microbial Diversity Analysis
7.2.5.1 Alpha Diversity Indices
7.2.5.2 Beta Diversity
7.3 Data Analysis with QIIME2
7.3.1 QIIME2 Input Files
7.3.1.1 Importing Sequence Data
7.3.1.2 Metadata
7.3.2 Demultiplexing
7.3.3 Downloading and Preparing the Example Data
7.3.3.1 Downloading the Raw Data
7.3.3.2 Creating the Sample Metadata File
7.3.3.3 Importing Microbiome Yoga Data
7.3.4 Raw Data Preprocessing
7.3.4.1 Quality Assessment and Quality Control
7.3.4.2 Clustering and Denoising
7.3.5 Taxonomic Assignment with QIIME2
7.3.5.1 Using Alignment-Based Classifiers
7.3.5.2 Using Machine Learning Classifiers
7.3.6 Construction of Phylogenetic Tree
7.3.6.1 De Novo Phylogenetic Tree
7.3.6.2 Fragment-Insertion Phylogenetic Tree
7.3.7 Alpha and Beta Diversity Analysis
7.4 Summary
References
Chapter 8 ◾ Shotgun Metagenomic Data Analysis
8.1 Introduction
8.2 Shotgun Metagenomic Analysis Workflow
8.2.1 Data Acquisition
8.2.2 Quality Assessment and Processing
8.2.3 Removing Host DNA Reads
8.2.3.1 Download Human Reference Genome
8.2.3.2 Mapping Reads to the Reference Genome
8.2.3.3 Converting SAM to BAM Format
8.2.3.4 Separating Metagenomic Reads in BAM Files
8.2.3.5 Creating Paired-End FASTQ Files From BAM Files
8.2.4 Assembly-Free Taxonomic Profiling
8.2.5 Assembly of Metagenomes
8.2.6 Assembly Evaluation
8.2.7 Mapping Reads to the Assemblies
8.2.8 Binning
8.2.9 Bin Evaluation
8.2.10 Prediction of Protein-Coding Region
8.3 Summary
References
Index