Metagenomic Data Analysis

This volume describes sequencing methods, pipelines, and tools for metagenomic data analysis. Chapters guide readers through quality control of raw sequence data; metagenomics databases for bacterial annotation such as Greengenes, SILVA, RDP, and GTDB; 16S rRNA microbiome analysis and pipelines such as mothur, DADA2, and QIIME 2; whole-genome shotgun metagenomic data analysis using MEGAN and DIAMOND; and web services such as PATRIC, RDP, mothur, Kaiju, PhyloPythiaS, MG-RAST, WebMGA, MicrobiomeAnalyst, WHAM!, METAGENassist, and MGnify (EBI Metagenomics). Later chapters cover third-generation sequencing (TGS) approaches such as MinION sequencing, configuration of an Ubuntu Linux virtual machine, clinical and environmental resistomes, FISH techniques and the design of FISH probes, protocols for viral metagenomics, and a comprehensive guideline for microbiome analysis using the most widely used R packages. Written in the format of the highly successful Methods in Molecular Biology series, each chapter includes an introduction to the topic, lists necessary materials and methods, includes tips on troubleshooting and known pitfalls, and provides step-by-step, readily reproducible protocols.

Authoritative and cutting-edge, Metagenomic Data Analysis: Methods and Protocols aims to be a comprehensive guide for researchers specializing in the metagenomics field.

Author(s): Suparna Mitra
Series: Methods in Molecular Biology, 2649
Publisher: Humana Press
Year: 2023

Language: English
Pages: 442
City: New York

Preface
Contents
Contributors
Chapter 1: From Genomics to Metagenomics in the Era of Recent Sequencing Technologies
1 From Microbial Genomics to Metagenomics
2 Metagenomic Applications
2.1 Metagenomic Sequencing: Development, Applications, and Techniques
2.2 Bacterial Analysis Using 16S Sequencing
2.3 Shotgun Sequencing
2.4 Investigation of Eukaryotic Microbes
2.5 Eukaryotic Sequencing: Fungal Organisms
2.5.1 Utilization of ITS for Fungal Sequencing
2.5.2 Alternative Methods of Fungal Sequencing
2.6 Microbial Biodiversity
2.7 Rarefaction Curves: Importance and Application
3 Next-Generation Sequencing Technology
3.1 Overview of Short-Read Sequencing Techniques
3.2 Massively Parallel Sequencing
3.3 454 Sequencing (Roche)
3.4 Polony Sequencing
3.5 Illumina Technology (Solexa)
3.5.1 NextSeq 500/550
3.5.2 NovaSeq 6000
3.6 SOLiD (Life Technologies)
3.7 Ion Torrent (Life Technologies)
3.8 c-PAS Sequencing (Complete Genomics)
3.9 DNA Nanoball Sequencing (Complete Genomics)
3.10 Helicos SMS (Helicos Biosciences)
4 Third-Generation Technology: Progression from Short-Read Sequencing
4.1 Single-Molecule Real-Time Sequencing (Pacific Biosciences)
4.2 Nanopore Sequencing (Oxford Nanopore Technologies)
References
Chapter 2: Quality Control in Metagenomics Data
1 Introduction
2 Considerations in Study Design and Methodology
3 Solutions to Support Reproducibility
4 Code Walkthrough Introduction
5 Downloading SRA Project Data
6 Ensuring Data Integrity
7 Quality Control Statistics
7.1 Basic Statistics
7.2 Per Base Sequence Quality
7.3 Per Tile Sequence Quality
7.4 Per Sequence Quality Scores
7.5 Per Base Sequence Content
7.6 Per Sequence GC Content
7.7 Per Base N Content
7.8 Sequence Length Distribution
7.9 Sequence Duplication Levels
7.10 Overrepresented Sequences
7.11 Adapter Content
7.12 K-Mer Content
7.13 Scaling Quality Control to Multiple Samples
8 Trimming and Filtering Reads
9 Removing Host-Derived Content
10 Taxonomic Classification
11 Data Handling, Visualization, and Comparative Analysis
12 Best Practices in Dimensionality Reduction
13 Contamination and Ensuring Reliable Classifications
13.1 Likelihood of Contamination
13.2 Likelihood of False Positive Classification
13.3 Significance of Taxa Detection
13.4 Strength of Computational Evidence
14 Conclusion
References
Chapter 3: Metagenomics Databases for Bacteria
1 Bacterial Metagenomics
2 Introduction to Bacterial Metagenomics Database
3 Greengenes
4 SILVA
5 Ribosomal Database Project
6 Genome Taxonomy Database
7 Conclusions
References
Chapter 4: Amplicon Sequencing Pipelines in Metagenomics
1 Introduction to Amplicon Sequencing
2 The Pipeline for Amplicon Sequencing
3 Data Generation
4 Installation of Packages
5 Data Preprocessing
6 mothur 16S rRNA Amplicon Sequencing Data Analysis Pipeline
6.1 Download the Reference Data
6.2 Start mothur Environment
6.3 Assembly of Paired Reads and Quality Control
6.4 Sequence Alignment and Quality Control
6.5 Advanced Quality Control
6.6 Taxonomic Analysis
6.7 Diversity Analysis
7 DADA2 16S rRNA Amplicon Sequencing Data Analysis Pipeline
7.1 Download the Reference Data
7.2 DADA2 Standard Sequence Denoising Procedure
7.3 Assembly of Paired-End Reads and Generation of ASV Table
7.4 Taxonomic Analysis
7.5 Diversity Analysis
8 Notes
References
Chapter 5: A Practical Guide to 16S rRNA Microbiome Analysis in Musculoskeletal Disorders
1 Introduction
2 Materials
2.1 FASTQ Files
2.1.1 Interpreting FASTQ Files
2.2 Metadata File
2.3 Manifest File
2.4 Software and Computing Needs
2.4.1 MEGAN
2.4.2 LEfSe
3 Methods
3.1 Demultiplexing Samples
3.2 QIIME 2
3.2.1 Directory Setup
3.2.2 Importing Demultiplexed FASTQ
3.2.3 Summarize the Demultiplexed Data and Review Quality
3.2.4 Denoise Using DADA2
3.2.5 Train RDP Classifier
3.2.6 Assign Taxonomy
3.3 MEGAN
3.4 LEfSe
4 Notes
References
Chapter 6: DIAMOND + MEGAN Microbiome Analysis
1 Introduction
2 Materials
2.1 Datasets
2.1.1 Short-read Samples
2.1.2 Long-read Samples
2.2 Software and Databases
2.3 Computational Resources
3 Methods
3.1 DIAMOND
3.1.1 DIAMOND Index
3.1.2 Short-read Alignment
3.1.3 Long-read Alignment
3.2 Meganization
3.3 Alignment and Meganization for Very Large Files
3.4 Interactive Analysis Using MEGAN
3.4.1 Tree Layout
3.4.2 Algorithm Parameters
3.4.3 Charts
3.5 Functional Analysis
3.5.1 Read Inspection
3.5.2 Alignment Viewer and Gene-Centric Assembly
3.6 Comparative Analysis
3.7 Analyzing Long Reads
3.8 DIAMOND + MEGAN Analysis Using the AnnoTree Database
3.9 Megan-server
References
Chapter 7: Interactive Web-Based Services for Metagenomic Data Analysis and Comparisons
1 Introduction
2 Question One: Who Is There?
2.1 BV-BRC
2.2 RDP
2.2.1 RDP Classifier
2.2.2 RDPipeline
2.3 mothur in Galaxy Platform
2.4 Kaiju
2.5 PhyloPythiaS
3 Question Two: What Do They Do?
3.1 MG-RAST
3.2 WebMGA
4 Question Three: Are There Any Functional Correlations Between Microorganisms in a Particular Biome?
4.1 MicrobiomeAnalyst
4.2 WHAM!
5 Question Four: How Similar or Different Are Biomes from Each Other?
5.1 METAGENassist
6 Comprehensive Metagenomic Analysis
6.1 MGnify: EBI-Metagenomics
6.2 MGnify Data Workflow
6.3 Amplicon Analysis Pipeline
6.4 Metagenomic and Transcriptomic Raw Reads Pipeline
6.5 Assembly Pipeline
7 Challenges Using Web-Based Tools
8 Conclusion
References
Chapter 8: Application of High-Throughput Sequencing (HTS) to Enhance the Well-Being of an Endangered Species (Malayan Tapir):...
1 Introduction
2 Materials and Methods
2.1 Data Preprocessing
2.2 Uploading Sequences into MG-RAST
2.3 Submission of Sequences for Annotation
2.4 Post Sequence Submission
2.5 Analyzing the Annotated Dataset
2.6 Statistical Analysis
2.7 Data Visualization: Rarefaction
2.8 Stacked Bar Charts
3 Discussion
4 Notes
References
Chapter 9: Designing Knowledge-Based Bioremediation Strategies Using Metagenomics
1 Introduction
1.1 Next-Gen Sequencing Platforms
1.2 Comparative Metagenomics
2 Methods
2.1 MG-RAST Analysis
2.2 Taxonomic and Functional Gene Analysis
2.2.1 Data and Database Selection
2.2.2 Default Parameters
2.2.3 Selection of Features
2.2.4 Visualization Tools
2.2.5 MG-RAST Plugins
2.3 Annotation Systems, Data Normalization, and Validation
3 Methods
3.1 Case Study 1: Enhancing Biodegradation Process Efficiency by In Silico Analysis
3.2 Case Study 2: Comparative Metagenomics for Understanding the Impact of Seasonal Shifts on WWT Process Efficiency
3.3 Case Study 3: Soil Metagenomics
4 Notes
References
Chapter 10: Nanopore Sequencing Techniques: A Comparison of the MinKNOW and the Alignator Sequencers
1 Introduction
1.1 Sequencing History
1.2 MinION
2 Materials
2.1 Hardware and Software
2.2 RNA Isolation and Poly-A Enrichment
2.3 Library Preparation
3 Methods
3.1 RNA Extraction
3.2 Poly-A Enrichment Using NEBNext Poly(A) mRNA Magnetic Isolation Module (E7490)
3.3 Library Preparation and Direct RNA Sequencing Using MinION
3.4 Principles of Alignment Using MinKNOW
3.5 Principles of Alignment Using Alignator
4 Alignment of Nanopore Reads to Human cDNA Database Using the Alignator v1
4.1 Normalization of Read Counts
4.2 Gene Set Enrichment Analysis
5 Notes
References
Chapter 11: MAIRA: Protein-based Analysis of MinION Reads on a Laptop
1 Introduction
2 Materials
2.1 Hardware
2.2 Software
2.3 Datasets
3 Methods
3.1 Real-time Analysis
3.1.1 Main Graphical User Interface
3.1.2 Analysis Setup
3.1.3 Genus Identification
3.1.4 Species Identification
3.1.5 Virulence Factors and Antibiotic Resistance Genes
3.1.6 Exporting Data
3.1.7 Loading Files
3.1.8 Controls and Filters
3.2 Command-line Mode
3.2.1 Running Analysis
3.2.2 Exporting Data
3.2.3 Building New Databases
References
Chapter 12: Recovery and Analysis of Long-Read Metagenome-Assembled Genomes
1 Introduction
2 Materials
2.1 Data Collection
2.2 Software and Environment
3 Methods
3.1 Basecalling and Adapter Trimming in Long Reads
3.2 Quality Assessment of Raw Short Reads
3.3 Quality Trimming and Adapter Removal in Short Reads
3.4 Metagenome Assembly
3.4.1 Short-Read Assembly
3.4.2 Long-Read Assembly
3.5 Estimating the Coverage of Assembled Contigs Using Short Reads and Long Reads
3.6 Metagenome Binning of Contigs Assembled Using Short Reads
3.7 Quality Assessment of Recovered Genome Bins
3.8 Taxonomic Classification of Recovered Genome Bins
3.9 Error Correction of Long-Read Sequence
3.9.1 Frameshift Correction (MEGAN-LR)
3.9.2 Racon
3.9.3 Medaka
3.10 Comparative Analysis of Short- and Long-Read Assemblies
3.11 Gene Quality Assessment in Recovered Genomes
4 Notes
References
Chapter 13: Cloud Computing for Metagenomics: Building a Personalized Computational Platform for Pipeline Analyses
1 Introduction
2 Materials
3 Methods
3.1 Log into the Azure Portal
3.2 Select and Set Up a Virtual Machine (VM)
3.3 Add a Data Disk to the VM (Optional)
3.4 Logging into the Virtual Machine (VM) for Further Configuration
3.4.1 Connecting for the First Time
3.4.2 Apply Security Updates
3.4.3 Download and Install the Miniconda Python Distribution
3.4.4 Install QIIME2
3.4.5 Install and Set Up Jupyter Lab
3.4.6 Configuring the VM to Access the Data Disk (Optional)
3.5 Connect to Jupyter Lab Running on the VM Through Your Web Browser
3.6 Disconnect and Shut Down the VM
4 Further Exploration
5 Notes
References
Chapter 14: Artificial Intelligence in Medicine: Microbiome-Based Machine Learning for Phenotypic Classification
1 Introduction
2 Materials
3 Methods
3.1 Preparation of Microbiome Datasets
3.2 Machine Learning Classification
3.2.1 Dataset Preparation (See Note 2)
3.2.2 Machine Learning Modeling
4 Summary
5 Notes
References
Chapter 15: Tracking Antibiotic Resistance from the Environment to Human Health
1 Introduction
2 Environmental Resistome
3 Clinical Resistome
4 The Overlap Between Clinical and Environmental Resistome
5 Whole-Genome Sequencing in Detection and Control of Antimicrobial Resistance
6 Resistome Analysis Tools
7 Resistome Databases
8 Tools
9 Conclusions
References
Chapter 16: Targeted Enrichment of Low-Abundance and Uncharacterized Taxon Members in Complex Microbial Community with Primer-...
1 Introduction
2 Materials
3 Methods
3.1 Probe Design from Next Generation Sequencing Datasets
3.2 Evaluation of in Silico Specificity and Coverage of Probes
3.3 Sample Fixation
3.4 Fluorescent In Situ Hybridization and Microscopic Imaging
3.5 Optimizing Parameters for Fluorescent In Situ Hybridization (FISH)
3.6 Image Analysis: Quantitative FISH
3.7 Fixation-Free and In-Solution FISH for Fluorescence-Activated Cell Sorting (FACS)
3.8 Quality Check of Sorted Samples
3.9 Downstream Bioinformatic Analysis
4 Notes
References
Chapter 17: Assembly and Annotation of Viral Metagenomes from Short-Read Sequencing Data
1 Introduction
2 Materials
2.1 Hardware
2.2 Software
2.3 Sequences
2.4 Viral Databases
3 Methods
3.1 Read Quality Control, Adapter Trimming, and Decontamination
3.2 Contig Assembly
3.3 Viral Sequence Identification
3.4 Mapping to Reference Databases
4 Notes
References
Chapter 18: Manipulating and Basic Analysis of Tabular Metagenomics Datasets Using R
1 Introduction
2 R Language
3 Reading and Manipulating Tabular Data
3.1 Base R
3.2 Readr and the Tidyverse
4 Basic Analysis of Tabular Data
5 Summary
References
Chapter 19: Metagenomics Data Visualization Using R
1 Introduction
2 Visualization Options Within R
2.1 Base R
2.2 ggplot2
3 Common Comparative Visualization for Metagenomic Data
3.1 Alpha (α) Diversity and Beta (β) Diversity
4 Conclusions
References
Chapter 20: Comprehensive Guideline for Microbiome Analysis Using R
1 Introduction
2 Phyloseq
2.1 Application
3 MegaR
3.1 Application
4 DADA2
4.1 Pipeline Workflow and Functions
5 Metacoder
5.1 Application
6 MicrobiomeExplorer
6.1 Application
References
Index