Multiple Sequence Alignment: Methods and Protocols

Author(s): Kazutaka Katoh
Series: Methods in Molecular Biology, 2231
Publisher: Humana
Year: 2020

Language: English
Pages: 321
City: New York

Preface
Introduction
Basic Concepts and Terms
Progressive Method
Iterative Refinement
Consistency
Specific or Newly Emerging Problems
Contents
Contributors
Part I: Basic Tools for Computing MSAs
Chapter 1: The Clustal Omega Multiple Alignment Package
1 Introduction
2 Materials
3 Methods
3.1 Basic Multiple Sequence Alignment
3.2 External Profile Alignment (EPA)
3.3 Iteration
3.4 Profile Alignment
4 Notes
References
Chapter 2: Phylogeny-Aware Alignment with PRANK and PAGAN
1 Introduction
2 Evolutionary Homology in Sequence Alignment
3 Phylogeny-Aware Alignment
4 Limitations of the Phylogeny-Aware Algorithm in PRANK
5 Phylogeny-Aware Alignment of Sequence Graphs with PAGAN
6 Phylogeny-Aware Alignment in Phylogenetic Analyses
7 Practical Advice for the Use of PRANK
8 Practical Advice for the Use of PAGAN
9 Future Directions
References
Chapter 3: Fast and Accurate Multiple Sequence Alignment with MSAProbs-MPI
1 Introduction
2 MSAProbs Method
3 Parallel Implementation in MSAProbs-MPI
3.1 Target Hardware
3.2 OpenMP
3.3 Message Passing Interface (MPI)
3.4 Parallel Approach
4 Execution of MSAProbs-MPI
4.1 Options for the Bioinformatics Method and the Parallel Implementation
4.2 Installation Instructions
4.3 Execution Example
References
Part II: Tools for Specific MSA Problems
Chapter 4: Aligning Protein-Coding Nucleotide Sequences with MACSE
1 Introduction
2 MACSE Basic Usage and Possible Troubleshooting
2.1 Getting Started
2.2 Obtaining Suitable Input Sequences
2.3 Most Common Usages
3 MACSE-Based Pipelines Suitable for Datasets of Various Sizes
3.1 Pipelines Based on MACSE as Singularity Containers
3.2 Basic Pipelines and Batch Facilities
3.3 Aligning Dozens of Sequences
3.4 Aligning Hundreds of Sequences
3.5 Aligning Thousands of Sequences
3.6 Metabarcoding Applications
4 Conclusion
References
Chapter 5: Cooperation of Spaln and Prrn5 for Construction of Gene-Structure-Aware Multiple Sequence Alignment
1 Introduction
2 Methods
2.1 Outline of Spaln Algorithm
2.1.1 Genome Mapping
2.1.2 HSP Construction and Chaining
2.1.3 Merging HSPs to Generate Full Gene Structure
2.1.4 Spliced Alignment
2.2 Installation and Execution of Spaln
2.2.1 Installation
2.2.2 Format of Genomic or Database Sequence
2.2.3 Four Running Modes of Spaln
2.2.4 Examples
2.3 Outline of Prrn5 Algorithm
2.3.1 Guide Tree or Guide Forest
2.3.2 Objective Function and Group-to-Group Alignment
2.4 Installation and Execution of Prrn5
3 Notes
References
Chapter 6: Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation
1 Introduction
2 Materials
2.1 Equipment Setup
2.2 Procedure
2.2.1 Binary
2.2.2 Compilation from Source
2.2.3 Docker
2.2.4 Conda
3 Methods
3.1 Validated Method Combinations
3.1.1 Fast and Accurate
3.1.2 Slower and More Accurate
3.1.3 Very Fast and Approximate
3.1.4 Further Method Combinations
4 Notes
References
Chapter 7: Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP
1 Introduction
2 SATé and PASTA
2.1 Iterative Divide-and-Conquer Strategy
2.2 PASTA Parameters
2.3 PASTA Output
2.4 Websites for PASTA
3 UPP
3.1 Ensembles of Profile Hidden Markov Models (HMMs)
3.2 UPP's Algorithmic Protocol
3.3 UPP's Parameters
3.4 Websites for UPP
4 Discussion and Summary
5 Notes
References
Chapter 8: Sequence Comparison Without Alignment: The SpaM Approaches
1 Introduction
2 Spaced Words
3 Filtered Spaced-Word Matches and Prot-SpaM
4 Read-SpaM: Estimating Phylogenetic Distances Based on Unassembled Sequencing Reads
5 The Most Recent Approaches: Multi-SpaM and Slope-SpaM
6 Back to Multiple Sequence Alignment
7 Software Availability
References
Chapter 9: lamassemble: Multiple Alignment and Consensus Sequence of Long Reads
1 Introduction
2 Access/Installation
3 Usage
3.1 Required Inputs for lamassemble
3.2 Command Line Usage
3.3 Going Faster by Multi-Threading
3.4 Multiple Sequence Alignment
4 How It Works
5 Missing Sequences
6 Bad Alignments
7 Notes
References
Chapter 10: Automated Removal of Non-homologous Sequence Stretches with PREQUAL
1 Introduction
2 Methods
2.1 Methodological Intuition
2.2 Statistical Approach
3 Program Usage
3.1 Download and Installation
3.2 Basic Usage
3.3 Using PREQUAL with DNA Sequences
3.4 Advanced Usage
3.4.1 Options Affecting the Definition of Core Regions and Filtering
3.4.2 Options Related to DNA Sequences
3.4.3 Options Affecting Output Formats
3.4.4 Options Affecting Posterior Probabilities and Filtering
4 Benchmarking
5 Notes
References
Chapter 11: Analysis of Protein Intermolecular Interactions with MAFFT-DASH
1 Introduction
2 NYN Domain-Containing RNases
3 Construction of N4BP1 NYN Domain Homology Model
4 Preparation of a Ranked List of NYN-Like Domains Using DASH
5 Extracting Putative Nucleotide Interactions from DASH Hits
6 Visualizing NYN-Nucleotide Interactions Along with Sequence Conservation or RNA-Binding Propensity
7 Extracting Putative Protein Interactions from DASH Hits
8 Conclusions
References
Chapter 12: Mustguseal and Sister Web-Methods: A Practical Guide to Bioinformatic Analysis of Protein Superfamilies
1 Introduction
2 Materials
2.1 Web-Browser
2.2 Plain Text Editor
2.3 3D-Structure Viewer
2.4 Sequence Alignment Editor
2.5 Perl Interpreter
3 Methods
3.1 Define the Diversity and Scope of Your Alignment
3.2 Enrich the Core 3D-Alignment with Data by Adding Sequences
3.3 Further Analysis by the Sister Web-Methods
4 Notes
References
Part III: Visualization
Chapter 13: Alignment of Biological Sequences with Jalview
1 Introduction
2 Materials
3 Methods
3.1 Import or Retrieve Sequence Data
3.2 Importing Coding Sequences (CDS) or Protein Products for CDS
3.3 Saving and Loading Project Files
3.4 Align Sequences
3.5 Evaluating Alignment Quality
3.6 Employing Multiple Views to Explore Different Aspects of an Alignment
3.7 Identification or Exclusion of Regions with Low Occupancy or Poor Reliability
3.8 Shading the Alignment to Reveal Conserved and Divergent Regions
3.9 Shading the Alignment According to Conservation Scores from the AACon Web Service
3.10 Group-Based Conservation Analysis with Phylogenetic Trees
3.11 Visualizing Group Conservation and Consensus
3.12 Alignment Figure Generation for Presentations and Papers
3.12.1 Preparing for Figure Export
3.12.2 "Wrap mode": Formatting Alignments to Fit Within the Margins of a Page
3.12.3 EPS Export as "Characters" or Line Art
3.13 Interactive Figure Export in HTML Web Pages
3.13.1 Export as an Interactive HTML Figure
3.13.2 Export Alignment for Visualization with BioJS-msaviewer
3.14 Automated Alignment Figure Generation in Batch Mode
3.14.1 Prepare a Custom Jalview Properties File
3.14.2 Running Jalview as a Command-Line Program
4 Notes
References
Chapter 14: Evolutionary Sequence Analysis and Visualization with Wasabi
1 Introduction
2 Overview of the User Interface
3 Example Workflow
3.1 Introduction
3.2 Setup
3.3 Instructions
3.3.1 Optional: Store a Data Snapshot
3.3.2 Optional: Realign with MAFFT
4 Advanced Topics
4.1 Under the Hood
4.2 Plugins
5 Future Directions
6 Notes
References
Chapter 15: Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and ...
1 Seaview and Its Context
2 The Alignment Window
2.1 Visualization of Multiple Sequence Alignments
2.2 Selecting Sequences and Sites
2.3 The "Align" Menu
2.4 Protein-Coding DNA Sequences
2.5 Genetic Code Variants
2.6 Closely Related Sequences
2.7 The "Trees" Menu of Alignment Windows
2.8 Parsimony-Based Tree Building
2.9 Distance-Based Tree Building
2.10 Tree Building by Maximum Likelihood
3 The Tree Window
4 The Help Window
5 Command-Line Mode
References
Chapter 16: NCBI Genome Workbench: Desktop Software for Comparative Genomics, Visualization, and GenBank Data Submission
1 NCBI Genome Workbench: Capabilities Overview
1.1 Introduction
1.2 Download and Install Genome Workbench
1.3 Data Import Capabilities
1.4 Data Privacy
1.5 Data Visualization
1.6 Data Presentation and High-Quality Printing
1.7 Tools
1.8 Genomic Data Editing Package
2 Getting Started with Genome Workbench
2.1 Main Window Look and Feel: Loading Data into the Project
2.2 Loading External (Non-NCBI) Annotations
2.3 Graphical Sequence View
2.4 Running Tools
3 Phylogenetic Analysis Using BLAST Search, Multiple Alignments, and Phylogenetic Tree View
4 Genome Workbench Submission Preparation
4.1 Getting Started: Enable Editing Package
4.2 Submission Preparation Workflow
4.3 Submission Process Explained
References
Part IV: Open Problems
Chapter 17: Revisiting Evaluation of Multiple Sequence Alignment Methods
1 Introduction
2 Overview
2.1 Terminology
2.2 Standard Alignment Criteria
2.3 Evolutionary Criteria
2.4 MSA Methods
3 Results from the Literature
3.1 Method Performance Depends on Specific Commands and Version Numbers
3.2 Impact of Dataset Properties
3.3 Alignment Criteria Rankings Differ
3.4 Alignment vs. Tree Accuracy
3.5 Simulated vs. Biological Datasets
3.6 Challenges in Using Simulations
3.7 Challenges in Protein Benchmarks
4 Conclusion
References
Index