This self-contained textbook covers fundamental aspects of sequence analysis in evolutionary biology, including sequence alignment, phylogeny reconstruction, and coalescent simulation. It addresses these aspects through a series of over 400 computer problems, ranging from elementary to research level, to enable learning by doing. Students solve the problems in a computational environment that has been used in science for decades: the UNIX command line. It is available on all three major PC operating systems: Microsoft Windows, macOS, and Linux. To learn to use this powerful system, students analyze sample sequence data by applying generic tools, bioinformatics software, and over 40 programs written specifically for this course. Solutions to all problems are included, making the book ideal for self-study. Problems are grouped into sections, each headed by an introduction and a list of new concepts and programs. By using practical computing to explore evolutionary concepts and sequence data, the book enables readers to tackle their own computational problems.
Author(s): Bernhard Haubold, Angelika Börsch-Haubold
Edition: 1
Publisher: Springer
Year: 2018
Language: English
Pages: 331
Preface
Contents
1 The UNIX Command Line
1.1 Getting Started
1.2 Files
1.3 Scripts
1.3.1 Bash
1.3.2 Sed
1.3.3 AWK
2 Constructing and Applying Optimal Alignments
2.1 Sequence Evolution and Alignment
2.2 Amino Acid Substitution Matrices
2.2.1 Genetic Code
2.2.2 PAM Matrices
2.3 The Number of Possible Alignments
2.4 Dot Plots
2.5 Optimal Alignment
2.5.1 From Dot Plot to Alignment
2.5.2 Global Alignment
2.5.3 Local Alignment
2.6 Applications of Optimal Alignment
2.6.1 Homology Detection
2.6.2 Dating the Duplication of Adh
3 Exact Matching
3.1 Keyword Trees
3.2 Suffix Trees
3.3 Suffix Arrays
3.4 Text Compression
3.4.1 Move to Front (MTF)
3.4.2 Measuring Compressibility: The Lempel–Ziv Decomposition
4 Fast Alignment
4.1 Alignment with k Errors
4.2 Fast Local Alignment
4.2.1 Simple BLAST
4.2.2 Modern BLAST
4.3 Shotgun Sequencing
4.4 Fast Global Alignment
4.5 Read Mapping
4.6 Clustering Protein Sequences
4.7 Position-Specific Iterated BLAST
4.8 Multiple Sequence Alignment
4.8.1 Query-Anchored Alignment
4.8.2 Progressive Alignment
5 Evolution Between Species: Phylogeny
5.1 Trees of Life
5.2 Rooted Phylogeny
5.3 Unrooted Phylogeny
6 Evolution Within Populations
6.1 Descent from One or Two Parents
6.1.1 Bi-Parental Genealogy
6.1.2 Uni-Parental Genealogy
6.2 The Coalescent
7 Additional Topics
7.1 Statistics
7.1.1 The Significance of Single Experiments
7.1.2 The Significance of Multiple Experiments
7.1.3 Mouse Transcriptome Data
7.2 Relational Databases
7.2.1 Mouse Expression Data
7.2.2 SQL Queries
7.2.3 Java
7.2.4 ENSEMBL
8 Answers and Appendix: Unix Guide
8.1 Answers
8.2 Appendix: UNIX Guide
8.2.1 File Editing
8.2.2 Working with Files
8.2.3 Entering Commands Interactively
8.2.4 Combining Commands: Pipes
8.2.5 Redirecting Output
8.2.6 Shell Scripts
8.2.7 Directories
8.2.8 Filters
8.2.9 Regular Expressions
9 Erratum to: Bioinformatics for Evolutionary Biologists
Erratum to: B. Haubold and A. Börsch-Haubold, Bioinformatics for Evolutionary Biologists, https://doi.org/10.1007/978-3-319-67395-0
References
Index