Sequence Analysis and Modern C++ - The Creation of the SeqAn3 Bioinformatics Library

Introduction

This is a book about software engineering, bioinformatics, the C++ programming language and the SeqAn library. In the broadest sense, it will help the reader create better, faster and more reliable software by deepening their understanding of available tools, language features, techniques and design patterns.

Every developer who previously worked with C++ will enjoy the in-depth chapter on important changes in the language from C++11 up to and including C++20. In contrast to many resources on Modern C++ that present new features only in small isolated examples, this book represents a more holistic approach: readers will understand the relevance of new features and how they interact in the context of a large software project and not just within a "toy example". Previous experience in creating software with C++ is highly recommended to fully appreciate these aspects.

SeqAn3 is a new, re-designed software library. The conception and implementation process is detailed in this book, including a critical reflection on the previous versions of the library. This is particularly helpful to readers who are about to create a large software project themselves, or who are planning a major overhaul of an existing library or framework. While the focus of the book is clearly on software development and design, it also touches on various organisational and administrative aspects like licensing, dependency management and quality control.

About the authors

Hannes Hauswedell has a doctoral degree from Freie Universität Berlin and currently works as a post-doctoral researcher and software engineer at deCODE Genetics. He is interested in programming languages, software design and scalability. He likes Open Source and Open Science.

Author(s): Hannes Hauswedell
Edition: 1
Publisher: Springer
Year: 2022

Language: English
Pages: 346
Tags: sequence analysis, library design, bioinformatics, high performance computing, sequence alignment, local alignment, seqan, c++17, c++20, c++ ranges, biopython, fm index, lambda, homology search, open source, sdsl, blast

Preface
Acknowledgements
Contents
Part I Background
1 Sequence Analysis
2 The SeqAn Library (Versions 1 and 2)
2.1 History
2.2 Design Goals
2.3 Programming Techniques
2.3.1 Generic Programming
2.3.2 Template Subclassing
2.3.3 Global Function Interfaces
2.3.4 Metafunctions
2.4 Discussion
2.4.1 Performance
2.4.2 Simplicity
Non-locality
Code Complexity and Feature Creep
Unconstrained Templates
Documentation
2.4.3 Generality, Refineability and Extensibility
2.4.4 Integration
Source-Code Level Integration
Project-Level Integration
2.4.5 Summary
3 Modern C++
3.1 Type Deduction
3.1.1 The auto Specifier
Excursus: Lambda Expressions
3.1.2 Class Template Argument Deduction (CTAD)
3.2 Move Semantics and Perfect Forwarding
3.2.1 Move Semantics
3.2.2 Reference Types and Perfect Forwarding
3.2.3 Out-Parameters and Returning by Value
3.3 Metaprogramming and Compile-Time Computations
3.3.1 Metafunctions and Type Traits
3.3.2 Traits Classes
3.3.3 Compile-Time Computations
3.3.4 Conditional Instantiation
3.3.5 Standard Library Traits
3.4 C++ Concepts
3.4.1 Introduction
3.4.2 Defining Concepts
3.4.3 Using Concepts
3.4.4 Concepts-Based Polymorphism
3.4.5 Standard Library Concepts
3.5 Code Reuse
3.5.1 The Curiously Recurring Template Pattern (CRTP)
3.5.2 Metaclasses
3.6 C++ Ranges
3.6.1 Introduction
3.6.2 Range Traits and Concepts
3.6.3 The View Concept
3.6.4 Range Adaptor Objects
3.6.5 Standard Library Views
3.7 Customisation Points
3.7.1 Excursus: Calling Conventions
3.7.2 Introduction
3.7.3 "Niebloids"
3.7.4 Future Standardisation
3.8 Concurrency & Parallelism
3.9 C++ Modules
3.10 Utility Types
3.11 Discussion
Part II SeqAn3
4 The Design of SeqAn3
4.1 Design Goals
4.1.1 Performance
4.1.2 Simplicity
4.1.3 Integration
4.1.4 Adaptability
4.1.5 Compactness
4.2 Programming Techniques
4.2.1 Modern C++
4.2.2 Programming Paradigms
4.2.3 Polymorphism and Customisation
4.2.4 Aspects of Object-Orientation
4.2.5 Ranges and Views
4.2.6 "Natural" Function Interfaces
4.2.7 constexpr if Possible
4.3 Administrative Aspects
4.3.1 Header-Only Library
4.3.2 Licence
4.3.3 Platform Support
Compiler
Operating System
Machine Architecture
4.3.4 Stability
API
ABI
Platform
4.3.5 Availability
4.3.6 Combining SeqAn2 and SeqAn3
4.4 Dependencies and Tooling
4.4.1 Library Dependencies
The C++20 Standard Library
The Succinct Data Structure Library
Cereal (Optional Dependency)
Lemon (Optional Dependency)
(De-)Compression Libraries (Optional Dependencies)
4.4.2 Documentation
4.4.3 Testing
Test Metrics
Implementation
Execution
Future
4.5 Project Management and Social Aspects
5 Library Structure and Small Modules
5.1 Library Structure
5.1.1 Files and Directories
5.1.2 Modules and Submodules
5.1.3 Names and Namespaces
5.2 "Small" Modules
5.2.1 Argument Parser
5.2.2 The Core Module
5.2.3 The Utility Module
Operations on Characters
Excursus: Variadic Templates, Parameter Packs and Folds
Type Lists
5.2.4 The STD Module
5.2.5 The Contrib Module
5.3 Discussion
5.3.1 Performance
5.3.2 Simplicity
5.3.3 Integration
5.3.4 Adaptability
5.3.5 Compactness
6 The Alphabet Module
6.1 General Design
6.1.1 Character and Rank Representation
6.1.2 Function Objects and Traits
6.1.3 Concepts
6.2 User-Defined Alphabets and Adaptations
6.2.1 User-Defined Alphabets
6.2.2 Adapting Existing Types as Alphabets
6.3 The Nucleotide Submodule
6.3.1 General Design
6.3.2 Canonical DNA Alphabets
6.3.3 Canonical RNA Alphabets
6.3.4 Other Nucleotide Alphabets
6.4 The Amino Acid Submodule
6.4.1 General Design
6.4.2 Amino Acid Alphabets
6.4.3 Translation
6.5 Composite Alphabets
6.5.1 Alphabet Variants
6.5.2 Alphabet Tuples
6.5.3 Alphabet "any" Types
6.6 The Quality Submodule
6.6.1 General Design
6.6.2 Quality Alphabets
6.6.3 Quality Tuples
6.7 Discussion
6.7.1 Performance
6.7.2 Simplicity
6.7.3 Integration
6.7.4 Adaptability
6.7.5 Compactness
7 The Range Module
7.1 General Design
7.2 Container
7.2.1 Concepts
7.2.2 Bit-Compressed Container
7.2.3 Containers of Containers
7.2.4 Fixed-Capacity Containers
7.3 Views
7.3.1 General Design
7.3.2 Alphabet-Specific Views
7.3.3 Some General-Purpose Views
7.3.4 Implementation Notes
7.4 Discussion
7.4.1 Performance
Container
Views
7.4.2 Simplicity
7.4.3 Integration
7.4.4 Adaptability
7.4.5 Compactness
8 The Input/Output Module
8.1 The Stream Submodule
8.2 Serialisation
8.3 Formatted Files
8.3.1 Files and Formats
8.3.2 Records and Fields
8.4 The Sequence File Submodule
8.4.1 Input
8.4.2 Output
8.4.3 Combined Input and Output
8.4.4 Asynchronous Input/Output
8.5 Discussion
8.5.1 Performance
8.5.2 Simplicity
8.5.3 Integration
8.5.4 Adaptability
8.5.5 Compactness
9 The Search Module
9.1 The FM-Index Submodule
9.1.1 Unidirectional FM-Index
9.1.2 Bidirectional FM-Index
9.2 The k-Mer-Index Submodule
9.2.1 Shapes in SeqAn3
Excursus: Strong Types
Shapes
Hashing
9.3 General Algorithm Design
9.4 The (Search) Algorithm Submodule
9.4.1 Search Strategies
9.5 The Configuration Submodule
9.5.1 Excursus: Aggregate Initialisation and Designated Initialisers
9.5.2 Search Config Elements
9.6 Discussion
9.6.1 Performance
Search
k-Mers and Shapes
9.6.2 Simplicity
Indexes
Search
k-Mers and Shapes
9.6.3 Integration and Adaptability
9.6.4 Compactness
10 The Alignment Module
10.1 The Aligned Range Submodule
10.1.1 Concepts and Function Objects
10.1.2 Gap Decorators
10.2 The Scoring Submodule
10.2.1 Alphabet Scoring Schemes
10.2.2 The Gap (Scoring) Scheme
10.3 The Pairwise (Alignment) Submodule
10.3.1 Algorithm Interface
10.3.2 Alignment Result Type
10.3.3 Theoretical Background and Implementation Details
10.4 The Configuration Submodule
10.5 Discussion
10.5.1 Performance
10.5.2 Simplicity
Using the Gap Decorator
The Algorithm Interface
10.5.3 Integration
10.5.4 Adaptability
10.5.5 Compactness
Part III Lambda
11 Lambda: An Application Built with SeqAn
11.1 Introduction
11.1.1 Previous Work
11.1.2 History of LAMBDA
11.2 Implementation
11.2.1 Index Creation
11.2.2 Search
11.3 Results
11.3.1 Notable Features
11.3.2 Performance
Lambda3's Parameter Space
Speed and Sensitivity Compared to Other Applications
11.4 Discussion
11.4.1 From SeqAn2 to SeqAn3
11.4.2 Algorithmic Choices
Part IV Conclusion and Appendix
12 Conclusion
Appendix A
A.1 Notes on Reading This Book
A.1.1 References and Hyperlinks
Links to External Resources and Websites
Cross-References Inside the Book
A.1.2 How to Read Code Snippets
Line-Numbering in Code Snippets
Syntax Highlighting
A.2 Software and Hardware Details
A.2.1 Benchmarking Environment
A.2.2 Helpful Software
A.3 Copyright
A.3.1 SeqAn Copyright
A.4 Longer Code Snippets
A.5 Detailed Benchmark Results (Local Aligners)
References