Practical Bioinformatics for Beginners: From Raw Sequence Analysis to Machine Learning Applications

Next-generation sequencing (NGS) is increasingly common and has applications in fields such as clinical diagnosis, animal and plant breeding, and species conservation. The technology has become cost-effective, but it generates a deluge of sequence data that requires efficient analysis. Skills in computational and statistical analysis, including machine learning, are highly sought after and essential for successful research across a wide range of specializations, such as identifying causes of cancer, vaccine design, new antibiotics, drug development, personalized medicine, and increased crop yields in agriculture. This book provides step-by-step guides to complex topics that make it easy for readers to perform specific analyses, from raw sequence data to answering important biological questions with machine learning methods. It is hands-on material for lecturers who teach bioinformatics courses and reference material for professionals. The chapters are standalone recipes, making them suitable for readers who wish to self-learn selected topics. Readers gain the essential skills needed to work with sequence data from NGS platforms, making themselves more attractive to employers who need skilled bioinformaticians.

Author(s): Lloyd Wai Yee Low, Martti Tapani Tammi
Publisher: World Scientific
Year: 2023

Language: English
Pages: 267
City: Singapore

Contents
Foreword from the First Edition
Preface
Acknowledgements
Chapter 1 Introduction to Next Generation Sequencing Technologies
A Brief History of DNA Sequencing
Next Generation Sequencing Technologies
454
ABI SOLiD
Illumina
Ion Torrent
Pacific Biosciences
Oxford Nanopore Technologies
Informatics Challenges
References
Chapter 2 Primer on Linux
Introduction
Listing the Contents of a Directory
Create Directory
Print Working Directory
Change Directory
Download Data
File Compression
Display the Contents of a File
Count the Number of Lines
Search a Pattern
Combine Multiple Commands Together
Converting a FASTQ File into a Tabular Format
Pattern Matching Using Awk
Sort and Extract Unique Sequences
Convert Reads into FASTA Format Sequences
Write a Shell Script to Split Sequences into Individual Files
Changing File Permissions
Run the Bash Script
Summary
Chapter 3 Inspection of Sequence Quality
Introduction
FastQC
Installation Step in Linux Environment
Download Datasets
Fastx-toolkit & FASTQ Processing Utilities
Installation Step in Linux Environment
Conclusion
References
Chapter 4 Alignment of Sequenced Reads
Introduction
Practical
Short Reads Alignment
Dataset
Software Requirements
Alignment Process
SAM to BAM conversion
Sort BAM alignments
Alternative: novoAlign & novoSort
View BAM alignment with IGV
References
Chapter 5 Establish a Research Workflow
Introduction
Materials
Shell Scripts
Galaxy
Conclusion
References
Chapter 6 De novo Assembly of a Genome
Introduction
Overall Steps
Download Sequences
Filter Out Bad Reads
Assemble the Genome(s)
Hybrid assembly with PacBio reads
Long SE read assembly (PacBio)
Long SE read assembly (Oxford Nanopore)
Check the Quality of the Genome
Discussion and Conclusion
References
Chapter 7 Exome Sequencing
Introduction
General Workflow of WES
Background Information on the Practical
Software
Datasets
Download Datasets
Creating a New Folder
Mapping of Raw Data to the Reference Genome
Variants Calling
Prediction of SNVs and Indels Effects
Visualization
Conclusion
References
Chapter 8 Transcriptomics
Introduction
Practical
Datasets & Software
Dataset
Software Required
Reads Pre-processing & Quality Control (QC)
Prepare Files
Perform Initial QC
Trimming for Bad Quality and Adapters
Run FastQC on Trimmed Reads
HISAT2: Reads Alignment
Prepare Files
Generate Genome Index
Reads Alignment
Single Sample Expression
Gene Expression Count Using HTSEQ
Gene Expression Count Using featureCounts
Gene Expression Count StringTie
Differential Expression
edgeR & Limma for Counts Data
Perform edgeR and limma and linear model
References
Chapter 9 Metagenomics
Introduction
Introduction to MG-RAST Server Workflow
Registration to MG-RAST
Submission of Dataset
Job Status Monitor
Data Analysis and Result Viewing
Analysis of Shotgun Metagenomic Sequence Datasets
Getting Started
Uploading and Submission
Results
Analysis of 16S rRNA-targeted Metagenomic Sequence Datasets
Getting Started
Visualization
Conclusion
References
Chapter 10 Applications of NGS Data
Introduction
Classical Linkage Map
OneMap (2.1.3)
Installation
Input formatting
Linkage mapping analysis
From Linkage Map to Physical Map
Genome-wide Association Studies (GWAS)
PLINK 1.90 Beta
Installation
Input files and format
Download datasets
Association analysis
Summary
References
Chapter 11 Predicting Human Enhancers with Machine Learning
Introduction
Setting up the Software Environment
Transforming the Data
One-hot encoding with a simple convolutional neural network (CNN)
References
Index