Bioinformatics: A Practical Guide to NCBI Databases and Sequence Alignments provides the basics of bioinformatics and in-depth coverage of NCBI databases, sequence alignment, and the NCBI Basic Local Alignment Search Tool (BLAST). As bioinformatics has become essential for the life sciences, the book has been written specifically to address the needs of a large audience, including undergraduates, graduates, researchers, healthcare professionals, and bioinformatics professors who need to use the NCBI databases, retrieve data from them, and use BLAST to find evolutionarily related sequences, annotate sequences, construct phylogenetic trees, and identify the conserved domains of a protein, to name just a few applications. Technical details of alignment algorithms are explained with minimal use of mathematical formulas and with graphical illustrations.
Key Features
- Provides readers with the most widely used knowledge of bioinformatics databases and alignments, covering both theory and application through illustrations and worked examples.
- Discusses the use of Windows Command Prompt, Linux shell, R, and Python for both Entrez databases and BLAST.
- The companion website contains tutorials, R and Python code, and instructor materials including slides, exercises, and problems for students.
This is the ideal textbook for bioinformatics courses taken by students of life sciences and for researchers wishing to develop their knowledge of bioinformatics to facilitate their own research.
Author(s): Hamid Ismail
Series: Chapman & Hall/CRC Computational Biology Series
Publisher: CRC Press
Year: 2021
Language: English
Pages: 360
City: Boca Raton
Cover
Half Title
Series Information
Title Page
Copyright Page
Table of Contents
Preface
Acknowledgments
Author bio
Chapter 1 The Origin of Genomic Information
Introduction
Genetic Information and Its Transmission
Structure of DNA and Genome
Gene Structure
Gene Regulatory Region
Introns
Exons
The Non-Coding Genomic Sequences
Repetitive Sequences
Pseudogenes
Telomeres
Ribonucleic Acid
Ribosomal RNA
Transfer RNA
MicroRNA
Messenger RNA
Messenger RNA Transcription
Messenger RNA Translation
The Proteins
Three-Dimensional Protein Structure
Protein Structure Representation and File Formats
PDB File Format
PDBx Or mmCIF File Format
Visualizing Protein Structure
Gene Mutations
Virus Genome
References
Chapter 2 The Sources of Genomic Data
Introduction
Extraction of DNA, RNA, and Proteins
DNA, RNA, and Protein Quality
Reverse Transcription
Polymerase Chain Reaction
First-Generation Sequencing
Maxam-Gilbert Sequencing
Sanger Chain-Terminator Sequencing Method
Sanger Dye-Terminator Sequencing Method
Next-Generation Sequencing
Library Preparation
TruSeq-style DNA Preparation
Nextera DNA Preparation
Enrichment Strategy
RNA-Sequencing Library Preparation
Multiplexing and Use of Barcodes
Sequencing By Synthesis
Single-end Read Sequencing
Paired-end Sequencing
Mate Pair Sequencing
Base Calling, Base Quality Score, and FASTQ File Format
Base Calling and Base Quality Score
FASTQ File Format
Downstream NGS Data Analysis
References
Chapter 3 The NCBI Entrez Databases
Introduction
Entrez Retrieval Web Interface
Entrez Database Web Page
Entrez Database Indexed Fields
Searching Entrez Databases
Entrez Search Results
Nucleotide Database
Submission of Sequence Data to GenBank
GenBank Format
Nucleotide Sub Databases
Expressed Sequence Tags
Genome Survey Sequences
Sequence-Tagged Sites
Reference Sequence
RefSeqGene
Searching Nucleotide Database
General Nucleotide Searching Strategy
Worked Examples
Gene Database
Biocollections
Protein Databases
Protein Database
Protein Clusters
Identical Protein Groups
Conserved Domain Database
Finding Conserved Domain of Proteins
Finding Protein Conserved Domains Using CD-Search
Finding Proteins With Similar Domain Architectures
Protein Family Models
Introduction to Protein Classification
Protein Family Models Database
Annotation of Prokaryotic Genome
HomoloGene
BioProject
BioSample
dbSNP
The dbSNP Submission
Searching the dbSNP Database
dbSNP Record Page
dbVar Database
Submission to dbVar Database
Structural Variant Representation
Searching dbVar
The dbVar Record Page
ClinVar Database
Submitting Data to ClinVar
Searching ClinVar Database
ClinVar Record Page
Gene Expression Omnibus (GEO) Database
Submitting Gene Expression Data to GEO
GEO DataSets
GEO DataSets Searching Examples
GEO Profiles
Searching GEO Profiles
Assembly
Assembly Submission
Searching the Assembly Database
Genome Database
Searching Genome Database
Browse By Organism
Human Genome
Genome Data Viewer
Genome Data Viewer Front Page
Components of Genome Data Viewer
Using Tools Menu
Using Tracks Menu
1000 Genomes Browser
Online Mendelian Inheritance in Man
Searching OMIM
Searching OMIM From Entrez Interface
Searching OMIM From Original OMIM Interface
OMIM Gene Record Page
OMIM Phenotype Record Page
PopSet
Searching PopSet Database
BioSystems
Searching BioSystems
Finding Genes and Proteins Involved in a Pathway
Finding a Pathway By a Gene
dbGaP
Submitting Data to dbGaP
Accessing dbGaP Data
Searching dbGaP
SRA Database
Submitting Data to SRA Database
Searching for SRA Data
Installing SRA Toolkit
Using SRA Toolkit Executables
Downloading SRA Files With Prefetch
Extracting Data From SRA Files
Extracting Data From dbGaP Non-SRA Files
Validation of Downloaded SRA Data Integrity
Taxonomy
NCBI Literature Databases
PubMed
Searching PubMed Database
PubMed Record Page
MeSH
PubMed Central
Bookshelf
References
Chapter 4 NCBI Entrez E-Utilities and Applications
Introduction
NCBI Application Programming Interface Key
NCBI History Server
E-utilities Programs
EInfo
EGQuery
ESearch
EPost
ESummary
EFetch
ELink
ESpell
ECitMatch
References
Chapter 5 The Entrez Direct
Introduction
EDirect Installation
Installing EDirect On Linux and MacOS
Installing EDirect On Windows 10
Using EDirect
EInfo
ESearch
EFetch
ELink
EFilter
EPost
Nquire
Xtract
XML Structure of Entrez Database Records and Data Types
Identifying Patterns and Elements in an XML Document
Extracting XML Child Elements
Extracting XML Attribute
Formatting Columns
Grouping of Child Elements By Parents
Using Xtract Conditional Arguments With Elements
Using Xtract Conditional Arguments With String Constraints
Extracting Data From Docsum Format
Extracting Data From INSDSeq Format
Extending EDirect With Bash Shell Script
References
Chapter 6 R and Python Packages for the NCBI E-Utilities
Introduction
R Entrez Package
List the NCBI Databases
Displaying a Database Summary
Displaying the Field List of a Database
Searching Entrez Databases
Getting Summary Data From Records
Extracting Summary Data From Records
Fetching Database Records
Listing Linked Databases and Records
Using BioPython for Entrez Databases
Installing Python
Installing Python and BioPython On Linux
Installing Python and BioPython On Windows
Using BioPython Entrez Package
Getting Information of Entrez Databases
Searching Entrez Databases
Uploading UIDs to the History Server
Retrieving Records’ Summaries
Fetching Data From Entrez Database Records
Finding Related Records in NCBI Databases
Printing Records Counts of All Entrez Databases
Providing Correction Suggestions
References
Chapter 7 Pairwise Sequence Alignment
Introduction
Pairwise Sequence Alignment
Global Sequence Alignment
Similarity Scores
Global Alignment Algorithm
Local Sequence Alignment and Algorithm
BLAST Algorithm
PSI-BLAST
References
Chapter 8 Basic Local Alignment Search Tool
Introduction
Web BLAST
BLAST Field Description
Using BLAST
Aligning Two Sequences
Aligning Multiple Sequences
Identifying a Nucleotide Sequence
Finding Closely Related Species
Identifying Unknown Bacteria Using 16S rRNA Sequence
Constructing Phylogenetic Tree for Human Related Species
Constructing Phylogenetic Tree Using Protein Sequences
Annotate a Metagenomic Contig
Finding Protein Conserved Domains With Solved Structures
Designing Primers for PCR
Stand-Alone BLAST
Installing and Running Local BLAST On Windows
Installing and Running Local BLAST On Linux
Building a BLAST Database
The Use of the Local BLAST
References
Index