Biopython Tutorial and Cookbook (Updated Version 1.81)

The Biopython Tutorial and Cookbook contains the bulk of the Biopython documentation. It provides information to help you get started with Biopython, as well as specific documentation on a number of modules.

Author(s): Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock, Tiago Antao, Eric Talevich, Bartek Wilczynski
Publisher: dbooks.org
Year: 2023

Language: English
Pages: 374

Table of contents

Introduction
What is Biopython?
What can I find in the Biopython package
Installing Biopython
Frequently Asked Questions (FAQ)
Quick Start – What can you do with Biopython?
General overview of what Biopython provides
Working with sequences
A usage example
Parsing sequence file formats
Simple FASTA parsing example
Simple GenBank parsing example
I love parsing – please don't stop talking about it!
Connecting with biological databases
What to do next
Sequence objects
Sequences act like strings
Slicing a sequence
Turning Seq objects into strings
Concatenating or adding sequences
Changing case
Nucleotide sequences and (reverse) complements
Transcription
Translation
Translation Tables
Comparing Seq objects
Sequences with unknown sequence contents
Sequences with partially defined sequence contents
MutableSeq objects
Working with strings directly
Sequence annotation objects
The SeqRecord object
Creating a SeqRecord
SeqRecord objects from scratch
SeqRecord objects from FASTA files
SeqRecord objects from GenBank files
Feature, location and position objects
SeqFeature objects
Positions and locations
Sequence described by a feature or location
Comparison
References
The format method
Slicing a SeqRecord
Adding SeqRecord objects
Reverse-complementing SeqRecord objects
Sequence Input/Output
Parsing or Reading Sequences
Reading Sequence Files
Iterating over the records in a sequence file
Getting a list of the records in a sequence file
Extracting data
Modifying data
Parsing sequences from compressed files
Parsing sequences from the net
Parsing GenBank records from the net
Parsing SwissProt sequences from the net
Sequence files as Dictionaries
Sequence files as Dictionaries – In memory
Sequence files as Dictionaries – Indexed files
Sequence files as Dictionaries – Database indexed files
Indexing compressed files
Discussion
Writing Sequence Files
Round trips
Converting between sequence file formats
Converting a file of sequences to their reverse complements
Getting your SeqRecord objects as formatted strings
Low level FASTA and FASTQ parsers
Multiple Sequence Alignment objects
Parsing or Reading Sequence Alignments
Single Alignments
Multiple Alignments
Ambiguous Alignments
Writing Alignments
Converting between sequence alignment file formats
Getting your alignment objects as formatted strings
Manipulating Alignments
Slicing alignments
Alignments as arrays
Getting information on the alignment
Substitutions
Alignment Tools
ClustalW
MUSCLE
MUSCLE using stdout
MUSCLE using stdin and stdout
EMBOSS needle and water
Pairwise sequence alignment
Basic usage
The pairwise aligner object
Substitution scores
Affine gap scores
General gap scores
Using a pre-defined substitution matrix and gap scores
Iterating over alignments
Alignment objects
Aligning to the reverse strand
Examples
Generalized pairwise alignments
Substitution matrices
Creating an Array object
Calculating a substitution matrix from a pairwise sequence alignment
Reading Array objects from file
Loading predefined substitution matrices
Pairwise alignments using pairwise2
BLAST
Running BLAST over the Internet
Running BLAST locally
Introduction
Standalone NCBI BLAST+
Other versions of BLAST
Parsing BLAST output
The BLAST record class
Dealing with PSI-BLAST
Dealing with RPS-BLAST
BLAST and other sequence search tools
The SearchIO object model
QueryResult
Hit
HSP
HSPFragment
A note about standards and conventions
Reading search output files
Dealing with large search output files with indexing
Writing and converting search output files
Accessing NCBI's Entrez databases
Entrez Guidelines
EInfo: Obtaining information about the Entrez databases
ESearch: Searching the Entrez databases
EPost: Uploading a list of identifiers
ESummary: Retrieving summaries from primary IDs
EFetch: Downloading full records from Entrez
ELink: Searching for related items in NCBI Entrez
EGQuery: Global Query - counts for search terms
ESpell: Obtaining spelling suggestions
Parsing huge Entrez XML files
HTML escape characters
Handling errors
Specialized parsers
Parsing Medline records
Parsing GEO records
Parsing UniGene records
Using a proxy
Examples
PubMed and Medline
Searching, downloading, and parsing Entrez Nucleotide records
Searching, downloading, and parsing GenBank records
Finding the lineage of an organism
Using the history and WebEnv
Searching for and downloading sequences using the history
Searching for and downloading abstracts using the history
Searching for citations
Swiss-Prot and ExPASy
Parsing Swiss-Prot files
Parsing Swiss-Prot records
Parsing the Swiss-Prot keyword and category list
Parsing Prosite records
Parsing Prosite documentation records
Parsing Enzyme records
Accessing the ExPASy server
Retrieving a Swiss-Prot record
Searching Swiss-Prot
Retrieving Prosite and Prosite documentation records
Scanning the Prosite database
Going 3D: The PDB module
Reading and writing crystal structure files
Reading an mmCIF file
Reading files in the MMTF format
Reading a PDB file
Reading a PQR file
Reading files in the PDB XML format
Writing mmCIF files
Writing PDB files
Writing PQR files
Writing MMTF files
Structure representation
Structure
Model
Chain
Residue
Atom
Extracting a specific Atom/Residue/Chain/Model from a Structure
Disorder
General approach
Disordered atoms
Disordered residues
Hetero residues
Associated problems
Water residues
Other hetero residues
Navigating through a Structure object
Analyzing structures
Measuring distances
Measuring angles
Measuring torsion angles
Internal coordinates module - distances, angles, torsion angles, distance plots and more
Determining atom-atom contacts
Superimposing two structures
Mapping the residues of two related structures onto each other
Calculating the Half Sphere Exposure
Determining the secondary structure
Calculating the residue depth
Common problems in PDB files
Examples
Automatic correction
Fatal errors
Accessing the Protein Data Bank
Downloading structures from the Protein Data Bank
Downloading the entire PDB
Keeping a local copy of the PDB up to date
General questions
How well tested is Bio.PDB?
How fast is it?
Is there support for molecular graphics?
Who's using Bio.PDB?
Bio.PopGen: Population genetics
GenePop
Phylogenetics with Bio.Phylo
Demo: What's in a Tree?
Coloring branches within a tree
I/O functions
View and export trees
Using Tree and Clade objects
Search and traversal methods
Information methods
Modification methods
Features of PhyloXML trees
Running external applications
PAML integration
Future plans
Sequence motif analysis using Bio.motifs
Motif objects
Creating a motif from instances
Creating a sequence logo
Reading motifs
JASPAR
MEME
TRANSFAC
Writing motifs
Position-Weight Matrices
Position-Specific Scoring Matrices
Searching for instances
Searching for exact matches
Searching for matches using the PSSM score
Selecting a score threshold
Each motif object has an associated Position-Specific Scoring Matrix
Comparing motifs
De novo motif finding
MEME
Useful links
Cluster analysis
Distance functions
Calculating cluster properties
Partitioning algorithms
Hierarchical clustering
Self-Organizing Maps
Principal Component Analysis
Handling Cluster/TreeView-type files
Example calculation
Supervised learning methods
The Logistic Regression Model
Background and Purpose
Training the logistic regression model
Using the logistic regression model for classification
Logistic Regression, Linear Discriminant Analysis, and Support Vector Machines
k-Nearest Neighbors
Background and purpose
Initializing a k-nearest neighbors model
Using a k-nearest neighbors model for classification
Naïve Bayes
Maximum Entropy
Markov Models
Graphics including GenomeDiagram
GenomeDiagram
Introduction
Diagrams, tracks, feature-sets and features
A top down example
A bottom up example
Features without a SeqFeature
Feature captions
Feature sigils
Arrow sigils
A nice example
Multiple tracks
Cross-Links between tracks
Further options
Converting old code
Chromosomes
Simple Chromosomes
Annotated Chromosomes
KEGG
Parsing KEGG records
Querying the KEGG API
Bio.phenotype: analyze phenotypic data
Phenotype Microarrays
Parsing Phenotype Microarray data
Manipulating Phenotype Microarray data
Writing Phenotype Microarray data
Cookbook – Cool things to do with it
Working with sequence files
Filtering a sequence file
Producing randomized genomes
Translating a FASTA file of CDS entries
Making the sequences in a FASTA file upper case
Sorting a sequence file
Simple quality filtering for FASTQ files
Trimming off primer sequences
Trimming off adaptor sequences
Converting FASTQ files
Converting FASTA and QUAL files into FASTQ files
Indexing a FASTQ file
Converting SFF files
Identifying open reading frames
Sequence parsing plus simple plots
Histogram of sequence lengths
Plot of sequence GC%
Nucleotide dot plots
Plotting the quality scores of sequencing read data
Dealing with alignments
Calculating summary information
Calculating a quick consensus sequence
Position Specific Score Matrices
Information Content
Substitution Matrices
Using common substitution matrices
Calculating a substitution matrix from a multiple sequence alignment
BioSQL – storing sequences in a relational database
The Biopython testing framework
Running the tests
Running the tests using Tox
Writing tests
Writing a test using unittest
Writing doctests
Writing doctests in the Tutorial
Where to go from here – contributing to Biopython
Bug Reports + Feature Requests
Mailing lists and helping newcomers
Contributing Documentation
Contributing cookbook examples
Maintaining a distribution for a platform
Contributing Unit Tests
Contributing Code
Appendix: Useful stuff about Python
What the heck is a handle?
Creating a handle from a string