Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem
Key Features
Apply modern R packages to process biological data using real-world examples
Represent biological data with advanced visualizations and workflows suitable for research and publications
Solve real-world bioinformatics problems such as transcriptomics, genomics, and phylogenetics
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The updated second edition of R Bioinformatics Cookbook takes a recipe-based approach to show you how to conduct practical research and analysis in computational biology with R. You’ll learn how to create a useful and modular R working environment and how to load, clean, and analyze data using the most up-to-date Bioconductor, ggplot2, and tidyverse tools.
This book will walk you through the Bioconductor tools necessary for you to understand and carry out protocols in RNA-seq and ChIP-seq, phylogenetics, genomics, gene search, gene annotation, statistical analysis, and sequence analysis. As you advance, you'll find out how to use Quarto to create data-rich reports, presentations, and websites, as well as get a clear understanding of how machine learning techniques can be applied in the bioinformatics domain. The concluding chapters will help you develop proficiency in key skills, such as gene annotation analysis and functional programming in purrr and base R. Finally, you'll discover how to use the latest AI tools, including ChatGPT, to generate, edit, and understand R code and draft workflows for complex analyses.
By the end of this book, you'll have gained a solid understanding of the skills and techniques needed to become a bioinformatics specialist and efficiently work with large and complex bioinformatics datasets.
What you will learn
Set up a working environment for bioinformatics analysis with R
Import, clean, and organize bioinformatics data using tidyr
Create publication-quality plots, reports, and presentations using ggplot2 and Quarto
Analyze RNA-seq, ChIP-seq, genomics, and next-generation genetics with Bioconductor
Search for genes and proteins by performing phylogenetics and gene annotation
Apply ML techniques to bioinformatics data using mlr3
Streamline programmatic work using iterators and functional tools in base R and the purrr package
Use ChatGPT to create, annotate, and debug code and workflows
Who this book is for
This book is for bioinformaticians, data analysts, researchers, and R developers who want to address intermediate-to-advanced biological and bioinformatics problems by learning via a recipe-based approach. Working knowledge of the R programming language and basic knowledge of bioinformatics are prerequisites.
Author(s): Dan MacLean
Edition: 2
Publisher: Packt Publishing Pvt Ltd
Year: 2023
Language: English
Pages: 396
R Bioinformatics Cookbook, Second Edition
Contributors
About the author
About the reviewer
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
1
Setting Up Your R Bioinformatics Working Environment
Technical requirements
Further information
Setting up an R project in a directory
Getting ready
How to do it…
How it works…
There’s more…
Using the here package to simplify working with paths
Getting ready
How to do it…
How it works…
There’s more…
Using the devtools package to work with the latest non-CRAN packages
Getting ready
How to do it…
How it works…
There’s more…
Setting up your machine for the compilation of source packages
Getting ready
How to do it…
How it works…
See also
Using the renv package to create a project-specific set of packages
Getting ready
How to do it…
How it works…
There’s more…
Installing and managing different versions of Bioconductor packages in environments
Getting ready
How to do it…
How it works…
Using bioconda to install external tools
Getting ready
How to do it…
How it works…
2
Loading, Tidying, and Cleaning Data in the tidyverse
Technical requirements
Further information
Loading data from files with readr
Getting ready
How to do it…
How it works…
There’s more…
See also
Tidying a wide format table into a tidy table with tidyr
Getting ready
How to do it…
How it works…
See also
Tidying a long format table into a tidy table with tidyr
Getting ready
How it works…
There’s more…
Combining tables using join functions
Getting ready
How to do it…
How it works…
Reformatting and extracting existing data into new columns using stringr
Getting ready
How to do it…
How it works…
Computing new data columns from existing ones and applying arbitrary functions using mutate()
Getting ready
How to do it…
How it works…
Using dplyr to summarize data in large tables
Getting ready
How to do it…
How it works…
Using datapasta to create R objects from cut-and-paste data
Getting ready
How to do it…
How it works…
There’s more…
3
ggplot2 and Extensions for Publication Quality Plots
Technical requirements
Further information
Combining many plot types in ggplot2
Getting ready
How to do it…
How it works…
There’s more…
Comparing changes in distributions with ggridges
Getting ready
How to do it…
How it works…
Customizing plots with ggeasy
Getting ready
How to do it…
How it works…
There’s more…
Highlighting selected values in busy plots with gghighlight
Getting ready
How to do it…
How it works…
Plotting variability and confidence intervals better with ggdist
Getting ready
How to do it…
How it works…
Making interactive plots with plotly
Getting ready
How to do it…
How it works…
See also
Clarifying label placement with ggrepel
Getting ready
How to do it…
How it works…
Zooming and making callouts from selected plot sections with facetzoom
Getting ready
How to do it…
How it works…
Getting ready
How to do it…
How it works…
There’s more…
See also
4
Using Quarto to Make Data-Rich Reports, Presentations, and Websites
Technical requirements
Further information
Using Markdown and Quarto for literate computation
Getting ready
How to do it…
How it works…
There’s more…
Creating different document formats from the same source
Getting ready
How to do it…
How it works…
Creating data-rich presentations from code
Getting ready
How to do it…
How it works…
There’s more…
Creating websites from collections of Quarto documents
Getting ready
How to do it…
How it works…
There’s more…
See also
Adding interactivity with Shiny
Getting ready
How to do it…
How it works…
There’s more…
See also
5
Easily Performing Statistical Tests Using Linear Models
Technical requirements
Further information
Modeling data with a linear model
Getting ready
How to do it…
How it works…
There’s more…
Using a linear model to compare the mean of two groups
Getting ready
How to do it…
How it works…
There’s more…
Using a linear model and ANOVA to compare multiple groups in a single variable
Getting ready
How to do it…
How it works…
There’s more…
Using linear models and ANOVA to compare multiple groups in multiple variables
Getting ready
How to do it…
How it works…
Testing and accounting for interactions between variables in linear models
Getting ready
How to do it…
How it works…
Doing tests for differences in data in two categorical variables
Getting ready
How to do it…
How it works…
Making predictions using linear models
Getting ready
How to do it…
How it works…
See also
6
Performing Quantitative RNA-seq
Technical requirements
Further information
Estimating differential expression with edgeR
Getting ready
How to do it…
How it works…
Estimating differential expression with DESeq2
Getting ready
How to do it…
How it works…
There’s more…
Estimating differential expression with Kallisto and Sleuth
Getting ready
How to do it…
How it works…
Using Sleuth to analyze time course experiments
Getting ready
How to do it…
How it works…
Analyzing splice variants with SGSeq
Getting ready
How to do it…
How it works…
Performing power analysis with powsimR
Getting ready
How to do it…
How it works…
There’s more…
Finding unannotated transcribed regions
Getting ready
How to do it…
How it works…
There’s more…
Finding regions showing high expression ab initio using bumphunter
Getting ready
How to do it…
How it works…
There’s more…
Differential peak analysis
Getting ready
How to do it…
How it works…
Estimating batch effects with SVA
Getting ready
How to do it…
How it works…
Finding allele-specific expression with AllelicImbalance
Getting ready
How to do it…
How it works…
There’s more…
Presenting RNA-Seq data using ComplexHeatmap
Getting ready
How to do it…
How it works…
7
Finding Genetic Variants with HTS Data
Technical requirements
Further information
Finding SNPs and INDELs from sequence data using VariantTools
Getting ready
How to do it…
How it works…
There’s more…
See also
Predicting open reading frames in long reference sequences
Getting ready
How to do it…
How it works…
There’s more…
Plotting features on genetic maps with karyoploteR
Getting ready
How to do it…
How it works…
There’s more…
See also
Selecting and classifying variants with VariantAnnotation
Getting ready
How to do it…
How it works…
See also
Extracting information in genomic regions of interest
Getting ready
How to do it…
How it works…
There’s more…
Finding phenotype and genotype associations with GWAS
Getting ready
How to do it…
How it works…
Estimating the copy number at a locus of interest
Getting ready
How to do it…
How it works…
See also
8
Searching Gene and Protein Sequences for Domains and Motifs
Technical requirements
Further information
Finding DNA motifs with universalmotif
Getting ready
How to do it…
How it works…
There’s more…
Finding protein domains with PFAM and bio3d
Getting ready
How to do it…
How it works…
There’s more…
Finding InterPro domains
Getting ready
How to do it…
How it works…
There’s more…
See also
Finding transmembrane domains with tmhmm and pureseqTM
Getting ready
How to do it…
How it works…
There’s more…
See also
Creating figures of protein domains using drawProteins
Getting ready
How to do it…
How it works…
There’s more…
Performing multiple alignments of proteins or genes
Getting ready
How to do it…
How it works…
There’s more…
Aligning genomic length sequences with DECIPHER
Getting ready
How to do it…
How it works…
Novel feature detection in proteins
Getting ready
How to do it…
How it works…
3D structure protein alignment in bio3d
Getting ready
How to do it…
How it works…
There’s more…
9
Phylogenetic Analysis and Visualization
Technical requirements
Further information
Reading and writing varied tree formats with ape and treeio
Getting ready
How to do it…
How it works…
See also
Visualizing trees of many genes quickly with ggtree
Getting ready
How to do it…
How it works…
There’s more…
Quantifying and estimating the differences between trees with treespace
Getting ready
How to do it…
How it works…
There’s more…
Extracting and working with subtrees using ape
Getting ready
How to do it…
How it works…
There’s more…
Creating dot plots for alignment visualizations
Getting ready
How to do it…
How it works…
Reconstructing trees from alignments using phangorn
Getting ready
How to do it…
How it works…
Finding orthologue candidates using reciprocal BLASTs
Getting ready
How to do it…
How it works…
See also
10
Analyzing Gene Annotations
Technical requirements
Further information
Retrieving gene and genome annotations from BioMart
Getting ready
How to do it…
How it works…
Getting Gene Ontology information for functional analysis from appropriate databases
Getting ready
How to do it…
How it works…
Using AnnoDB packages for genome annotation
Getting ready
How to do it…
How it works…
See also
Using ClusterProfiler for determining GO enrichment in clusters
Getting ready
How to do it…
How it works…
Finding GO enrichment in an Ontology Conditional way with topGO
Getting ready
How to do it…
How it works…
Finding enriched KEGG pathways
Getting ready
How to do it…
How it works…
Retrieving and working with SNPs
Getting ready
How to do it…
How it works…
There’s more…
11
Machine Learning with mlr3
Technical requirements
Further information
Defining a task and learner to implement k-nearest neighbors (k-NNs) in mlr3
Getting ready
How to do it…
How it works…
There’s more…
See also
Testing the fit of the model using cross-validation
Getting ready
How to do it…
How it works…
There’s more…
Using logistic regression to classify the relative likelihood of two outcomes
Getting ready
How to do it…
How it works…
See also
Classifying using random forest and interpreting it with iml
Getting ready
How to do it…
How it works…
Dimension reduction with PCA in mlr3 pipelines
Getting ready
How to do it…
How it works…
There’s more…
Creating a tSNE and UMAP embedding
Getting ready
How to do it…
How it works…
Clustering with k-means and hierarchical clustering
Getting ready
How to do it…
How it works…
12
Functional Programming with purrr and base R
Technical requirements
Further information
Making base R objects “tidy”
Getting ready
How to do it…
How it works…
Using nested dataframes for functional programming
Getting ready
How to do it…
How it works…
See also
Using the apply family of functions
Getting ready
How to do it…
How it works…
Using the map family of functions in purrr
Getting ready
How to do it…
How it works…
Working with lists in purrr
Getting ready
How to do it…
How it works…
13
Turbo-Charging Development in R with ChatGPT
Technical requirements
Further information
Interpreting complicated code with ChatGPT assistance
Getting ready
How to do it…
How it works…
There’s more…
Debugging and improving code with ChatGPT
Getting ready
How to do it…
How it works…
Generating code with ChatGPT
Getting ready
How to do it…
How it works…
There’s more…
Writing documentation for R functions with ChatGPT
Getting ready
How to do it…
How it works…
There’s more…
Writing unit tests for R functions with ChatGPT
Getting ready
How to do it…
How it works…
Finding R packages to build a workflow with ChatGPT
Getting ready
How to do it…
How it works…
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book