Insights from Data with R: An Introduction for the Life and Environmental Sciences

Experiments, surveys, measurements, and observations all generate data. These data can provide useful insights for solving problems, guiding decisions, and formulating strategy. Progressing from relatively unprocessed data to insight, and doing so efficiently, reliably, and confidently, does not come easily. Yet gaining insights from data is a fundamental skill for science and many other fields, and one that is overlooked in most textbooks of statistics and data analysis.

This accessible and engaging book provides readers with the knowledge, experience, and confidence to work with data and unlock essential information (insights) from data summaries and visualisations. Based on a proven and successful undergraduate course structure, it charts the journey from initial question, through data preparation, import, cleaning, tidying, checking, double-checking, manipulation, and final visualisation. These basic skills are sufficient to gain useful insights from data without the need for any statistics; there is enough to learn about even before delving into that world!

The book focuses on gaining insights from data via visualisations and summaries. The journey from raw data to insights is clearly illustrated by means of a comprehensive Workflow Demonstration in the book, featuring data collected in a real-life study and applicable to many types of question, study, and data. Along the way, readers discover how to efficiently and intuitively use R, RStudio, and tidyverse software, learning from the detailed descriptions of each step in the instructional journey to progress from the raw data to creating elegant and informative visualisations that reveal answers to the initial questions posed. Three additional demonstrations are available online!

Insights from Data with R is suitable for undergraduate students and their instructors in the life and environmental sciences seeking to harness the power of R, RStudio, and tidyverse software to master the valuable and prerequisite skills of working with and gaining insights from data.

Author(s): Owen L. Petchey, Andrew P. Beckerman, Natalie Cooper, Dylan Z. Childs
Publisher: Oxford University Press
Year: 2021

Language: English
Commentary: True PDF
Pages: 320

Cover
Insights from Data with R: An Introduction for the Life and Environmental Sciences
Copyright
Preface
Overview
The learning ‘curve’
Untidy and dirty data
No statistical tests or models
Exploratory data analysis
Zen and the art of ‘data science’
Open-science trends
Intended readers
How is the book organized?
Online companion material
Boxes
Some ideas for instructors using this book
Relationship with Getting Started with R (GSwR), second edition, Beckerman, Childs, and Petchey (2017)
Acknowledgements
Contents
Chapter 1: Introduction
1.1 What are insights?
1.1.1 Dictionary
1.1.2 The business perspective
1.1.3 Our definition
1.1.4 Our ecology example . . . we love fruit
1.2 Question, question, question (how are data born?)
1.3 But what exactly are data?
1.4 Response and predictor variables
1.5 Some key features of datasets
1.6 Demonstrations of getting insights from data
1.7 The general Insights workflow
1.8 Summing up and looking forward
Chapter 2: Getting acquainted
2.1 Getting acquainted with R and RStudio
2.1.1 Why R?
2.1.2 Why RStudio?
2.1.3 Getting and installing R
2.1.4 Getting and installing RStudio
2.1.5 A brief tour of RStudio
2.2 Your first R command!
2.2.1 Getting to know R a little better
2.2.2 Storing and reusing results
2.2.3 What names should I use?
2.3 Writing scripts
2.3.1 Comments in your scripts
2.3.2 Save and keep safe your script file
2.3.3 Running your scripts
2.4 When things go wrong…
2.4.1 Errors
2.4.2 Warnings
2.4.3 The dreaded +
2.5 Functions
2.5.1 Functions, the sequel
2.6 Add-on packages
2.6.1 Finding add-on packages
2.6.2 Installing (downloading) packages
2.6.3 Loading packages
2.6.4 An analogy
2.6.5 Updating R, RStudio, and your packages
2.7 Getting help
2.7.1 R help system and files
2.7.2 Navigating help files
2.7.3 Vignettes
2.7.4 Cheat sheets
2.7.5 Other sources of help
2.7.6 Asking for help from others
2.8 Common pitfalls
2.9 Summing up and looking forward
Chapter 3: Workflow Demonstration part 1: Preparation
3.1 What is the question?
3.1.1 The three response variables
3.1.2 The hypotheses
3.2 Design of the study
3.3 Preparing your data
3.3.1 Acquire the dataset
3.4 Preparing your computer
3.4.1 Making the project folder for the bat data
3.4.2 Projects in RStudio
3.4.3 Create a new R script and load packages
3.5 Get the data into R
3.5.1 View and refine the import
3.6 Getting going with data management
3.6.1 How the data are stored in R
3.7 Clean and tidy the data
3.7.1 Tidying the data
3.7.2 Cleaning the data
3.7.3 Refine the variable names
3.7.4 Fix the dates
3.7.5 Rename some values in a variable
3.7.6 Check for duplicates
3.7.7 Check for implausible and invalid values
3.7.8 What about those NAs?
3.8 Stop that! Don’t even think about it!
3.8.1 Don’t mess with the ‘working directory’
3.8.2 Don’t use the data import tool or
3.8.3 Don’t even think about using the attach function
3.8.4 Avoid using square brackets or dollar signs
3.9 Summing up and looking forward
Chapter 4: Workflow Demonstration part 2: Getting insights
4.1 Initial insights 1: Numbers and counting
4.1.1 Our first insights: the number, sex, and age of bats
4.2 Initial insights 2: Distributions
4.2.1 Insights . . . you’ve done it!
4.3 Transform the data
4.4 Insights about our questions
4.4.1 Distribution of number of prey
4.4.2 Shapes: mean wingspan
4.4.3 Shapes: proportion migratory
4.4.4 Relationships
Dietary sex differences
Age–sex interactions
4.4.5 Communication (beautifying the graphs)
4.4.6 Beautifying the wingspan, age and sex graph
4.5 Another view of the question and data
4.5.1 Before you continue…
4.5.2 A prey-centric view
Transform the data
Visualizing the proportions
Odds and odds ratios
4.6 A caveat
4.7 Summing up and looking forward
4.8 A small reward, if you like dogs
Chapter 5: Dealing with data 1: Digging into dplyr
5.1 Introducing dplyr
5.1.1 Selecting variables with the select function
5.1.2 Renaming variables with select and rename
5.1.3 Creating new variables with the mutate function
5.1.4 Getting particular observations with filter
5.1.5 Ordering observations with arrange
5.2 Grouping and summarizing data with dplyr
5.2.1 Summarizing data—the nitty-gritty
5.2.2 Grouped summaries using group_by magic
5.2.3 More than one grouping variable
5.2.4 Using group_by with other verbs
5.2.5 Removing grouping information
5.3 Summing up and looking forward
Chapter 6: Dealing with data 2: Expanding your toolkit
6.1 Pipes and pipelines
6.1.1 Why do we need pipes?
6.1.2 On why you shouldn’t nest functions
6.2 Subduing the pesky string
6.3 Elegantly managing dates and times
6.3.1 Date/time formats
6.3.2 Dates in the bat project data
6.3.3 Why parse dates?
6.3.4 More about parsing dates/times
6.3.5 Calculations with dates/times
6.4 Changing between wider and longer data arrangements
6.4.1 Going longer
6.4.2 Going wider
6.5 Summing up and looking forward
Chapter 7: Getting to grips with ggplot2
7.1 Anatomy of a ggplot
7.1.1 Layers
7.1.2 Scales
7.1.3 Coordinate system
7.1.4 Fantastic faceting
7.2 Putting it into practice
7.2.1 Inheriting data and aesthetics from ggplot
7.3 Beautifying plots
7.3.1 Working with layer-specific geom properties
7.3.2 Adding titles and labels
7.3.3 Themes
7.4 Summing up and looking forward
Chapter 8: Making deeper insights part 1: Working with single variables
8.1 Variables and data
8.1.1 Numeric versus categorical variables
8.1.2 Ratio versus interval scales
8.2 Samples and distributions
8.2.1 Understanding numerical variables
8.3 Graphical summaries of numeric variables
8.3.1 Making some insights about wingspan
8.3.2 Descriptive statistics for numeric variables
8.3.3 Measuring central tendency
8.3.4 Measuring dispersion
8.3.5 Mapping measures of central tendency and dispersion to a figure
8.3.6 Combining histograms and boxplots
8.4 A moment with missing values in numeric variables (NAs)
8.5 Exploring a categorical variable
8.5.1 Understanding categorical variables
Numerical summaries
Graphical summaries of categorical variables
8.6 Summing up and looking forward
8.7 A cat-related reward
Chapter 9: Making deeper insights part 2: Relationships among (many) variables
9.1 Associations between two numeric variables
9.1.1 Descriptive statistics: correlations
9.1.2 Other measures of correlation
9.1.3 Graphical summaries between two numeric variables: the scatterplot
9.2 Associations between two categorical variables
9.2.1 Numerical summaries
9.2.2 Graphical summaries
9.2.3 An alternative, and perhaps more valuable
9.3 Categorical–numerical associations
9.3.1 Numerical summaries
9.3.2 Graphical summaries for numerical versus categorical data
9.3.3 Alternatives to box-and-whisker plots
9.4 Building in complexity: Relationships among three or more variables
9.5 Summing up and looking forward
Chapter 10: Looking back and looking forward
10.1 Next learning steps
10.2 Reproducibility: What, why, and how?
10.2.1 Why should you try and make your work reproducible?
10.2.2 How can you make your work more reproducible?
10.3 Congratulations!
Index