A deep dive into the programming language of choice for statistics and data
With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we’re all swimming in. Maybe you’re pursuing a career in data science, maybe you’re looking to infuse a little statistics know-how into your existing career, or maybe you’re just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You’ll find coverage of statistical analysis, machine learning, and data management with R.
• Grasp the basics of the R programming language and write your first lines of code
• Understand how R programmers use code to analyze data and perform statistical analysis
• Use R to create data visualizations and machine learning programs
• Work through sample projects to hone your R coding skills
This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.
Author(s): Joseph Schmuller
Series: For Dummies
Edition: 1
Publisher: Wiley
Year: 2023
Language: English
Commentary: Publisher's PDF
Pages: 688
City: Hoboken, NJ
Tags: Machine Learning; Data Analysis; Neural Networks; Regression; Clustering; R; Statistics; For Dummies; Hypothesis Testing; Statistical Inference; Analysis of Variance; Elementary
Title Page
Copyright Page
Table of Contents
Introduction
About This All-in-One
Book 1: Introducing R
Book 2: Describing Data
Book 3: Analyzing Data
Book 4: Learning from Data
Book 5: Harnessing R: Some Projects to Keep You Busy
What You Can Safely Skip
Foolish Assumptions
Icons Used in This Book
Beyond This Book
Where to Go from Here
Book 1: Introducing R
Chapter 1 R: What It Does and How It Does It
The Statistical (and Related) Ideas You Just Have to Know
Samples and populations
Variables: Dependent and independent
Types of data
A little probability
Inferential statistics: Testing hypotheses
Null and alternative hypotheses
Two types of error
Getting R
Getting RStudio
A Session with R
The working directory
Getting started
R Functions
User-Defined Functions
Comments
R Structures
Vectors
Numerical vectors
Matrices
Lists
Data frames
for Loops and if Statements
Chapter 2 Working with Packages, Importing, and Exporting
Installing Packages
Examining Data
Heads and tails
Missing data
Subsets
R Formulas
More Packages
Exploring the tidyverse
Importing and Exporting
Spreadsheets
CSV files
Text files
Book 2: Describing Data
Chapter 1 Getting Graphic
Finding Patterns
Graphing a distribution
Bar-hopping
Slicing the pie
The plot of scatter
Of boxes and whiskers
Doing the Basics: Base R Graphics, That Is
Histograms
Graph features
Bar plots
Pie graphs
Dot charts
Bar plots revisited
Scatter plots
A plot twist
Scatter plot matrix
Box plots
Kicking It Up a Notch to ggplot2
Histograms
Bar plots
Dot charts
Bar plots re-revisited
Scatter plots
About that plot twist . . .
Scatter plot matrix
Box plots
Putting a Bow On It
Chapter 2 Finding Your Center
Means: The Lure of Averages
Calculating the Mean
The Average in R: mean()
What’s your condition?
Eliminate $ signs forthwith()
Explore the data
Outliers: The flaw of averages
Medians: Caught in the Middle
The Median in R: median()
Statistics à la Mode
The Mode in R
Chapter 3 Deviating from the Average
Measuring Variation
Averaging squared deviations: Variance and how to calculate it
Sample variance
Variance in R
Back to the Roots: Standard Deviation
Population standard deviation
Sample standard deviation
Standard Deviation in R
Conditions, conditions, conditions . . .
Chapter 4 Meeting Standards and Standings
Catching Some Zs
Characteristics of z-scores
Bonds versus the Bambino
Exam scores
Standard Scores in R
Where Do You Stand?
Ranking in R
Tied scores
Nth smallest, Nth largest
Percentiles
Percent ranks
Summarizing
Chapter 5 Summarizing It All
How Many?
The High and the Low
Living in the Moments
A teachable moment
Back to descriptives
Skewness
Kurtosis
Tuning in the Frequency
Nominal variables: table() et al.
Numerical variables: hist()
Cumulative frequency
Step by step: The empirical cumulative distribution function
Numerical variables: stem()
Summarizing a Data Frame
Chapter 6 What’s Normal?
Hitting the Curve
Digging deeper
Parameters of a normal distribution
Working with Normal Distributions
Distributions in R
Normal density function
Plotting a normal curve
Cumulative density function
Plotting the cdf
Quantiles of normal distributions
Plotting the cdf with quartiles
Random sampling
Meeting a Distinguished Member of the Family
The standard normal distribution in R
Plotting the standard normal distribution
Book 3: Analyzing Data
Chapter 1 The Confidence Game: Estimation
Understanding Sampling Distributions
An EXTREMELY Important Idea: The Central Limit Theorem
(Approximately) simulating the central limit theorem
Predictions of the central limit theorem
Confidence: It Has Its Limits!
Finding confidence limits for a mean
Using R to find the confidence limits for a mean
Fit to a t
Chapter 2 One-Sample Hypothesis Testing
Hypotheses, Tests, and Errors
Hypothesis Tests and Sampling Distributions
Catching Some Z’s Again
Z Testing in R
t for One
t Testing in R
Working with t-Distributions
Visualizing t-Distributions
Plotting t in base R graphics
Plotting t in ggplot2
One more thing about ggplot2
Testing a Variance
Manufacturing an Example
Testing in R
Working with Chi-Square Distributions
Visualizing Chi-Square Distributions
Plotting chi-square in base R graphics
Plotting chi-square in ggplot2
Chapter 3 Two-Sample Hypothesis Testing
Hypotheses Built for Two
Sampling Distributions Revisited
Applying the central limit theorem
Zs once more
Z-testing for two samples in R
t for Two
Like Peas in a Pod: Equal Variances
t-Testing in R
Working with two vectors
Working with a data frame and a formula
Visualizing the results
Box plots
Bar graphs
Like ps and qs: Unequal variances
A Matched Set: Hypothesis Testing for Paired Samples
Paired Sample t-testing in R
Testing Two Variances
F testing in R
F in conjunction with t
Working with F Distributions
Visualizing F Distributions
Chapter 4 Testing More than Two Samples
Testing More than Two
A thorny problem
A solution
Meaningful relationships
ANOVA in R
Plotting a boxplot to visualize the data
After the ANOVA
Planned comparisons
Another word about contrasts
Contrasts in R
Unplanned comparisons
Another Kind of Hypothesis, Another Kind of Test
Working with repeated measures ANOVA
Repeated measures ANOVA in R
Visualizing the results
Getting Trendy
Trend Analysis in R
Chapter 5 More Complicated Testing
Cracking the Combinations
Interactions
The analysis
Two-Way ANOVA in R
Visualizing the two-way results
Two Kinds of Variables . . . at Once
Mixed ANOVA in R
Visualizing the mixed ANOVA results
After the Analysis
Multivariate Analysis of Variance
MANOVA in R
Visualizing the MANOVA results
After the MANOVA
Chapter 6 Regression: Linear, Multiple, and the General Linear Model
The Plot of Scatter
Graphing Lines
Regression: What a Line!
Using regression for forecasting
Variation around the regression line
Testing hypotheses about regression
Testing the fit
Testing the slope
Testing the intercept
Linear Regression in R
Features of the linear model
Making predictions
Visualizing the scatterplot and regression line
Plotting the residuals
Juggling Many Relationships at Once: Multiple Regression
Multiple regression in R
Making predictions
Visualizing the 3D scatterplot and regression plane
The scatterplot3d package
car and rgl: A package deal
ANOVA: Another Look
Analysis of Covariance: The Final Component of the GLM
But Wait — There’s More
Chapter 7 Correlation: The Rise and Fall of Relationships
Understanding Correlation
Correlation and Regression
Testing Hypotheses about Correlation
Is a correlation coefficient greater than zero?
Do two correlation coefficients differ?
Correlation in R
Calculating a correlation coefficient
Testing a correlation coefficient
Testing the difference between two correlation coefficients
Calculating a correlation matrix
Visualizing correlation matrices
Multiple Correlation
Multiple correlation in R
Adjusting R-squared
Partial Correlation
Partial Correlation in R
Semipartial Correlation
Semipartial Correlation in R
Chapter 8 Curvilinear Regression: When Relationships Get Complicated
What Is a Logarithm?
What Is e?
Power Regression
Exponential Regression
Logarithmic Regression
Polynomial Regression: A Higher Power
Which Model Should You Use?
Chapter 9 In Due Time
A Time Series and Its Components
Forecasting: A Moving Experience
Forecasting: Another Way
Working with Real Data
Chapter 10 Non-Parametric Statistics
Independent Samples
Two samples: Wilcoxon rank-sum test
More than two samples: Kruskal-Wallis One-Way ANOVA
Matched Samples
Two samples: Wilcoxon matched-pairs signed ranks
More than two samples: Friedman two-way ANOVA
More than two samples: Cochran’s Q
Correlation: Spearman’s rS
Correlation: Kendall’s Tau
A Heads-Up
Chapter 11 Introducing Probability
What Is Probability?
Experiments, trials, events, and sample spaces
Sample spaces and probability
Compound Events
Union and intersection
Intersection, again
Conditional Probability
Working with the probabilities
The foundation of hypothesis testing
Large Sample Spaces
Permutations
Combinations
R Functions for Counting Rules
Random Variables: Discrete and Continuous
Probability Distributions and Density Functions
The Binomial Distribution
The Binomial and Negative Binomial in R
Binomial distribution
Negative binomial distribution
Hypothesis Testing with the Binomial Distribution
More on Hypothesis Testing: R versus Tradition
Chapter 12 Probability Meets Regression: Logistic Regression
Getting the Data
Doing the Analysis
Visualizing the Results
Book 4: Learning from Data
Chapter 1 Tools and Data for Machine Learning Projects
The UCI (University of California-Irvine) ML Repository
Working with a UCI dataset
Cleaning up the data
Correcting errors
Eliminating the unnecessary
Exploring the data
Quick suggested project: Density plots
Exploring relationships in the data
Base R graphics
The ggplot version
Introducing the Rattle package
Using Rattle with iris
Getting and (further) exploring the data
Finding clusters in the data
Chapter 2 Decisions, Decisions, Decisions
Decision Tree Components
Roots and leaves
Tree construction
Decision Trees in R
Growing the tree in R
Drawing the tree in R
Decision Trees in Rattle
Creating the tree
Drawing the tree
Evaluating the tree
Project: A More Complex Decision Tree
The data: Car evaluation
Data exploration
Building and drawing the tree
Evaluating the tree
Quick suggested project: Understanding the complexity parameter
Suggested Project: Titanic
Chapter 3 Into the Forest, Randomly
Growing a Random Forest
Random Forests in R
Building the forest
Evaluating the forest
A closer look
Plotting error
Plotting importance
Project: Identifying Glass
The data
Getting the data into Rattle
Exploring the data
Growing the random forest
Visualizing the results
Suggested Project: Identifying Mushrooms
Chapter 4 Support Your Local Vector
Some Data to Work With
Using a subset
Defining a boundary
Understanding support vectors
Separability: It’s Usually Nonlinear
Support Vector Machines in R
Working with e1071
Creating the data frame
Separating into training and test sets
Training the SVM
Plotting the SVM
Testing the SVM
Quick suggested project 1: Using all the variables
Quick suggested project 2: Working with kernels
Quick suggested project 3: Classifying all the irises
Working with kernlab
Project: House Parties
Reading in the data
Exploring the data
Creating the SVM
Evaluating the SVM
Chapter 5 K-Means Clustering
How It Works
K-Means Clustering in R
Setting up and analyzing the data
Understanding the output
Visualizing the clusters
Finding the optimum number of clusters
Quick suggested project: Adding the sepals
Project: Glass Clusters
The data
Starting Rattle and exploring the data
Preparing to cluster
Doing the clustering
Going beyond Rattle
Chapter 6 Neural Networks
Networks in the Nervous System
Artificial Neural Networks
Overview
Input layer and hidden layer
Output layer
How it all works
Neural Networks in R
Building a neural network for the iris data frame
Plotting the network
Evaluating the network
Quick suggested project: Those sepals
Project: Banknotes
The data
Taking a quick look ahead
Setting up Rattle
Evaluating the network
Going beyond Rattle: Visualizing the network
Suggested Projects: Rattling Around
Chapter 7 Exploring Marketing
Analyzing Retail Data
The data
RFM in R
Preparing the data
Doing the analysis
Examining the results
Taking a look at the countries
Enter Machine Learning
Working with k-means clustering
Working with Rattle
Digging into the clusters
The clusters and the classes
Quick suggested project
Suggested Project: Another Data Set
Chapter 8 From the City That Never Sleeps
Examining the Data Set
Warming Up
Glimpsing and viewing
Piping, filtering, and grouping
Visualizing
Joining
Quick Suggested Project: Airline Names
Suggested Project: Departure Delays
Adding a variable: weekday
Quick Suggested Project: Analyze Weekday Differences
Delay, weekday, and airport
Delay and flight duration
Suggested Project: Delay and Weather
Book 5: Harnessing R: Some Projects to Keep You Busy
Chapter 1 Working with a Browser
Getting Your Shine On
Creating Your First shiny Project
The user interface
The server
Final steps
Getting reactive
Working with ggplot
Changing the server
A few more changes
Getting reactive with ggplot
Another shiny Project
The base R version
The ggplot version
Suggested Project
Chapter 2 Dashboards — How Dashing!
The shinydashboard Package
Exploring Dashboard Layouts
Getting started with the user interface
Building the user interface: Boxes, boxes, boxes . . .
Lining up in columns
A nice trick: Keeping tabs
Suggested project: Add statistics
Suggested project: Place valueBoxes in tabPanels
Working with the Sidebar
The user interface
The server
Suggested project: Relocate the slider
Interacting with Graphics
Clicks, double-clicks, and brushes — oh, my!
Why bother with all this?
Suggested project: Experiment with airquality
Index
EULA