Learn the fundamentals of statistics and machine learning using R libraries for data processing, visualization, model training, and statistical inference
Key Features
- Advance your ML career with the help of detailed explanations, intuitive illustrations, and code examples
- Gain practical insights into the real-world applications of statistics and machine learning
- Explore the technicalities of statistics and machine learning for effective data presentation
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The Statistics and Machine Learning with R Workshop is a comprehensive resource packed with insights into statistics and machine learning, along with a deep dive into R libraries. The learning experience is further enhanced by practical examples and hands-on exercises that provide explanations of key concepts.
Starting with the fundamentals, you’ll explore the complete model development process, from data pre-processing through to model training. In addition to machine learning, you’ll also delve into R’s statistical capabilities, learning to manipulate various data types and tackle complex mathematical challenges, from algebra and calculus to probability and Bayesian statistics. You’ll discover linear regression techniques and more advanced statistical methodologies to hone your skills and advance your career.
By the end of this book, you’ll have a robust foundational understanding of statistics and machine learning. You’ll also be proficient in using R’s extensive libraries for tasks such as data processing and model training, and be well equipped to leverage the full potential of R in your future projects.
What you will learn
- Hone your skills in different probability distributions and hypothesis testing
- Explore the fundamentals of linear algebra and calculus
- Master crucial statistics and machine learning concepts in theory and practice
- Discover essential data processing and visualization techniques
- Engage in interactive data analysis using R
- Use R to perform statistical modeling, including Bayesian and linear regression
Who this book is for
This book is for beginner to intermediate-level data scientists, undergraduate to master's-level students, and early to mid-senior data scientists or analysts looking to expand their knowledge of machine learning by exploring various R libraries. Basic knowledge of linear algebra and data modeling is a must.
Author(s): Liu Peng
Edition: 1
Publisher: Packt Publishing
Year: 2023
Language: English
Pages: 516
The Statistics and Machine Learning with R Workshop
Contributors
About the author
About the reviewer
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share your thoughts
Download a free PDF copy of this book
Part 1: Statistics Essentials
Chapter 1: Getting Started with R
Technical requirements
Introducing R
Covering the R and RStudio basics
Common data types in R
Common data structures in R
Vector
Matrix
Data frame
List
Control logic in R
Relational operators
Logical operators
Conditional statements
Loops
Exploring functions in R
Summary
Chapter 2: Data Processing with dplyr
Technical requirements
Introducing tidyverse and dplyr
Data transformation with dplyr
Slicing the dataset using the filter() function
Sorting the dataset using the arrange() function
Adding or changing a column using the mutate() function
Selecting columns using the select() function
Selecting the top rows using the top_n() function
Combining the five verbs
Introducing other verbs
Data aggregation with dplyr
Counting observations using the count() function
Aggregating data via group_by() and summarize()
Data merging with dplyr
Case study – working with the Stack Overflow dataset
Summary
Chapter 3: Intermediate Data Processing
Technical requirements
Transforming categorical and numeric variables
Recoding categorical variables
Creating variables using case_when()
Binning numeric variables using cut()
Reshaping the DataFrame
Converting from long format into wide format using spread()
Converting from wide format into long format using gather()
Manipulating string data
Creating strings
Converting numbers into strings
Connecting strings
Working with stringr
Basics of stringr
Pattern matching in a string
Splitting a string
Replacing a string
Putting it together
Introducing regular expressions
Working with tidy text mining
Converting text into tidy data using unnest_tokens()
Working with a document-term matrix
Summary
Chapter 4: Data Visualization with ggplot2
Technical requirements
Introducing ggplot2
Building a scatter plot
Understanding the grammar of graphics
Geometries in graphics
Understanding geometry in scatter plots
Introducing bar charts
Introducing line plots
Controlling themes in graphics
Adjusting themes
Exploring ggthemes
Summary
Chapter 5: Exploratory Data Analysis
Technical requirements
EDA fundamentals
Analyzing categorical data
Summarizing categorical variables using counts
Converting counts into proportions
Marginal distribution and faceted bar charts
Analyzing numerical data
Visualization in higher dimensions
Measuring the central concentration
Measuring variability
Working with skewed distributions
EDA in practice
Obtaining the stock price data
Univariate analysis of individual stock prices
Correlation analysis
Summary
Chapter 6: Effective Reporting with R Markdown
Technical requirements
Fundamentals of R Markdown
Getting started with R Markdown
Getting to know the YAML header
Formatting textual information
Writing R code
Generating a financial analysis report
Getting and displaying the data
Performing data analysis
Adding plots to the report
Adding tables to the report
Configuring code chunks
Customizing R Markdown reports
Adding a table of contents
Creating a report with parameters
Customizing the report style
Summary
Part 2: Fundamentals of Linear Algebra and Calculus in R
Chapter 7: Linear Algebra in R
Technical requirements
Introducing linear algebra
Working with vectors
Working with matrices
Matrix vector multiplication
Matrix multiplication
The identity matrix
Transposing a matrix
Inverting a matrix
Solving a system of linear equations
System of linear equations
The solution to matrix-vector equations
Geometric interpretation of solving a system of linear equations
Obtaining a unique solution to a system of linear equations
Overdetermined and underdetermined systems of linear equations
Summary
Chapter 8: Intermediate Linear Algebra in R
Technical requirements
Introducing the matrix determinant
Interpreting the determinant
Connection to the matrix rank
Introducing the matrix trace
Special properties of the matrix trace
Understanding the matrix norm
Understanding the vector norm
Calculating the L1-norm of a vector
Calculating the L2-norm of a vector
Calculating the L∞-norm of a vector
Understanding the matrix norm
Calculating the L1-norm of a matrix
Calculating the Frobenius norm of a matrix
Calculating the infinity norm of a matrix
Getting to know eigenvalues and eigenvectors
Understanding scalar-vector multiplication
Defining eigenvalues and eigenvectors
Computing eigenvalues and eigenvectors
Introducing principal component analysis
Understanding the variance-covariance matrix
Connecting to PCA
Performing PCA
Summary
Chapter 9: Calculus in R
Technical requirements
Introducing calculus
Differential and integral calculus
More on functions
Vertical line test
Functional symmetry
Increasing and decreasing functions
Slope of a function
Function composition
Common functions
Understanding limits
Infinite limit
Limit at infinity
Introducing derivatives
Common derivatives
Common properties and rules of derivatives
Introducing integral calculus
Indefinite integrals
Indefinite integrals of basic functions
Properties of indefinite integrals
Integration by parts
Definite integrals
Working with calculus in R
Plotting basic functions
Working with derivatives
Using symbolic parameters
Working with the second derivative
Working with partial derivatives
Working with integration in R
More on antiderivatives
Evaluating the definite integral
Summary
Part 3: Fundamentals of Mathematical Statistics in R
Chapter 10: Probability Basics
Technical requirements
Introducing probability distribution
Exploring common discrete probability distributions
The Bernoulli distribution
The binomial distribution
The Poisson distribution
Poisson approximation to binomial distribution
The geometric distribution
Comparing different discrete probability distributions
Discovering common continuous probability distributions
The normal distribution
The exponential distribution
Uniform distribution
Generating normally distributed random samples
Understanding common sampling distributions
Common sampling distributions
Understanding order statistics
Extracting order statistics
Calculating the value at risk
Summary
Chapter 11: Statistical Estimation
Statistical inference for categorical data
Statistical inference for a single parameter
Introducing the General Social Survey dataset
Calculating the sample proportion
Calculating the confidence interval
Interpreting the confidence interval of the sample proportion
Hypothesis testing for the sample proportion
Inference for the difference in sample proportions
Type I and Type II errors
Testing the independence of two categorical variables
Introducing the contingency table
Applying the chi-square test for independence between two categorical variables
Statistical inference for numerical data
Generating a bootstrap distribution for the median
Constructing the bootstrapped confidence interval
Re-centering a bootstrap distribution
Introducing the central limit theorem used in t-distribution
Constructing the confidence interval for the population mean using the t-distribution
Performing hypothesis testing for two means
Introducing ANOVA
Summary
Chapter 12: Linear Regression in R
Introducing linear regression
Understanding simple linear regression
Introducing multiple linear regression
Seeking a higher coefficient of determination
More on adjusted R²
Developing an MLR model
Introducing Simpson’s Paradox
Working with categorical variables
Introducing the interaction term
Handling nonlinear terms
More on the logarithmic transformation
Working with the closed-form solution
Dealing with multicollinearity
Dealing with heteroskedasticity
Introducing penalized linear regression
Working with ridge regression
Working with lasso regression
Summary
Chapter 13: Logistic Regression in R
Technical requirements
Introducing logistic regression
Understanding the sigmoid function
Grokking the logistic regression model
Comparing logistic regression with linear regression
Making predictions using the logistic regression model
More on log odds and odds ratio
Introducing the cross-entropy loss
Evaluating a logistic regression model
Dealing with an imbalanced dataset
Penalized logistic regression
Extending to multi-class classification
Summary
Chapter 14: Bayesian Statistics
Technical requirements
Introducing Bayesian statistics
A first look into the Bayesian theorem
Understanding the generative model
Understanding prior distributions
Introducing the likelihood function
Introducing the posterior model
Diving deeper into Bayesian inference
Introducing the normal-normal model
Introducing MCMC
The full Bayesian inference procedure
Bayesian linear regression with a categorical variable
Summary
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share your thoughts
Download a free PDF copy of this book