Learn statistics by analyzing professional basketball data! In this action-packed book, you’ll build your skills in exploratory data analysis by digging into the fascinating world of NBA games and player stats using the R language.
In Statistics Slam Dunk you’ll develop a toolbox of R data skills including:
• Reading and writing data
• Installing and loading packages
• Transforming, tidying, and wrangling data
• Applying best-in-class exploratory data analysis techniques
• Creating compelling visualizations
• Developing supervised and unsupervised machine learning algorithms
• Executing hypothesis tests, including t-tests and chi-square tests for independence
• Computing expected values, Gini coefficients, and z-scores
Statistics Slam Dunk upgrades your R data science skills by taking on practical analysis challenges based on NBA game and player data. Is losing games on purpose a rational strategy? Which hustle statistics have an impact on wins and losses? Each chapter in this one-of-a-kind guide uses new data science techniques to reveal interesting insights like these. And just like in the real world, you’ll get no clean pre-packaged datasets in Statistics Slam Dunk. You’ll take on the challenge of wrangling messy data to drill on the skills that will make you the star player on any data team.
About the technology
Amazing insights are hiding in raw data, and statistical analysis with R can help reveal them! R was built for data, and it supports modeling and statistical techniques including regression and classification models, time series forecasts, and clustering algorithms. And when you want to see your results, R’s visualizations are stunning, with best-in-class plots and charts.
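As a small taste of the modeling described above, here is a minimal sketch in base R (using made-up payroll and win totals, not data from the book) that fits a simple linear regression:

```r
# Hypothetical team payrolls (in $M) and regular-season win totals
payroll <- c(90, 110, 130, 150, 170)
wins    <- c(30, 38, 45, 51, 60)

# Fit a simple linear regression of wins on payroll
fit <- lm(wins ~ payroll)

# Slope: estimated additional wins per $1M of payroll
coef(fit)[["payroll"]]

# R-squared: share of the variation in wins explained by payroll
summary(fit)$r.squared
```

The same `fit` object feeds directly into R's diagnostic and plotting tools, which is the workflow the book builds on.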
About the book
Statistics Slam Dunk: Statistical analysis with R on real NBA data is an interesting and engaging how-to guide for statistical analysis using R. It’s packed with practical statistical techniques, each demonstrated using real-world data taken from NBA games. In each chapter, you’ll discover a new (and sometimes surprising!) insight into basketball, with careful step-by-step instructions on how to generate those revelations.
You’ll get practical experience cleaning, manipulating, exploring, testing, and otherwise analyzing data with base R functions and useful R packages. R’s visualization capabilities shine through in the book’s nearly 300 visualizations, which span almost 30 types of plots and charts, including Pareto charts and Sankey diagrams. Much more than a beginner’s guide, this book explores advanced analytics techniques and data wrangling packages. You’ll find yourself returning again and again to use this book as a handy reference!
About the reader
Readers need only a beginner’s grasp of basic statistics concepts. No advanced knowledge of statistics, machine learning, R, or basketball is required.
About the author
Gary Sutton is a vice president for a leading financial services company. He has built and led high-performing business intelligence and analytics organizations across multiple verticals, where R was the preferred programming language for predictive modeling, statistical analyses, and other quantitative insights. Gary earned his undergraduate degree from the University of Southern California, a master’s degree from George Washington University, and a second master’s degree, in data science, from Northwestern University.
Author(s): Gary Sutton
Edition: 1
Publisher: Manning Publications
Year: 2024
Language: English
Commentary: Publisher's PDF
Pages: 672
City: Shelter Island, NY
Tags: Regression; R; Statistics; Optimization; Data Wrangling; Hypothesis Testing; Statistical Inference; Sport; ggplot2; Data Exploration; Cluster Analysis
Statistics Slam Dunk
brief contents
contents
foreword
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
about the author
about the cover illustration
Chapter 1: Getting started
1.1 Brief introductions to R and RStudio
1.2 Why R?
1.2.1 Visualizing data
1.2.2 Installing and using packages to extend R’s functional footprint
1.2.3 Networking with other users
1.2.4 Interacting with big data
1.2.5 Landing a job
1.3 How this book works
Chapter 2: Exploring data
2.1 Loading packages
2.2 Importing data
2.3 Wrangling data
2.3.1 Removing variables
2.3.2 Removing observations
2.3.3 Viewing data
2.3.4 Converting variable types
2.3.5 Creating derived variables
2.4 Variable breakdown
2.5 Exploratory data analysis
2.5.1 Computing basic statistics
2.5.2 Returning data
2.5.3 Computing and visualizing frequency distributions
2.5.4 Computing and visualizing correlations
2.5.5 Computing and visualizing means and medians
2.6 Writing data
Chapter 3: Segmentation analysis
3.1 More on tanking and the draft
3.2 Loading packages
3.3 Importing and viewing data
3.4 Creating another derived variable
3.5 Visualizing means and medians
3.5.1 Regular season games played
3.5.2 Minutes played per game
3.5.3 Career win shares
3.5.4 Win shares every 48 minutes
3.6 Preliminary conclusions
3.7 Sankey diagram
3.8 Expected value analysis
3.9 Hierarchical clustering
Chapter 4: Constrained optimization
4.1 What is constrained optimization?
4.2 Loading packages
4.3 Importing data
4.4 Knowing the data
4.5 Visualizing the data
4.5.1 Density plots
4.5.2 Boxplots
4.5.3 Correlation plot
4.5.4 Bar chart
4.6 Constrained optimization setup
4.7 Constrained optimization construction
4.8 Results
Chapter 5: Regression models
5.1 Loading packages
5.2 Importing data
5.3 Knowing the data
5.4 Identifying outliers
5.4.1 Prototype
5.4.2 Identifying other outliers
5.5 Checking for normality
5.5.1 Prototype
5.5.2 Checking other distributions for normality
5.6 Visualizing and testing correlations
5.6.1 Prototype
5.6.2 Visualizing and testing other correlations
5.7 Multiple linear regression
5.7.1 Subsetting data into train and test
5.7.2 Fitting the model
5.7.3 Returning and interpreting the results
5.7.4 Checking for multicollinearity
5.7.5 Running and interpreting model diagnostics
5.7.6 Comparing models
5.7.7 Predicting
5.8 Regression tree
Chapter 6: More wrangling and visualizing data
6.1 Loading packages
6.2 Importing data
6.3 Wrangling data
6.3.1 Subsetting data sets
6.3.2 Joining data sets
6.4 Analysis
6.4.1 First quarter
6.4.2 Second quarter
6.4.3 Third quarter
6.4.4 Fourth quarter
6.4.5 Comparing best and worst teams
6.4.6 Second-half results
Chapter 7: T-testing and effect size testing
7.1 Loading packages
7.2 Importing data
7.3 Wrangling data
7.4 Analysis on 2018–19 data
7.4.1 2018–19 regular season analysis
7.4.2 2019 postseason analysis
7.4.3 Effect size testing
7.5 Analysis on 2019–20 data
7.5.1 2019–20 regular season analysis (pre-COVID)
7.5.2 2019–20 regular season analysis (post-COVID)
7.5.3 More effect size testing
Chapter 8: Optimal stopping
8.1 Loading packages
8.2 Importing images
8.3 Importing and viewing data
8.4 Exploring and wrangling data
8.5 Analysis
8.5.1 Milwaukee Bucks
8.5.2 Atlanta Hawks
8.5.3 Charlotte Hornets
8.5.4 NBA
Chapter 9: Chi-square testing and more effect size testing
9.1 Loading packages
9.2 Importing data
9.3 Wrangling data
9.4 Computing permutations
9.5 Visualizing results
9.5.1 Creating a data source
9.5.2 Visualizing the results
9.5.3 Conclusions
9.6 Statistical test of significance
9.6.1 Creating a contingency table and a balloon plot
9.6.2 Running a chi-square test
9.6.3 Creating a mosaic plot
9.7 Effect size testing
Chapter 10: Doing more with ggplot2
10.1 Loading packages
10.2 Importing and viewing data
10.3 Salaries and salary cap analysis
10.4 Analysis
10.4.1 Plotting and computing correlations between team payrolls and regular season wins
10.4.2 Payrolls versus end-of-season results
10.4.3 Payroll comparisons
Chapter 11: K-means clustering
11.1 Loading packages
11.2 Importing data
11.3 A primer on standard deviations and z-scores
11.4 Analysis
11.4.1 Wrangling data
11.4.2 Evaluating payrolls and wins
11.5 K-means clustering
11.5.1 More data wrangling
11.5.2 K-means clustering
Chapter 12: Computing and plotting inequality
12.1 Gini coefficients and Lorenz curves
12.2 Loading packages
12.3 Importing and viewing data
12.4 Wrangling data
12.5 Gini coefficients
12.6 Lorenz curves
12.7 Salary inequality and championships
12.7.1 Wrangling data
12.7.2 T-test
12.7.3 Effect size testing
12.8 Salary inequality and wins and losses
12.8.1 T-test
12.8.2 Effect size testing
12.9 Gini coefficient bands versus winning percentage
Chapter 13: More with Gini coefficients and Lorenz curves
13.1 Loading packages
13.2 Importing and viewing data
13.3 Wrangling data
13.4 Gini coefficients
13.5 Lorenz curves
13.6 For loops
13.6.1 Simple demonstration
13.6.2 Applying what we’ve learned
13.7 User-defined functions
13.8 Win share inequality and championships
13.8.1 Wrangling data
13.8.2 T-test
13.8.3 Effect size testing
13.9 Win share inequality and wins and losses
13.9.1 T-test
13.9.2 Effect size testing
13.10 Gini coefficient bands versus winning percentage
Chapter 14: Intermediate and advanced modeling
14.1 Loading packages
14.2 Importing and wrangling data
14.2.1 Subsetting and reshaping our data
14.2.2 Extracting a substring to create a new variable
14.2.3 Joining data
14.2.4 Importing and wrangling additional data sets
14.2.5 Joining data (one more time)
14.2.6 Creating standardized variables
14.3 Exploring data
14.4 Correlations
14.4.1 Computing and plotting correlation coefficients
14.4.2 Running correlation tests
14.5 Analysis of variance models
14.5.1 Data wrangling and data visualization
14.5.2 One-way ANOVAs
14.6 Logistic regressions
14.6.1 Data wrangling
14.6.2 Model development
14.7 Paired data before and after
Chapter 15: The Lindy effect
15.1 Loading packages
15.2 Importing and viewing data
15.3 Visualizing data
15.3.1 Creating and evaluating violin plots
15.3.2 Creating paired histograms
15.3.3 Printing our plots
15.4 Pareto charts
15.4.1 ggplot2 and ggQC packages
15.4.2 qcc package
Chapter 16: Randomness versus causality
16.1 Loading packages
16.2 Importing and wrangling data
16.3 Rule of succession and the hot hand
16.4 Player-level analysis
16.4.1 Player 1 of 3: Giannis Antetokounmpo
16.4.2 Player 2 of 3: Julius Randle
16.4.3 Player 3 of 3: James Harden
16.5 League-wide analysis
Chapter 17: Collective intelligence
17.1 Loading packages
17.2 Importing data
17.3 Wrangling data
17.4 Automated exploratory data analysis
17.4.1 Baseline EDA with tableone
17.4.2 Over/under EDA with DataExplorer
17.4.3 Point spread EDA with SmartEDA
17.5 Results
17.5.1 Over/under
17.5.2 Point spreads
Chapter 18: Statistical dispersion methods
18.1 Loading a package
18.2 Importing data
18.3 Exploring and wrangling data
18.4 Measures of statistical dispersion and intra-season parity
18.4.1 Variance method
18.4.2 Standard deviation method
18.4.3 Range method
18.4.4 Mean absolute deviation method
18.4.5 Median absolute deviation method
18.5 Churn and inter-season parity
18.5.1 Data wrangling
18.5.2 Computing and visualizing churn
Chapter 19: Data standardization
19.1 Loading a package
19.2 Importing and viewing data
19.3 Wrangling data
19.3.1 Treating duplicate records
19.3.2 Final trimmings
19.4 Standardizing data
19.4.1 Z-score method
19.4.2 Standard deviation method
19.4.3 Centering method
19.4.4 Range method
Chapter 20: Finishing up
20.1 Cluster analysis
20.2 Significance testing
20.3 Effect size testing
20.4 Modeling
20.5 Operations research
20.6 Probability
20.7 Statistical dispersion
20.8 Standardization
20.9 Summary statistics and visualization
Appendix: More ggplot2 visualizations
index