Make data-driven, informed decisions and enhance your statistical expertise in Python by turning raw data into meaningful insights
Key Features
Gain expertise in identifying and modeling patterns that generate success
Explore the concepts with Python using important libraries such as statsmodels
Learn how to build models on real-world data sets and find solutions to practical challenges
Book Description
The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation.
This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical methods, including hypothesis testing, regression, time series analysis, classification, and more.
By the end of this book, you’ll have gained fluency in statistical modeling while harnessing the full potential of Python’s rich ecosystem for data analysis.
What you will learn
Explore the use of statistics to make decisions under uncertainty
Answer questions about data using hypothesis tests
Understand the difference between regression and classification models
Build models with statsmodels in Python
Analyze time series data and provide forecasts
Discover Survival Analysis and the problems it can solve
Who this book is for
If you are looking to get started with building statistical models for your data sets, this book is for you! Building Statistical Models in Python bridges the gap between statistical theory and practical application of Python. Since you’ll take a comprehensive journey through theory and application, no previous knowledge of statistics is required, but some experience with Python will be useful.
Author(s): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller
Edition: 1
Publisher: Packt Publishing Pvt Ltd
Year: 2023
Language: English
Pages: 702
Building Statistical Models in Python
Contributors
About the authors
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Part 1: Introduction to Statistics
1
Sampling and Generalization
Software and environment setup
Population versus sample
Population inference from samples
Randomized experiments
Observational study
Sampling strategies – random, systematic, stratified, and clustering
Probability sampling
Non-probability sampling
Summary
2
Distributions of Data
Technical requirements
Understanding data types
Nominal data
Ordinal data
Interval data
Ratio data
Visualizing data types
Measuring and describing distributions
Measuring central tendency
Measuring variability
Measuring shape
The normal distribution and central limit theorem
The Central Limit Theorem
Bootstrapping
Confidence intervals
Standard error
Correlation coefficients (Pearson’s correlation)
Permutations
Permutations and combinations
Permutation testing
Transformations
Summary
References
3
Hypothesis Testing
The goal of hypothesis testing
Overview of a hypothesis test for the mean
Scope of inference
Hypothesis test steps
Type I and Type II errors
Type I errors
Type II errors
Basics of the z-test – the z-score, z-statistic, critical values, and p-values
The z-score and z-statistic
A z-test for means
z-test for proportions
Power analysis for a two-population pooled z-test
Summary
4
Parametric Tests
Assumptions of parametric tests
Normally distributed population data
Equal population variance
T-test – a parametric hypothesis test
T-test for means
Two-sample t-test – pooled t-test
Two-sample t-test – Welch’s t-test
Paired t-test
Tests with more than two groups and ANOVA
Multiple tests for significance
ANOVA
Pearson’s correlation coefficient
Power analysis examples
Summary
References
5
Non-Parametric Tests
When parametric test assumptions are violated
Permutation tests
The Rank-Sum test
The test statistic procedure
Normal approximation
Rank-Sum example
The Signed-Rank test
The Kruskal-Wallis test
Chi-square distribution
Chi-square goodness-of-fit
Chi-square test of independence
Chi-square goodness-of-fit test power analysis
Spearman’s rank correlation coefficient
Summary
Part 2: Regression Models
6
Simple Linear Regression
Simple linear regression using OLS
Coefficients of correlation and determination
Coefficients of correlation
Coefficients of determination
Required model assumptions
A linear relationship between the variables
Normality of the residuals
Homoscedasticity of the residuals
Sample independence
Testing for significance and validating models
Model validation
Summary
7
Multiple Linear Regression
Multiple linear regression
Adding categorical variables
Evaluating model fit
Interpreting the results
Feature selection
Statistical methods for feature selection
Performance-based methods for feature selection
Recursive feature elimination
Shrinkage methods
Ridge regression
LASSO regression
Elastic Net
Dimension reduction
PCA – a hands-on introduction
PCR – a hands-on salary prediction study
Summary
Part 3: Classification Models
8
Discrete Models
Probit and logit models
Multinomial logit model
Poisson model
The Poisson distribution
Modeling count data
The negative binomial regression model
Negative binomial distribution
Summary
9
Discriminant Analysis
Bayes’ theorem
Probability
Conditional probability
Discussing Bayes’ Theorem
Linear Discriminant Analysis
Supervised dimension reduction
Quadratic Discriminant Analysis
Summary
Part 4: Time Series Models
10
Introduction to Time Series
What is a time series?
Goals of time series analysis
Statistical measurements
Mean
Variance
Autocorrelation
Cross-correlation
The white-noise model
Stationarity
Summary
References
11
ARIMA Models
Technical requirements
Models for stationary time series
Autoregressive (AR) models
Moving average (MA) models
Autoregressive moving average (ARMA) models
Models for non-stationary time series
ARIMA models
Seasonal ARIMA models
More on model evaluation
Summary
References
12
Multivariate Time Series
Multivariate time series
Time-series cross-correlation
ARIMAX
Preprocessing the exogenous variables
Fitting the model
Assessing model performance
VAR modeling
Step 1 – visual inspection
Step 2 – selecting the order of AR(p)
Step 3 – assessing cross-correlation
Step 4 – building the VAR(p,q) model
Step 5 – testing the forecast
Step 6 – building the forecast
Summary
References
Part 5: Survival Analysis
13
Time-to-Event Variables – An Introduction
What is censoring?
Left censoring
Right censoring
Interval censoring
Type I and Type II censoring
Survival data
Survival Function, Hazard and Hazard Ratio
Summary
14
Survival Models
Technical requirements
Kaplan-Meier model
Model definition
Model example
Exponential model
Model example
Cox Proportional Hazards regression model
Step 1
Step 2
Step 3
Step 4
Step 5
Summary
Index