This innovative textbook presents material for a course on modern statistics that incorporates Python as a pedagogical and practical resource. Drawing on many years of teaching and conducting research in various applied and industrial settings, the authors have carefully tailored the text to provide an ideal balance of theory and practical applications. Numerous examples and case studies are incorporated throughout, and comprehensive Python applications are illustrated in detail. A custom Python package is available for download, allowing students to reproduce these examples and explore others.
The first chapters of the text focus on analyzing variability, probability models, and distribution functions. Next, the authors introduce statistical inference and bootstrapping, followed by variability in several dimensions and regression models. The text then covers sampling for the estimation of finite population quantities and time series analysis and prediction, concluding with two chapters on modern data analytic methods. Each chapter includes exercises, data sets, and applications to supplement learning.
Modern Statistics: A Computer-Based Approach with Python is intended for a one- or two-semester advanced undergraduate or graduate course. Because of the foundational nature of the text, it can be combined with any program requiring data analysis in its curriculum, such as courses on data science, industrial statistics, physical and social sciences, and engineering. Researchers, practitioners, and data scientists will also find it a useful resource, given the numerous applications and case studies included.
A second, closely related textbook is titled Industrial Statistics: A Computer-Based Approach with Python. It covers topics such as statistical process control, including multivariate methods; the design of experiments, including computer experiments; and reliability methods, including Bayesian reliability. These texts can be used independently or for consecutive courses.
The mistat Python package can be accessed at https://gedeck.github.io/mistat-code-solutions/ModernStatistics/
"In this book on Modern Statistics, the last two chapters on modern analytic methods contain what is very popular at the moment, especially in Machine Learning, such as classifiers, clustering methods and text analytics. But I also appreciate the previous chapters since I believe that people using machine learning methods should be aware that they rely heavily on statistical ones. I very much appreciate the many worked out cases, based on the longstanding experience of the authors. They are very useful to better understand, and then apply, the methods presented in the book. The use of Python corresponds to the best programming experience nowadays. For all these reasons, I think the book has also a brilliant and impactful future and I commend the authors for that."
Professor Fabrizio Ruggeri
Research Director at the National Research Council, Italy
President of the International Society for Business and Industrial Statistics (ISBIS)
Editor-in-Chief of Applied Stochastic Models in Business and Industry (ASMBI)
Author(s): Ron S. Kenett, Shelemyahu Zacks, Peter Gedeck
Series: Statistics for Industry, Technology, and Engineering
Publisher: Birkhäuser
Year: 2022
Language: English
Pages: 452
City: Cham
Preface
Contents
Industrial Statistics: A Computer-Based Approach with Python (Companion volume)
List of Abbreviations
1 Analyzing Variability: Descriptive Statistics
1.1 Random Phenomena and the Structure of Observations
1.2 Accuracy and Precision of Measurements
1.3 The Population and the Sample
1.4 Descriptive Analysis of Sample Values
1.4.1 Frequency Distributions of Discrete Random Variables
1.4.2 Frequency Distributions of Continuous Random Variables
1.4.3 Statistics of the Ordered Sample
1.4.4 Statistics of Location and Dispersion
1.5 Prediction Intervals
1.6 Additional Techniques of Exploratory Data Analysis
1.6.1 Density Plots
1.6.2 Box and Whiskers Plots
1.6.3 Quantile Plots
1.6.4 Stem-and-Leaf Diagrams
1.6.5 Robust Statistics for Location and Dispersion
1.7 Chapter Highlights
1.8 Exercises
2 Probability Models and Distribution Functions
2.1 Basic Probability
2.1.1 Events and Sample Spaces: Formal Presentation of Random Measurements
2.1.2 Basic Rules of Operations with Events: Unions and Intersections
2.1.3 Probabilities of Events
2.1.4 Probability Functions for Random Sampling
2.1.5 Conditional Probabilities and Independence of Events
2.1.6 Bayes' Theorem and Its Application
2.2 Random Variables and Their Distributions
2.2.1 Discrete and Continuous Distributions
2.2.1.1 Discrete Random Variables
2.2.1.2 Continuous Random Variables
2.2.2 Expected Values and Moments of Distributions
2.2.3 The Standard Deviation, Quantiles, Measures of Skewness, and Kurtosis
2.2.4 Moment Generating Functions
2.3 Families of Discrete Distributions
2.3.1 The Binomial Distribution
2.3.2 The Hypergeometric Distribution
2.3.3 The Poisson Distribution
2.3.4 The Geometric and Negative Binomial Distributions
2.4 Continuous Distributions
2.4.1 The Uniform Distribution on the Interval (a, b), a < b
2.4.2 The Normal and Log-Normal Distributions
2.4.2.1 The Normal Distribution
2.4.2.2 The Log-Normal Distribution
2.4.3 The Exponential Distribution
2.4.4 The Gamma and Weibull Distributions
2.4.5 The Beta Distributions
2.5 Joint, Marginal, and Conditional Distributions
2.5.1 Joint and Marginal Distributions
2.5.2 Covariance and Correlation
2.5.3 Conditional Distributions
2.6 Some Multivariate Distributions
2.6.1 The Multinomial Distribution
2.6.2 The Multi-Hypergeometric Distribution
2.6.3 The Bivariate Normal Distribution
2.7 Distribution of Order Statistics
2.8 Linear Combinations of Random Variables
2.9 Large Sample Approximations
2.9.1 The Law of Large Numbers
2.9.2 The Central Limit Theorem
2.9.3 Some Normal Approximations
2.10 Additional Distributions of Statistics of Normal Samples
2.10.1 Distribution of the Sample Variance
2.10.2 The "Student" t-Statistic
2.10.3 Distribution of the Variance Ratio
2.11 Chapter Highlights
2.12 Exercises
3 Statistical Inference and Bootstrapping
3.1 Sampling Characteristics of Estimators
3.2 Some Methods of Point Estimation
3.2.1 Moment Equation Estimators
3.2.2 The Method of Least Squares
3.2.3 Maximum Likelihood Estimators
3.3 Comparison of Sample Estimates
3.3.1 Basic Concepts
3.3.2 Some Common One-Sample Tests of Hypotheses
3.3.2.1 The Z-Test: Testing the Mean of a Normal Distribution, σ2 Known
3.3.2.2 The t-Test: Testing the Mean of a Normal Distribution, σ2 Unknown
3.3.2.3 The Chi-Squared Test: Testing the Variance of a Normal Distribution
3.3.2.4 Testing Hypotheses About the Success Probability, p, in Binomial Trials
3.4 Confidence Intervals
3.4.1 Confidence Intervals for μ; σ Known
3.4.2 Confidence Intervals for μ; σ Unknown
3.4.3 Confidence Intervals for σ2
3.4.4 Confidence Intervals for p
3.5 Tolerance Intervals
3.5.1 Tolerance Intervals for the Normal Distributions
3.6 Testing for Normality with Probability Plots
3.7 Tests of Goodness of Fit
3.7.1 The Chi-Square Test (Large Samples)
3.7.2 The Kolmogorov-Smirnov Test
3.8 Bayesian Decision Procedures
3.8.1 Prior and Posterior Distributions
3.8.2 Bayesian Testing and Estimation
3.8.2.1 Bayesian Testing
3.8.2.2 Bayesian Estimation
3.8.3 Credibility Intervals for Real Parameters
3.9 Random Sampling from Reference Distributions
3.10 Bootstrap Sampling
3.10.1 The Bootstrap Method
3.10.2 Examining the Bootstrap Method
3.10.3 Harnessing the Bootstrap Method
3.11 Bootstrap Testing of Hypotheses
3.11.1 Bootstrap Testing and Confidence Intervals for the Mean
3.11.2 Studentized Test for the Mean
3.11.3 Studentized Test for the Difference of Two Means
3.11.4 Bootstrap Tests and Confidence Intervals for the Variance
3.11.5 Comparing Statistics of Several Samples
3.11.5.1 Comparing Variances of Several Samples
3.11.5.2 Comparing Several Means: The One-Way Analysis of Variance
3.12 Bootstrap Tolerance Intervals
3.12.1 Bootstrap Tolerance Intervals for Bernoulli Samples
3.12.2 Tolerance Interval for Continuous Variables
3.12.3 Distribution-Free Tolerance Intervals
3.13 Non-Parametric Tests
3.13.1 The Sign Test
3.13.2 The Randomization Test
3.13.3 The Wilcoxon Signed-Rank Test
3.14 Chapter Highlights
3.15 Exercises
4 Variability in Several Dimensions and Regression Models
4.1 Graphical Display and Analysis
4.1.1 Scatterplots
4.1.2 Multiple Boxplots
4.2 Frequency Distributions in Several Dimensions
4.2.1 Bivariate Joint Frequency Distributions
4.2.2 Conditional Distributions
4.3 Correlation and Regression Analysis
4.3.1 Covariances and Correlations
4.3.2 Fitting Simple Regression Lines to Data
4.3.2.1 The Least Squares Method
4.3.2.2 Regression and Prediction Intervals
4.4 Multiple Regression
4.4.1 Regression on Two Variables
4.4.2 Partial Regression and Correlation
4.4.3 Multiple Linear Regression
4.4.4 Partial-F Tests and the Sequential SS
4.4.5 Model Construction: Step-Wise Regression
4.4.6 Regression Diagnostics
4.5 Quantal Response Analysis: Logistic Regression
4.6 The Analysis of Variance: The Comparison of Means
4.6.1 The Statistical Model
4.6.2 The One-Way Analysis of Variance (ANOVA)
4.7 Simultaneous Confidence Intervals: Multiple Comparisons
4.8 Contingency Tables
4.8.1 The Structure of Contingency Tables
4.8.2 Indices of Association for Contingency Tables
4.8.2.1 Two Interval-Scaled Variables
4.8.2.2 Indices of Association for Categorical Variables
4.9 Categorical Data Analysis
4.9.1 Comparison of Binomial Experiments
4.10 Chapter Highlights
4.11 Exercises
5 Sampling for Estimation of Finite Population Quantities
5.1 Sampling and the Estimation Problem
5.1.1 Basic Definitions
5.1.2 Drawing a Random Sample from a Finite Population
5.1.3 Sample Estimates of Population Quantities and Their Sampling Distribution
5.2 Estimation with Simple Random Samples
5.2.1 Properties of X̄n and S²n Under RSWR
5.2.2 Properties of X̄n and S²n Under RSWOR
5.3 Estimating the Mean with Stratified RSWOR
5.4 Proportional and Optimal Allocation
5.5 Prediction Models with Known Covariates
5.6 Chapter Highlights
5.7 Exercises
6 Time Series Analysis and Prediction
6.1 The Components of a Time Series
6.1.1 The Trend and Covariances
6.1.2 Analyzing Time Series with Python
6.2 Covariance Stationary Time Series
6.2.1 Moving Averages
6.2.2 Auto-Regressive Time Series
6.2.3 Auto-Regressive Moving Average Time Series
6.2.4 Integrated Auto-Regressive Moving Average Time Series
6.2.5 Applications with Python
6.3 Linear Predictors for Covariance Stationary Time Series
6.3.1 Optimal Linear Predictors
6.4 Predictors for Non-stationary Time Series
6.4.1 Quadratic LSE Predictors
6.4.2 Moving Average Smoothing Predictors
6.5 Dynamic Linear Models
6.5.1 Some Special Cases
6.5.1.1 The Normal Random Walk
6.5.1.2 Dynamic Linear Model with Linear Growth
6.5.1.3 Dynamic Linear Model for ARMA(p,q)
6.6 Chapter Highlights
6.7 Exercises
7 Modern Analytic Methods: Part I
7.1 Introduction to Computer Age Statistics
7.2 Data Preparation
7.3 The Information Quality Framework
7.4 Determining Model Performance
7.5 Decision Trees
7.6 Ensemble Models
7.7 Naïve Bayes Classifier
7.8 Neural Networks
7.9 Clustering Methods
7.9.1 Hierarchical Clustering
7.9.2 K-Means Clustering
7.9.3 Cluster Number Selection
7.10 Chapter Highlights
7.11 Exercises
8 Modern Analytic Methods: Part II
8.1 Functional Data Analysis
8.2 Text Analytics
8.3 Bayesian Networks
8.4 Causality Models
8.5 Chapter Highlights
8.6 Exercises
A Introduction to Python
A.1 List, Set, and Dictionary Comprehensions
A.2 Pandas Data Frames
A.3 Data Visualization Using Pandas and Matplotlib
B List of Python Packages
C Code Repository and Solution Manual
Bibliography
Index