Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R

Now in its second edition, this introductory statistics textbook conveys the essential concepts and tools needed to develop and nurture statistical thinking. It presents descriptive, inductive and explorative statistical methods and guides the reader through the process of quantitative data analysis. This revised and extended edition features new chapters on logistic regression, simple random sampling, including bootstrapping, and causal inference. The text is primarily intended for undergraduate students in disciplines such as business administration, the social sciences, medicine, politics, and macroeconomics. It features a wealth of examples, exercises and solutions with computer code in the statistical programming language R, as well as supplementary material that will enable the reader to quickly adapt the methods to their own applications.

The success of the open-source statistical software “R” has made a significant impact on the teaching and research of statistics in the last decade. Analyzing data is now easier and more affordable than ever, but choosing the most appropriate statistical methods remains a challenge for many users. To understand and interpret software output, it is necessary to engage with the fundamentals of statistics. However, many readers do not feel comfortable with complicated mathematics. In this book, we attempt to find a healthy balance between explaining statistical concepts comprehensively and showing their application and interpretation using R. This book will benefit beginners and self-learners from various backgrounds as we complement each chapter with various exercises and detailed and comprehensible solutions. The results involving mathematics and rigorous proofs are separated from the main text, where possible, and are kept in an appendix for interested readers.

Our textbook covers material that is generally taught in introductory level statistics courses to students from various backgrounds, including sociology, biology, economics, psychology, medicine, and others. Most often we introduce the statistical concepts using examples and illustrate the calculations both manually and using R.

Author(s): Christian Heumann, Michael Schomaker
Edition: 2
Publisher: Springer
Year: 2022

Language: English
Pages: 584

Preface to the Second Edition
Preface to the First Edition
Contents
About the Authors
Part I Descriptive Statistics
1 Introduction and Framework
1.1 Population, Sample and Observations
1.2 Variables
1.2.1 Qualitative and Quantitative Variables
1.2.2 Discrete and Continuous Variables
1.2.3 Scales
1.2.4 Grouped Data
1.3 Data Collection
1.3.1 Survey
1.3.2 Experiment
1.3.3 Observational Data
1.3.4 Primary and Secondary Data
1.4 Creating a Data Set
1.4.1 Statistical Software
1.5 Key Points and Further Issues
1.6 Exercises
2 Frequency Measures and Graphical Representation of Data
2.1 Absolute and Relative Frequencies
2.1.1 Discrete Data
2.1.2 Grouped Metric Data
2.2 Empirical Cumulative Distribution Function
2.2.1 ECDF for Ordinal Variables
2.2.2 ECDF for Metric Variables
2.3 Graphical Representation of a Variable
2.3.1 Bar Chart
2.3.2 Pie Chart
2.3.3 Histogram
2.4 Kernel Density Plots
2.5 Key Points and Further Issues
2.6 Exercises
3 Measures of Central Tendency and Dispersion
3.1 Measures of Central Tendency
3.1.1 Arithmetic Mean
3.1.2 Median and Quantiles
3.1.3 Quantile–Quantile Plots (QQ-Plots)
3.1.4 Mode
3.1.5 Geometric Mean
3.1.6 Harmonic Mean
3.2 Measures of Dispersion
3.2.1 Range and Interquartile Range
3.2.2 Absolute Deviation, Variance and Standard Deviation
3.2.3 Coefficient of Variation
3.3 Box Plots
3.4 Measures of Concentration
3.4.1 Lorenz Curve
3.4.2 Gini Coefficient
3.5 Key Points and Further Issues
3.6 Exercises
4 Association of Two Variables
4.1 Summarizing the Distribution of Two Discrete Variables
4.1.1 Contingency Tables for Discrete Data
4.1.2 Joint, Marginal, and Conditional Frequency Distributions
4.1.3 Graphical Representation of Two Nominal or Ordinal Variables
4.2 Measures of Association for Two Discrete Variables
4.2.1 Pearson's χ² Statistic
4.2.2 Cramer's V Statistic
4.2.3 Contingency Coefficient C
4.2.4 Relative Risks and Odds Ratios
4.3 Association Between Ordinal and Metrical Variables
4.3.1 Graphical Representation of Two Metrical Variables
4.3.2 Correlation Coefficient
4.3.3 Spearman's Rank Correlation Coefficient
4.3.4 Measures Using Discordant and Concordant Pairs
4.4 Visualization of Variables from Different Scales
4.5 Key Points and Further Issues
4.6 Exercises
Part II Probability Calculus
5 Combinatorics
5.1 Introduction
5.2 Permutations
5.2.1 Permutations Without Replacement
5.2.2 Permutations with Replacement
5.3 Combinations
5.3.1 Combinations Without Replacement and Without Consideration of the Order
5.3.2 Combinations Without Replacement and with Consideration of the Order
5.3.3 Combinations with Replacement and Without Consideration of the Order
5.3.4 Combinations with Replacement and with Consideration of the Order
5.4 Key Points and Further Issues
5.5 Exercises
6 Elements of Probability Theory
6.1 Basic Concepts and Set Theory
6.2 Relative Frequency and Laplace Probability
6.3 The Axiomatic Definition of Probability
6.3.1 Corollaries Following from Kolmogorov's Axioms
6.3.2 Calculation Rules for Probabilities
6.4 Conditional Probability
6.4.1 Bayes' Theorem
6.5 Independence
6.6 Key Points and Further Issues
6.7 Exercises
7 Random Variables
7.1 Random Variables
7.2 Cumulative Distribution Function (CDF)
7.2.1 CDF of Continuous Random Variables
7.2.2 CDF of Discrete Random Variables
7.3 Expectation and Variance of a Random Variable
7.3.1 Expectation
7.3.2 Variance
7.3.3 Quantiles of a Distribution
7.3.4 Standardization
7.4 Tschebyschev's Inequality
7.5 Bivariate Random Variables
7.6 Calculation Rules for Expectation and Variance
7.6.1 Expectation and Variance of the Arithmetic Mean
7.7 Covariance and Correlation
7.7.1 Covariance
7.7.2 Correlation Coefficient
7.8 Key Points and Further Issues
7.9 Exercises
8 Probability Distributions
8.1 Standard Discrete Distributions
8.1.1 Discrete Uniform Distribution
8.1.2 Degenerate Distribution
8.1.3 Bernoulli Distribution
8.1.4 Binomial Distribution
8.1.5 The Poisson Distribution
8.1.6 The Multinomial Distribution
8.1.7 The Geometric Distribution
8.1.8 Hypergeometric Distribution
8.2 Standard Continuous Distributions
8.2.1 Continuous Uniform Distribution
8.2.2 The Normal Distribution
8.2.3 The Exponential Distribution
8.3 Sampling Distributions
8.3.1 The χ²-Distribution
8.3.2 The t-Distribution
8.3.3 The F-Distribution
8.4 Key Points and Further Issues
8.5 Exercises
Part III Inductive Statistics
9 Inference
9.1 Introduction
9.2 Properties of Point Estimators
9.2.1 Unbiasedness and Efficiency
9.2.2 Consistency of Estimators
9.2.3 Sufficiency of Estimators
9.3 Point Estimation
9.3.1 Maximum Likelihood Estimation
9.3.2 Method of Moments
9.4 Interval Estimation
9.4.1 Introduction
9.4.2 Confidence Interval for the Mean of a Normal Distribution
9.4.3 Confidence Interval for a Binomial Probability
9.4.4 Confidence Interval for the Odds Ratio
9.5 Sample Size Determinations
9.5.1 Sample Size Calculation for µ
9.5.2 Sample Size Calculation for p
9.6 Key Points and Further Issues
9.7 Exercises
10 Hypothesis Testing
10.1 Introduction
10.2 Basic Definitions
10.2.1 One- and Two-Sample Problems
10.2.2 Hypotheses
10.2.3 One- and Two-Sided Tests
10.2.4 Type I and Type II Error
10.2.5 How to Conduct a Statistical Test
10.2.6 Test Decisions Using the p-Value
10.2.7 Test Decisions Using Confidence Intervals
10.3 Parametric Tests for Location Parameters
10.3.1 Test for the Mean When the Variance is Known (One-Sample Gauss-Test)
10.3.2 Test for the Mean When the Variance is Unknown (One-Sample t-Test)
10.3.3 Comparing the Means of Two Independent Samples
10.3.4 Test for Comparing the Means of Two Dependent Samples (Paired t-Test)
10.4 Parametric Tests for Probabilities
10.4.1 One-Sample Binomial Test for the Probability p
10.4.2 Two-Sample Binomial Test
10.5 Tests for Scale Parameters
10.6 Wilcoxon–Mann–Whitney (WMW) U-Test
10.7 χ²-Goodness of Fit Test
10.8 χ²-Independence Test and Other χ²-Tests
10.9 Beyond Dichotomies
10.9.1 Compatibility
10.9.2 The S-Value
10.9.3 Graphs of p- and S-Values
10.9.4 Unconditional Interpretations
10.10 Key Points and Further Issues
10.11 Exercises
11 Linear Regression
11.1 The Linear Model
11.2 Method of Least Squares
11.2.1 Properties of the Linear Regression Line
11.3 Goodness of Fit
11.4 Linear Regression with a Binary Covariate
11.5 Linear Regression with a Transformed Covariate
11.6 Linear Regression with Multiple Covariates
11.6.1 Matrix Notation
11.6.2 Categorical Covariates
11.6.3 Transformations
11.7 The Inductive View of Linear Regression
11.7.1 Properties of Least Squares and Maximum Likelihood Estimators
11.7.2 The ANOVA Table
11.7.3 Interactions
11.8 Comparing Different Models
11.9 Checking Model Assumptions
11.10 Association Versus Causation
11.11 Key Points and Further Issues
11.12 Exercises
12 Logistic Regression
12.1 Parameter Interpretation
12.2 Estimation of Parameters and Predictions
12.3 Logistic Regression in R
12.4 Model Selection and Goodness-of-Fit
12.5 Key Points and Further Issues
12.6 Exercises
Part IV Additional Topics
13 Simple Random Sampling and Bootstrapping
13.1 Introduction
13.2 Methodology of Simple Random Sampling
13.2.1 Procedure of Selection of a Random Sample
13.2.2 Probabilities of Selection
13.3 Estimation of the Population Mean and Population Variance
13.3.1 Estimation of the Population Total
13.3.2 Confidence Interval for the Population Mean
13.4 Sampling for Proportions
13.4.1 Estimation of the Total Count
13.4.2 Confidence Interval Estimation of P
13.5 Bootstrap Methodology
13.6 Nonparametric Bootstrap Methodology
13.6.1 The Empirical Distribution Function
13.6.2 The Plug-in Principle
13.6.3 Steps in Applying the Bootstrap
13.6.4 Bootstrap Estimator and Bootstrap Variance
13.6.5 Bootstrap Estimate of the Bias and Standard Error
13.6.6 Bootstrap Confidence Intervals
13.7 Key Points and Further Issues
13.8 Exercises
14 Causality
14.1 Potential Outcomes
14.2 Causal Questions
14.3 The Causal Model: Directed Acyclic Graphs
14.3.1 Confounders and Confounding
14.3.2 Colliders
14.3.3 Mediators
14.4 Identification
14.4.1 Randomization
14.5 The Statistical Model: Estimation
14.5.1 The g-formula
14.5.2 Regression
14.6 Roadmap
14.7 Key Points and Further Issues
14.8 Exercises
A Introduction to R
A.1 Background
A.2 Installation and Basic Functionalities
A.3 Statistical Functions
A.4 Data Sets
A.4.1 Pizza Delivery Data
A.4.2 Decathlon Data
A.4.3 Theatre Data
A.4.4 Cattaneo Data
B Solutions to Exercises
C Technical Appendix
C.1 More Details on Chap. 3
C.2 More Details on Chap. 7
C.3 More Details on Chap. 8
C.4 More Details on Chap. 9
C.5 More Details on Chap. 10
C.6 More Details on Chap. 11
C.7 More Details on Chap. 12
C.8 More Details on Chap. 13
C.9 Distribution Tables
D Visual Summaries
D.1 Descriptive Data Analysis
D.2 Summary of Tests for Metric and Ordinal Variables
D.3 Summary of Tests for Nominal Variables
References
Index