To date, statistics has tended to be neatly divided into two theoretical approaches or frameworks: frequentist (or classical) and Bayesian. Scientists typically choose the framework they use to analyse their data depending on the nature and complexity of the problem, and on their personal views and prior training in probability and uncertainty. Although textbooks and courses should reflect and anticipate this dual reality, they rarely do. This accessible textbook explains, discusses, and applies both the frequentist and Bayesian frameworks to fit the statistical models needed to analyse the types of data most commonly gathered by life scientists. It presents the material in an informal, approachable, and progressive manner, suitable for readers with only a basic knowledge of calculus and statistics.
Statistical Modeling with R is aimed at senior undergraduate and graduate students, professional researchers, and practitioners throughout the life sciences who seek to strengthen their understanding of quantitative methods and to apply them successfully to real-world problems, whether in ecology, evolution, environmental studies, or computational biology.
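As a taste of the dual treatment described above, the following minimal R sketch fits the same simple linear regression under both frameworks. It is not drawn from the book's own code; in particular, the use of stats::lm for the frequentist fit and of the brms package for the Bayesian fit is an assumption made purely for illustration.

# Simulated data: a single continuous predictor and a Gaussian response
set.seed(1)
d <- data.frame(x = runif(50, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(50, sd = 1)

# Frequentist fit: ordinary least squares via lm()
m_freq <- lm(y ~ x, data = d)
summary(m_freq)

# Bayesian fit of the same model: MCMC sampling via brms (requires Stan)
library(brms)
m_bayes <- brm(y ~ x, data = d, family = gaussian(), chains = 4, refresh = 0)
summary(m_bayes)

# With weakly informative default priors, the lm() point estimates and the
# posterior means from brm() should be very close for these simulated data.
coef(m_freq)
fixef(m_bayes)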
Author(s): Pablo Inchausti
Publisher: Oxford University Press
Year: 2023
Language: English
Pages: 518
City: Oxford
Cover
Titlepage
Copyright
Dedication
Preface
Contents
PART I The Conceptual Basis for Fitting Statistical Models
1 General Introduction
1.1 The purpose of statistics
1.2 Statistics in a schizophrenic state?
1.3 How is this book organized?
1.4 How to use this book
References
2 Statistical Modeling
2.1 What is a statistical model?
2.2 What is this thing called probability?
2.3 Linking probability with statistics
2.4 The early Bayesian demise during the 1930s
References
3 Estimating Parameters
3.1 Introduction
3.2 Least squares: A theory of errors and the normal distribution
3.3 Maximum likelihood
3.3.1 The basic concepts
3.3.2 Obtaining maximum likelihood estimates
3.3.3 Using maximum likelihood estimates in statistical inference
3.4 Bayesian parameter estimation: The basics
3.5 Bayesian methods: Markov chain Monte Carlo to the rescue
3.6 Quality control for the algorithms of Bayesian methods
3.7 More general MCMC variations: Metropolis–Hastings and Gibbs algorithms
3.8 Recent advances in Bayesian methods: Hamiltonian Monte Carlo
3.9 Bayesian hypothesis tests
3.10 Summary of the main differences between maximum likelihood and Bayesian methods
References
PART II Applying the Generalized Linear Model to Varied Data Types
4 The General Linear Model I
4.1 Introduction
4.2 The lognormal distribution and its relation to the general linear model
4.3 Simple linear regression: One continuous explanatory variable
4.4 Simple linear regression: Frequentist fitting
4.5 Tools for model validation in frequentist statistics
4.6 Simple linear regression: Bayesian fitting
4.7 Tools for model validation in Bayesian statistics
4.8 Multiple linear regression: More than one numerical explanatory variable
4.9 Multiple linear regression: Frequentist fitting
4.10 The importance of standardizing explanatory variables
4.11 Polynomial regression
4.12 Multiple linear regression: Bayesian fitting
4.13 Problems
References
5 The General Linear Model II
5.1 Introduction
5.2 Student's t-test: One categorical explanatory variable with two groups
5.3 The t-test: Frequentist fitting
5.4 The t-test: Bayesian fitting
5.5 Viewing one-way analysis of variance as a multiple regression
5.6 One-way analysis of variance: Frequentist fitting
5.7 One-way analysis of variance: Bayesian fitting
5.8 A posteriori tests in frequentist models
5.9 A posteriori tests in Bayesian models?
5.10 Problems
References
6 The General Linear Model III
6.1 Introduction
6.2 Factorial analysis of variance
6.3 Factorial analysis of variance: Frequentist fitting
6.4 Factorial analysis of variance: Bayesian fitting
6.5 Analysis of covariance: Mixing continuous and categorical explanatory variables
6.6 Analysis of covariance: Frequentist fitting
6.7 Analysis of covariance: Bayesian fitting
6.8 Problems
References
7 Model Selection
7.1 Introduction
7.2 The problem of model selection: Parsimony in statistics
7.3 Model selection criteria in the frequentist framework: AIC
7.4 Model selection criteria in the Bayesian framework: DIC and WAIC
7.5 The posterior predictive distribution and posterior predictive checks
7.6 Now back to the WAIC and LOO-CV
7.7 Prior predictive distributions: A relatively "new" kid on the block
References
8 The Generalized Linear Model
8.1 Introduction
8.2 What are GLMs made of?
8.3 Fitting GLMs
8.4 Goodness of fit in GLMs
8.5 Statistical significance of GLMs
References
9 When the Response Variable is Binary
9.1 Introduction
9.2 Key concepts for binary GLMs: Odds, log odds, and additional link functions
9.3 Fitting binary GLMs
9.4 Ungrouped binary GLM: Frequentist fitting
9.5 Further issues about validating binary GLMs
9.6 Ungrouped binary GLMs: Bayesian fitting
9.7 Grouped binary GLMs
9.8 Problems
References
10 When the Response Variable is a Count, Often with Many Zeros
10.1 Introduction
10.2 Over-dispersion: A common problem with many causes and some solutions
10.3 Plant species richness and geographical variables
10.3.1 Frequentist fitting of the count GLM
10.3.2 Bayesian fitting of count GLMs
10.4 Modeling of counts with an excess of zeros: Zero-inflated and hurdle models
10.4.1 Frequentist fitting of a zero-inflated model
10.4.2 Bayesian fitting of a zero-augmented model
10.5 Problems
References
11 Further Issues Involved in the Modeling of Counts
11.1 "The more you search, the more you find"
11.2 Log-linear models as count GLMs
11.3 Frequentist fitting of a log-linear model
11.4 Bayesian fitting of a log-linear model
11.5 Problems
References
12 Models for Positive, Real-Valued Response Variables
12.1 Introduction
12.2 Modeling proportions
12.3 Plant cover, grazing, and productivity
12.4 Frequentist fitting of a GLM on proportions
12.5 Bayesian fitting of a GLM on proportions
12.6 Modeling positive, real-valued response variables
12.7 Predicting tree seedling biomass
12.8 Frequentist fitting of a gamma GLM
12.9 Bayesian fitting of a gamma GLM
12.10 Other related yet important cases of positive, real-valued response variables
12.11 Problems
References
Approaches to Defining Priors
PART III Incorporating Experimental and Survey Design Using Mixed Models
13 Accounting for Structure in Mixed/Hierarchical Models
13.1 Introduction
13.2 Fixed effects and random effects in the frequentist framework
13.3 Defining mixed effects models
13.4 Problems and inconsistencies with the definition of random effects
13.5 Population-level and group-level effects in Bayesian hierarchical models
13.6 Fitting mixed models in the frequentist framework
13.7 Statistical significance and model selection in frequentist mixed models
13.8 The shrinkage or borrowing strength effect in mixed models
13.9 Fitting mixed models in the Bayesian framework
13.10 Problems
References
14 Experimental Design in the Life Sciences
14.1 Introduction
14.2 The basic principles of experimental design
14.3 Surveys and observational studies
14.4 The main types of experimental design used in the life sciences
14.4.1 Factorial design
14.4.2 Randomized block design
14.4.3 Split-plot design
14.4.4 Nested design
14.4.5 Repeated measures design
14.4.6 Crossover design
14.5 How many samples should we take?
References
15 Mixed Hierarchical Models and Experimental Design Data
15.1 Introduction
15.2 Binary GLMM with a randomized block design
15.2.1 Binary GLMM with a randomized block design: Frequentist models
15.2.2 Binary GLMM with a randomized block design: Bayesian models
15.3 Gaussian GLMM with a repeated measures design
15.3.1 Gaussian GLMM with a repeated measures design: Frequentist models
15.3.2 Gaussian GLMM with a repeated measures design: Bayesian models
15.4 Beta GLMM with a split-plot design
15.4.1 Beta GLMM with a split-plot design: Frequentist model
15.4.2 Beta GLMM with a split-plot design: Bayesian model
15.5 Problems
References
Appendix A: List of R Packages Used in This Book
Appendix B: Exploring and Describing the Evidence in Graphics
B.1 First steps doing graphics with R
B.2 The graphics package: Basic plots in R
B.2.1 x–y plot or scatterplot for two quantitative variables
B.2.2 Scatterplot with different groups
B.2.3 Histogram and density plots
B.2.4 Boxplots
B.3 Diagramming the graphic page: Seeing and saving more than one plot
B.4 ggplot2: A better way to make graphics in R
B.4.1 Scatter plots in ggplot2
B.4.2 Scatterplot for different groups in ggplot2
B.4.3 Histogram and density plots in ggplot2
B.4.4 Boxplots in ggplot2
B.5 Closing comments about plotting in R
Appendix C: Using R and RStudio: The Bare-Bones Basics
C.1 Downloading and installing R and RStudio
C.2 First steps in RStudio
C.2.1 Setting the working directory
C.2.2 Creating a script
C.2.3 Creating and using variables in R
C.2.4 Saving scripts and results when quitting RStudio
Index