This book presents an introduction to the statistical analysis of repeated measures data using Bayesian multilevel regression models. Our approach is to fit these models in R using the brms package and the Stan programming language. The book introduces mathematical and modeling concepts in plain English, focusing on the visual and geometric consequences of different regression model structures rather than on rigorous mathematical derivations.
Statistical modeling is as much a coding challenge as a mathematical one. As any experienced programmer knows, copying existing scripts and modifying them slightly is an excellent way to learn to code, and a new skill can often be acquired shortly after an understandable example is found. To that end, rather than use a different toy data set for every new topic, this book presents a set of fully worked analyses involving increasingly complicated models fit to the same experimental data.
In this book, the authors offer an introduction to statistics focused entirely on repeated measures data, beginning with very simple two-group comparisons and ending with multinomial regression models with many ‘random effects’. Across 13 well-structured chapters, readers are provided with all the code necessary to run the analyses and make the plots in the book, as well as useful examples of how to interpret and write up their own analyses.
This book provides an accessible introduction for readers in any field, with any level of statistical background. Senior undergraduate students, graduate students, and experienced researchers looking to ‘translate’ their skills with more traditional models to a Bayesian framework will benefit greatly from the lessons in this text.
Author(s): Santiago Barreda, Noah Silbert
Publisher: Routledge
Year: 2023
Language: English
Pages: 485
Cover
Half Title
Title Page
Copyright Page
Dedication
Table of Contents
Preface
Acknowledgments
1 Introduction: Experiments and variables
1.1 Chapter pre-cap
1.2 Experiments and effects
1.2.1 Experiments and inference
1.3 Our experiment
1.3.1 Our experiment: Introduction
1.3.2 Our experimental methods
1.3.3 Our research questions
1.3.4 Our experimental data
1.4 Variables
1.4.1 Populations and samples
1.4.2 Dependent and independent variables
1.4.3 Categorical variables and ‘factors’
1.4.4 Quantitative variables
1.5 Inspecting our data
1.5.1 Inspecting categorical variables
1.5.2 Inspecting quantitative variables
1.6 Exercises
Reference
2 Probabilities, likelihood, and inference
2.1 Chapter pre-cap
2.2 Data and research questions
2.3 Empirical probabilities
2.3.1 Conditional and marginal probabilities
2.3.2 Joint probabilities
2.4 Probability distributions
2.5 The normal distribution
2.5.1 The sample mean
2.5.2 The sample variance (or standard deviation)
2.5.3 The normal density
2.5.4 The standard normal distribution
2.6 Models and inference
2.7 Probabilities of events and likelihoods of parameters
2.7.1 Characteristics of likelihoods
2.7.2 A brief aside on logarithms
2.7.3 Characteristics of likelihoods, continued
2.8 Answering our research questions
2.9 Exercises
References
3 Fitting Bayesian regression models with brms
3.1 Chapter pre-cap
3.2 What are regression models?
3.3 What’s ‘Bayesian’ about these models?
3.3.1 Prior probabilities
3.3.2 Posterior distributions
3.3.3 Posterior distributions and shrinkage
3.4 Sampling from the posterior using Stan and brms
3.5 Estimating a single mean with the brms package
3.5.1 Data and research questions
3.5.2 Description of the model
3.5.3 Errors and residuals
3.5.4 The model formula
3.5.5 Fitting the model: Calling the brm function
3.5.6 Interpreting the model: The print statement
3.5.7 Seeing the samples
3.5.8 Getting the residuals
3.6 Checking model convergence
3.7 Specifying prior probabilities
3.8 The log prior and log posterior densities
3.9 Answering our research questions
3.10 ‘Traditionalists’ corner
3.10.1 One-sample t-test vs. intercept-only Bayesian models
3.10.2 Intercept-only ordinary-least-squares regression vs. intercept-only Bayesian models
3.11 Exercises
4 Inspecting a ‘single group’ of observations using a Bayesian multilevel model
4.1 Chapter pre-cap
4.2 Repeated measures data
4.2.1 Multilevel models and ‘levels’ of variation
4.3 Representing predictors with many levels
4.4 Strategies for estimating factors with many levels
4.4.1 Complete pooling
4.4.2 No pooling
4.4.3 (Adaptive) Partial pooling
4.4.4 Hyperpriors
4.5 Estimating a multilevel model with brms
4.5.1 Data and research questions
4.5.2 Description of the model
4.5.3 Fitting the model
4.5.4 Interpreting the model
4.6 ‘Random’ effects
4.6.1 Inspecting the random effects
4.7 Simulating data using our model parameters
4.8 Adding a second random effect
4.8.1 Updating the model description
4.8.2 Fitting and interpreting the model
4.9 Investigating ‘shrinkage’
4.10 Answering our research questions
4.11 ‘Traditionalists’ corner
4.11.1 Bayesian multilevel models vs. lmer
4.12 Exercises
References
5 Comparing two groups of observations: Factors and contrasts
5.1 Chapter pre-cap
5.2 Comparing two groups
5.3 Distribution of repeated measures across factor levels
5.4 Data and research questions
5.5 Estimating the difference between two means with ‘brms’
5.5.1 Fitting the model
5.5.2 Interpreting the model
5.6 Contrasts
5.6.1 Treatment coding
5.6.2 Sum coding
5.6.3 Comparison of sum and treatment coding
5.7 Sum coding and the decomposition of variation
5.7.1 Description of the model
5.7.2 Fitting the model
5.7.3 Comparison of sum and treatment coding
5.8 Inspecting and manipulating the posterior samples
5.8.1 Using the hypothesis function
5.8.2 Working with the random effects
5.9 Making our models more robust: The (non-standardized) t-distribution
5.10 Re-fitting with t-distributed errors
5.10.1 Description of the model
5.10.2 Fitting and interpreting the model
5.11 Simulating the two-group model
5.12 Answering our research questions
5.13 ‘Traditionalists’ corner
5.13.1 Bayesian multilevel models vs. lmer
5.14 Exercises
6 Variation in parameters (‘random effects’) and model comparison
6.1 Chapter pre-cap
6.2 Data and research questions
6.3 Variation in parameters across sources of data
6.3.1 Description of our model
6.3.2 Correlations between random parameters
6.3.3 Random effects and the multivariate normal distribution
6.3.4 Specifying priors for a multivariate normal distribution
6.3.5 Updating our model description
6.3.6 Fitting and interpreting the model
6.4 Model comparison
6.4.1 In-sample and out-of-sample prediction
6.4.2 Out-of-sample prediction: Adjusting predictive accuracy
6.4.3 Out-of-sample prediction: Cross-validation
6.4.4 Selecting a model
6.5 Answering our research questions
6.6 ‘Traditionalists’ corner
6.6.1 Bayesian multilevel models vs. lmer
6.7 Exercises
References
7 Comparing many groups, interactions, and posterior predictive checks
7.1 Chapter pre-cap
7.2 Comparing four (or any number of) groups
7.2.1 Data and research questions
7.2.2 Description of our model
7.2.3 Fitting and interpreting the model
7.3 Investigating multiple factors simultaneously
7.3.1 Data and research questions
7.3.2 Description of the model
7.3.3 Fitting and interpreting the model
7.4 Posterior prediction: Using our models to predict new data
7.5 Interactions and interaction plots
7.6 Investigating interactions with a model
7.6.1 Data and research questions
7.6.2 Model formulas
7.6.3 Description of our model
7.6.4 Fitting and interpreting the model
7.6.5 Calculating group means in the presence of interactions
7.6.6 Calculating simple effects in the presence of interactions
7.6.7 Assessing model fit: Bayesian R²
7.7 Answering our research questions
7.8 Factors with more than two levels
7.9 ‘Traditionalists’ corner
7.9.1 Bayesian multilevel models vs. lmer
7.10 Exercises
References
8 Varying variances, more about priors, and prior predictive checks
8.1 Chapter pre-cap
8.2 Data and research questions
8.3 More about priors
8.3.1 Prior predictive checks
8.3.2 More specific priors
8.4 Heteroskedasticity and distributional models
8.5 A ‘simple’ model: Error varies according to a single fixed effect
8.5.1 Description of our model
8.5.2 Prior predictive checks
8.5.3 Fitting and interpreting the model
8.6 A ‘complex’ model: Error varies according to fixed and random effects
8.6.1 Description of our model
8.6.2 Fitting and interpreting the model
8.7 Answering our research questions
8.8 Building identifiable and supportable models
8.8.1 Collinearity
8.8.2 Predictable values of categorical predictors
8.8.3 Saturated, and ‘nearly saturated’, models
8.9 Exercises
References
9 Quantitative predictors and their interactions with factors
9.1 Chapter pre-cap
9.2 Data and research questions
9.3 Modeling variation along lines
9.3.1 Description of the model
9.3.2 Centering quantitative predictors
9.3.3 Fitting and interpreting the model
9.4 Models with group-dependent intercepts, but shared slopes
9.4.1 Description of the model
9.4.2 Fitting and interpreting the model
9.4.3 Interpreting group effects in the presence of shared (non-zero) slopes
9.5 Models with group-dependent slopes and intercepts
9.5.1 Description of the model
9.5.2 Fitting and interpreting the model
9.5.3 Interpreting group effects in the presence of varying slopes
9.6 Answering our research questions: Interim discussion
9.7 Data and research questions: Updated
9.8 Models with intercepts and slopes for each level of a grouping factor (i.e. ‘random slopes’)
9.8.1 Description of the model
9.8.2 Fitting and interpreting the model
9.9 Models with multiple predictors for each level of a grouping factor
9.9.1 Description of the model
9.9.2 Fitting and interpreting the model
9.9.3 Model selection
9.10 Answering our research questions: Updated
9.10.1 A word on causality
9.11 Exercises
References
10 Logistic regression and signal detection theory models
10.1 Chapter pre-cap
10.2 Dichotomous variables and data
10.3 Generalizing our linear models
10.4 Logistic regression
10.4.1 Logits
10.4.2 The inverse logit link function
10.4.3 Building intuitions about logits and the inverse logit function
10.5 Logistic regression with one quantitative predictor
10.5.1 Data and research questions
10.5.2 Description of the model
10.5.3 Fitting the model
10.5.4 Interpreting the model
10.5.5 Using logistic models to understand classification
10.5.6 Answering our research question
10.6 Measuring sensitivity and bias
10.6.1 Data and research questions
10.6.2 Description of the model
10.6.3 Fitting and interpreting the model
10.6.4 Answering our research questions
10.7 Exercises
References
11 Multiple quantitative predictors, dealing with large models, and Bayesian ANOVA
11.1 Chapter pre-cap
11.2 Models with multiple quantitative predictors
11.3 Interactions between quantitative predictors
11.3.1 Centering quantitative predictors when including interactions
11.3.2 Data and research questions
11.3.3 Description of the model
11.3.4 Fitting the model
11.3.5 Advantages of Bayesian multilevel models for large models
11.4 Bayesian Analysis of Variance
11.4.1 Getting the standard deviations from our models ‘manually’
11.4.2 Using the BANOVA function
11.4.3 Fitting and comparing the reduced model
11.5 A logistic regression model with multiple quantitative predictors
11.5.1 Data and research questions
11.5.2 Description of the model
11.5.3 Fitting the model and applying a Bayesian ANOVA
11.5.4 Categorization in two dimensions
11.5.5 Model selection and misspecification
11.6 Exercises
References
12 Multinomial and ordinal regression
12.1 Chapter pre-cap
12.2 Multinomial logistic regression
12.2.1 Multinomial logits and the softmax function
12.2.2 Comparison to logistic regression
12.2.3 Data and research questions
12.2.4 Description of our model
12.2.5 Fitting the model
12.2.6 Interpreting the model
12.2.7 Multinomial models and territorial maps
12.2.8 Refitting the model without speaker random effects
12.2.9 Answering our research questions
12.3 Ordinal (logistic) regression
12.3.1 Cumulative distribution functions
12.3.2 Data and research questions
12.3.3 Description of the model
12.3.4 Fitting and interpreting the model
12.3.5 Listener-specific discrimination terms
12.3.6 Answering our research questions
12.4 Exercises
References
13 Writing up experiments: An investigation of the perception of apparent speaker characteristics from speech acoustics
13.1 Introduction
13.1.1 Fundamental frequency and voice pitch
13.1.2 Variation in fundamental frequency between speakers
13.1.3 Voice resonance and vocal-tract length
13.1.4 Estimating vocal-tract length from speech
13.1.5 Variation in vocal-tract length between speakers
13.1.6 Perception of age, gender, and size
13.1.7 Category-dependent behavior
13.1.8 The current experiment
13.2 Methods
13.2.1 Participants
13.2.2 Stimuli
13.2.3 Procedure
13.2.4 Data screening
13.2.5 Loading the data and packages
13.2.6 Statistical analysis: Apparent height
13.2.7 Statistical analysis: Apparent gender
13.3 Results: Apparent height judgments
13.4 Discussion: Apparent height
13.4.1 Age-dependent use of VTL cues on apparent height
13.4.2 The effect of apparent gender on apparent height
13.5 Conclusion: Apparent height judgments
13.6 Results: Apparent gender judgments
13.7 Discussion: Apparent gender judgments
13.7.1 Effect of apparent age on the perception of femaleness
13.7.2 Between-listener variation in gender perception
13.7.3 Beyond gross acoustic cues in gender perception
13.8 Conclusion: Apparent gender
13.9 Next steps
13.9.1 Research design, variable selection, etc.
13.9.2 Non-linear models
13.9.3 Other data distributions
13.9.4 Multivariate analyses
References
Index