Statistics for linguistics with R : a practical introduction

Statistics for Linguists: An Introduction Using R is the first statistics textbook on linear models for linguistics. The book covers simple uses of linear models through generalized models to more advanced approaches, maintaining its focus on conceptual issues and avoiding excessive mathematical details. It contains many applied examples using the R statistical programming environment. Written in an accessible tone and style, this text is the ideal main resource for graduate and advanced undergraduate students of Linguistics statistics courses as well as those in other fields, including Psychology, Cognitive Science, and Data Science. This book is the revised and extended second edition of Statistics for Linguistics with R. The comprehensive revision includes new small sections on programming topics that facilitate statistical analysis, the addition of a variety of statistical functions readers can apply to their own data, and a revision of overview sections on statistical tests and regression modeling. The main revision is a complete rewrite of the chapter on multifactorial approaches, which now contains sections on linear regression, binary and ordinal logistic regression, multinomial and Poisson regression, and repeated-measures ANOVA. The revisions are completed by a new visual tool to identify the right statistical test for a given problem and data set.

Author(s): Stefan Thomas Gries
Series: Mouton Textbook
Edition: 3
Year: 2021

Language: English

Title Page
Copyright Page
Endorsements
Introduction
Table of Contents
1 Some fundamentals of empirical research
1.1 Introduction
1.2 On the relevance of quantitative methods in linguistics
1.3 The design and the logic of quantitative studies
1.3.1 Scouting
1.3.2 Hypotheses and operationalization
1.3.2.1 Scientific hypotheses in text form
1.3.2.2 Operationalizing your variables
1.3.2.3 Scientific hypotheses in statistical/mathematical form
1.4 Data collection and storage
1.5 The decision
1.5.1 One-tailed p-values from discrete probability distributions
1.5.2 Two-tailed p-values from discrete probability distributions
1.5.3 Significance and effect size
1.6 Data collection and storage
2 Fundamentals of R
2.1 Introduction and installation
2.2 Functions and arguments
2.3 Vectors
2.3.1 Generating vectors
2.3.2 Loading and saving vectors
2.3.3 Working with vectors
2.4 Factors
2.4.1 Generating factors
2.4.2 Loading and saving factors
2.4.3 Working with factors
2.5 Data frames
2.5.1 Generating data frames
2.5.2 Loading and saving data frames
2.5.3 Working with data frames
2.6 Lists
3 Descriptive statistics
3.1 Univariate descriptive statistics
3.1.1 Categorical variables
3.1.1.1 Central tendency: the mode
3.1.1.2 Dispersion: normalized entropy
3.1.1.3 Visualization
3.1.2 Ordinal variables
3.1.2.1 Central tendency: the median
3.1.2.2 Dispersion: quantiles etc.
3.1.2.3 Visualization
3.1.3 Numeric variables
3.1.3.1 Central tendency: arithmetic mean
3.1.3.2 Dispersion: standard deviation etc.
3.1.3.3 Visualization
3.1.3.4 Two frequent transformations
3.1.4 Standard errors and confidence intervals
3.1.4.1 Standard errors for percentages
3.1.4.2 Standard errors for means
3.1.4.3 Confidence intervals
3.2 Bivariate descriptive statistics
3.2.1 Categorical/ordinal as a function of categorical/ordinal variables
3.2.2 Categorical/ordinal variables as a function of numeric variables
3.2.3 Numeric variables as a function of categorical/ordinal variables
3.2.4 Numeric variables as a function of numeric variables
3.3 Polemic excursus 1: on ‘correlation’
3.4 Polemic excursus 2: on visualization
3.5 (Non-polemic) Excursus on programming
3.5.1 Conditional expressions
3.5.2 On looping
3.5.3 On not looping: the apply family
3.5.4 Function writing
3.5.4.1 Anonymous functions
3.5.4.2 Named functions
4 Monofactorial tests
4.1 Distributions and frequencies
4.1.1 Goodness-of-fit
4.1.1.1 One categorical/ordinal response
4.1.1.2 One numeric response
4.1.2 Tests for differences/independence
4.1.2.1 One categorical response and one categorical predictor (indep. samples)
4.1.2.2 One ordinal/numeric response and one categorical predictor (indep. samples)
4.2 Dispersion
4.2.1 Goodness-of-fit test for one numeric response
4.2.2 Test for independence for one numeric response and one categorical predictor
4.2.2.1 A small excursus: simulation
4.3 Central tendencies
4.3.1 Goodness-of-fit tests
4.3.1.1 One ordinal response
4.3.1.2 One numeric response
4.3.2 Tests for differences/independence
4.3.2.1 One ordinal response and one categorical predictor (indep. samples)
4.3.2.2 One ordinal response and one categorical predictor (dep. samples)
4.3.2.3 One numeric response and one categorical predictor (indep. samples)
4.3.2.4 One numeric response and one categorical predictor (dep. samples)
4.4 Correlation and simple linear regression
4.4.1 Ordinal variables
4.4.2 Numeric variables
4.4.3 Correlation and causality
5 Fixed-effects regression modeling
5.1 A bit on ‘multifactoriality’
5.2 Linear regression
5.2.1 A linear model with a numeric predictor
5.2.1.1 Numerical exploration
5.2.1.2 Graphical model exploration
5.2.1.3 Excursus: curvature and anova
5.2.1.4 Excursus: model frames and model matrices
5.2.1.5 Excursus: the 95CI of the slope
5.2.2 A linear model with a binary predictor
5.2.2.1 Numerical exploration
5.2.2.2 Graphical model exploration
5.2.2.3 Excursus: coefficients as instructions
5.2.3 A linear model with a categorical predictor
5.2.3.1 Numerical exploration
5.2.3.2 Graphical model exploration
5.2.3.3 Excursus: conflation, model comparison, and contrasts
5.2.4 Towards multifactorial modeling
5.2.4.1 Simpson's paradox
5.2.4.2 Interactions
5.2.5 A linear model with two categorical predictors
5.2.5.1 Numerical exploration
5.2.5.2 Graphical model exploration
5.2.5.3 Excursus: collinearity and VIFs
5.2.6 A linear model with a categorical and a numeric predictor
5.2.6.1 Numerical exploration
5.2.6.2 Graphical model exploration
5.2.6.3 Excursus: post-hoc comparisons and predictions from effects
5.2.7 A linear model with two numeric predictors
5.2.7.1 Numerical exploration
5.2.7.2 Graphical model exploration
5.2.7.3 Excursus: where are most of the values?
5.2.8 Interactions (yes, again)
5.3 Binary logistic regression
5.3.1 A binary logistic regression with a binary predictor
5.3.1.1 Numerical exploration
5.3.1.2 Graphical model exploration
5.3.2 A binary logistic regression with a categorical predictor
5.3.2.1 Numerical exploration
5.3.2.2 Graphical model exploration
5.3.3 A binary logistic regression with a numeric predictor
5.3.3.1 Numerical exploration
5.3.3.2 Graphical model exploration
5.3.3.3 Excursus: on cut-off points
5.3.4 A binary logistic regression with two categorical predictors
5.3.4.1 Numerical exploration
5.3.4.2 Graphical model exploration
5.3.5 Two more effects plots for you to recreate
5.4 Other regression models
5.4.1 Multinomial regression
5.4.1.1 A multinomial regression with a numeric predictor
5.4.1.2 A multinomial regression with a categorical predictor
5.4.1.3 Multinomial and binary logistic regression
5.4.2 Ordinal logistic regression
5.4.2.1 An ordinal regression with a numeric predictor
5.4.2.2 An ordinal regression with a categorical predictor
5.5 Model formulation (and model selection)
5.6 Model assumptions/diagnostics
5.6.1 Amount of data
5.6.2 Residuals
5.6.3 Influential data points
5.6.4 Excursus: autocorrelation/time & overdispersion
5.7 Model validation (and classification vs. prediction)
5.8 A thought experiment
6 Mixed-effects regression modeling
6.1 A very basic introduction
6.1.1 Varying intercepts only
6.1.2 Varying slopes only
6.1.3 Varying intercepts and slopes
6.1.3.1 Varying intercepts and slopes (correlated)
6.1.3.2 Varying intercepts and slopes (uncorrelated)
6.2 Some general MEM considerations
6.3 Linear MEM case study
6.3.1 Preparation and exploration
6.3.2 Model fitting/selection
6.3.3 Quick excursus on update
6.3.4 Model diagnostics
6.3.5 Model fitting/selection, part 2
6.3.6 A brief interlude
6.3.7 Model diagnostics, part 2
6.3.8 Model interpretation
6.3.9 A bit on MEM predictions
6.4 Generalized linear MEM case study
6.4.1 Preparation and exploration
6.4.2 Model fitting/selection
6.4.3 Model diagnostics
6.4.4 Model interpretation
6.5 On convergence and final recommendations
7 Tree-based approaches
7.1 Trees
7.1.1 Classification and regression trees
7.1.2 Conditional inference trees
7.2 Ensembles of trees: forests
7.2.1 Forests of classification and regression trees
7.2.2 Forests of conditional inference trees
7.3 Discussion
References
About the Author