The Analysis of Biological Data is a new approach to teaching introductory statistics to biology students. To reach this audience, Whitlock and Schluter motivate learning with interesting biological and medical examples, emphasize intuitive understanding, and focus on real data. The book covers the basic topics of introductory statistics, including graphs, confidence intervals, hypothesis testing, comparison of means, regression, and designing experiments. It also introduces the principles behind such modern topics as likelihood, linear models, meta-analysis, and computer-intensive methods. Instructors and students consistently praise the book's clear and engaging writing, strong visualization techniques, and variety of fascinating and relevant biological examples.
Author(s): Michael C. Whitlock, Dolph Schluter
Edition: 3
Publisher: Macmillan Learning, W. H. Freeman
Year: 2020
Language: English
Pages: 2306
City: New York
Half Title Page
Title Page
Copyright Page
Dedication
Contents in brief
Contents
Preface
About the authors
Acknowledgments
SaplingPlus for Statistics: Course Resources
Online Resources for Students
Chapter 1 Statistics and samples
1.1 What is statistics?
1.2 Sampling populations
Populations and samples
Properties of good samples
Random sampling
How to take a random sample
The sample of convenience
Volunteer bias
Data in the real world
1.3 Types of data and variables
Categorical and numerical variables
Explanatory and response variables
1.4 Frequency distributions and probability distributions
1.5 Types of studies
1.6 Summary
Chapter 1 Problems
Practice problems
Assignment problems
Interleaf 1 Correlation does not require causation
Chapter 2 Displaying data
2.1 Guidelines for effective graphs
How to draw a bad graph
How to draw a good graph
2.2 Showing data for one variable
Showing categorical data: frequency table and bar graph
Making a good bar graph
A bar graph is usually better than a pie chart
Showing numerical data: frequency table and histogram
Describing the shape of a histogram
How to draw a good histogram
Other graphs for numerical data
2.3 Showing association between two variables and differences between groups
Showing association between categorical variables
Showing association between numerical variables: scatter plot
Showing association between a numerical and a categorical variable
2.4 Showing trends in time and space
2.5 How to make good tables
Follow similar principles for display tables
2.6 How to make data files
2.7 Summary
Chapter 2 Problems
Practice problems
Assignment problems
Chapter 3 Describing data
3.1 Arithmetic mean and standard deviation
The sample mean
Variance and standard deviation
Rounding means, standard deviations, and other quantities
Coefficient of variation
Calculating mean and standard deviation from a frequency table
Effect of changing measurement scale
3.2 Median and interquartile range
The median
The interquartile range
The box plot
3.3 How measures of location and spread compare
Mean versus median
Standard deviation versus interquartile range
3.4 Cumulative frequency distribution
Percentiles and quantiles
Displaying cumulative relative frequencies
3.5 Proportions
Calculating a proportion
The proportion is like a sample mean
3.6 Summary
3.7 Quick formula summary
Table of formulas for descriptive statistics
Chapter 3 Problems
Practice problems
Assignment problems
Chapter 4 Estimating with uncertainty
4.1 The sampling distribution of an estimate
Estimating mean gene length with a random sample
The sampling distribution of Ȳ
4.2 Measuring the uncertainty of an estimate
Standard error
The standard error of Ȳ
The standard error of Ȳ from data
4.3 Confidence intervals
The 2SE rule of thumb
4.4 Error bars
4.5 Summary
4.6 Quick formula summary
Standard error of the mean
Chapter 4 Problems
Practice problems
Assignment problems
Interleaf 2 Pseudoreplication
Chapter 5 Probability
5.1 The probability of an event
5.2 Venn diagrams
5.3 Mutually exclusive events
5.4 Probability distributions
Discrete probability distributions
Continuous probability distributions
5.5 Either this or that: adding probabilities
The addition rule
The probabilities of all possible mutually exclusive outcomes add to one
The general addition rule
5.6 Independence and the multiplication rule
Multiplication rule
“And” versus “or”
Independence of more than two events
5.7 Probability trees
5.8 Dependent events
5.9 Conditional probability and Bayes’ theorem
Conditional probability
The general multiplication rule
Sampling without replacement
Bayes’ theorem
5.10 Summary
Chapter 5 Problems
Practice problems
Assignment problems
Chapter 6 Hypothesis testing
6.1 Making and using statistical hypotheses
Null hypothesis
Alternative hypothesis
To reject or not to reject
6.2 Hypothesis testing: an example
Stating the hypotheses
The test statistic
The null distribution
Quantifying uncertainty: the P-value
Draw the appropriate conclusion
Reporting the results
6.3 Errors in hypothesis testing
Type I and Type II errors
6.4 When the null hypothesis is not rejected
The test
Interpreting a nonsignificant result
6.5 One-sided tests
6.6 Hypothesis testing versus confidence intervals
6.7 Summary
Chapter 6 Problems
Practice problems
Assignment problems
Interleaf 3 Why statistical significance is not the same as biological importance
Chapter 7 Analyzing proportions
7.1 The binomial distribution
Formula for the binomial distribution
Number of successes in a random sample
Sampling distribution of the proportion
7.2 Testing a proportion: the binomial test
Approximations for the binomial test
7.3 Estimating proportions
Estimating the standard error of a proportion
Confidence intervals for proportions—the Agresti–Coull method
Confidence intervals for proportions—the Wald method
7.4 Deriving the binomial distribution
7.5 Summary
7.6 Quick formula summary
Binomial distribution
Proportion
Agresti–Coull 95% confidence interval for a proportion
Binomial test
Chapter 7 Problems
Practice problems
Assignment problems
Interleaf 4 Biology and the history of statistics
Chapter 8 Fitting probability models to frequency data
8.1 χ² goodness-of-fit test: the proportional model
Null and alternative hypotheses
Observed and expected frequencies
The χ² test statistic
The sampling distribution of χ² under the null hypothesis
Calculating the P-value
Critical values for the χ² distribution
8.2 Assumptions of the χ² goodness-of-fit test
8.3 Goodness-of-fit tests when there are only two categories
8.4 Random in space or time: the Poisson distribution
Formula for the Poisson distribution
Testing randomness with the Poisson distribution
Comparing the variance to the mean
8.5 Summary
8.6 Quick formula summary
χ² goodness-of-fit test
Test statistic: χ²
Poisson distribution
Chapter 8 Problems
Practice problems
Assignment problems
Interleaf 5 Making a plan
Chapter 9 Contingency analysis: Associations between categorical variables
9.1 Associating two categorical variables
9.2 Estimating association in 2×2 tables: relative risk
Relative risk
Reduction in risk
9.3 Estimating association in 2×2 tables: the odds ratio
Odds
Odds ratio
Standard error and confidence interval for odds ratio
Odds ratio vs. relative risk
9.4 The χ² contingency test
Hypotheses
Expected frequencies assuming independence
The χ² statistic
Degrees of freedom
P-value and conclusion
A shortcut for calculating the expected frequencies
The χ² contingency test is a special case of the χ² goodness-of-fit test
Assumptions of the χ² contingency test
Correction for continuity
9.5 Fisher’s exact test
9.6 Summary
9.7 Quick formula summary
Confidence interval for relative risk
Confidence interval for odds ratio
The χ² contingency test
Fisher’s exact test
Chapter 9 Problems
Practice problems
Assignment problems
Review Problems 1
Chapter 10 The normal distribution
10.1 Bell-shaped curves and the normal distribution
10.2 The formula for the normal distribution
10.3 Properties of the normal distribution
10.4 The standard normal distribution and statistical tables
Using the standard normal table
Using the standard normal to describe any normal distribution
10.5 The normal distribution of sample means
Calculating probabilities of sample means
10.6 Central limit theorem
10.7 Normal approximation to the binomial distribution
10.8 Summary
10.9 Quick formula summary
Z-standardization
Normal approximation to the binomial distribution
Chapter 10 Problems
Practice problems
Assignment problems
Interleaf 6 Controls in medical studies
Chapter 11 Inference for a normal population
11.1 The t-distribution for sample means
Student’s t-distribution
Finding critical values of the t-distribution
11.2 The confidence interval for the mean of a normal distribution
The 95% confidence interval for the mean
The 99% confidence interval for the mean
11.3 The one-sample t-test
The effects of larger sample size: body temperature revisited
11.4 Assumptions of the one-sample t-test
11.5 Estimating the standard deviation and variance of a normal population
Confidence limits for the variance
Confidence limits for the standard deviation
Assumptions
11.6 Summary
11.7 Quick formula summary
Confidence interval for a mean
One-sample t-test
Confidence interval for variance
Chapter 11 Problems
Practice problems
Assignment problems
Chapter 12 Comparing two means
12.1 Paired sample versus two independent samples
12.2 Paired comparison of means
Estimating mean difference from paired data
Paired t-test
Assumptions
12.3 Two-sample comparison of means
Confidence interval for the difference between two means
Two-sample t-test
Assumptions
Welch’s t-test
12.4 Using the correct sampling units
12.5 The fallacy of indirect comparison
12.6 Interpreting overlap of confidence intervals
12.7 Comparing variances
The F-test of equal variances
Levene’s test for homogeneity of variances
12.8 Summary
12.9 Quick formula summary
Confidence interval for the mean difference (paired data)
Paired t-test
Standard error of difference between two means
Confidence interval for the difference between two means (two samples)
Two-sample t-test
Welch’s confidence interval for the difference between two means
Welch’s approximate t-test
F-test
Levene’s test
Chapter 12 Problems
Practice problems
Assignment problems
Interleaf 7 Which test should I use?
Chapter 13 Handling violations of assumptions
13.1 Detecting deviations from normality
Graphical methods
Formal test of normality
13.2 When to ignore violations of assumptions
Violations of normality
Unequal standard deviations
13.3 Data transformations
Log transformation
Other transformations
Confidence intervals with transformations
Avoid multiple testing with transformations
13.4 Nonparametric alternatives to one-sample and paired t-tests
Sign test
The Wilcoxon signed-rank test
13.5 Comparing two groups: the Mann–Whitney U-test
Tied ranks
Large samples and the normal approximation
13.6 Assumptions of nonparametric tests
13.7 Type I and Type II error rates of nonparametric methods
13.8 Permutation tests
Assumptions of permutation tests
13.9 Summary
13.10 Quick formula summary
Transformations
Back-transformations
Sign test
Mann–Whitney U-test
Chapter 13 Problems
Practice problems
Assignment problems
Review Problems 2
Chapter 14 Designing experiments
14.1 Lessons from clinical trials
Design components
14.2 How to reduce bias
Simultaneous control group
Randomization
Blinding
14.3 How to reduce the influence of sampling error
Replication
Balance
Blocking
Extreme treatments
14.4 Experiments with more than one factor
14.5 What if you can’t do experiments?
Match and adjust
14.6 Choosing a sample size
Plan for precision
Plan for power
Plan for data loss
14.7 Summary
14.8 Quick formula summary
Planning for precision
Planning for power
Chapter 14 Problems
Practice problems
Assignment problems
Interleaf 8 Data dredging
Chapter 15 Comparing means of more than two groups
15.1 The analysis of variance
Hypotheses
ANOVA in a nutshell
ANOVA tables
Partitioning the sum of squares
Calculating the mean squares
The variance ratio, F
Variation explained: R²
ANOVA with two groups
15.2 Assumptions and alternatives
The robustness of ANOVA
Data transformations
Nonparametric alternatives to ANOVA
15.3 Planned comparisons
Planned comparison between two means
15.4 Unplanned comparisons
Testing all pairs of means using the Tukey–Kramer method
Assumptions
15.5 Fixed and random effects
15.6 ANOVA with randomly chosen groups
ANOVA calculations
Variance components
Repeatability
Assumptions
15.7 Summary
15.8 Quick formula summary
Analysis of variance (ANOVA)
Kruskal–Wallis test
Planned confidence interval for the difference between two of k means
Planned test of the difference between two of k means
Tukey–Kramer test of all pairs of means
Repeatability and variance components
Chapter 15 Problems
Practice problems
Assignment problems
Interleaf 9 Experimental and statistical mistakes
Chapter 16 Correlation between numerical variables
16.1 Estimating a linear correlation coefficient
The correlation coefficient
Standard error
Approximate confidence interval
16.2 Testing the null hypothesis of zero correlation
16.3 Assumptions
16.4 The correlation coefficient depends on the range
16.5 Spearman’s rank correlation
Procedure for large n
Assumptions of Spearman’s correlation
16.6 The effects of measurement error on correlation
16.7 Summary
16.8 Quick formula summary
Shortcuts
Covariance
Correlation coefficient
Confidence interval (approximate) for a population correlation
The t-test of zero linear correlation
Spearman’s rank correlation
Spearman’s rank correlation test
Correlation corrected for measurement error
Chapter 16 Problems
Practice problems
Assignment problems
Interleaf 10 Publication bias
Chapter 17 Regression
17.1 Linear regression
The method of least squares
Formula for the line
Calculating the slope and intercept
Populations and samples
Predicted values
Residuals
Standard error of slope
Confidence interval for the slope
17.2 Confidence in predictions
Confidence intervals for predictions
Extrapolation
17.3 Testing hypotheses about a slope
The t-test of regression slope
The ANOVA approach
17.4 Regression toward the mean
17.5 Assumptions of regression
Outliers
Detecting nonlinearity
Detecting non-normality and unequal variance
17.6 Transformations
17.7 The effects of measurement error on regression
17.8 Regression with nonlinear relationships
A curve with an asymptote
Quadratic curves
Formula-free curve fitting
17.9 Logistic regression: fitting a binary response variable
17.10 Summary
17.11 Quick formula summary
Shortcuts
Regression slope
Regression intercept
Confidence interval for the regression slope
Confidence interval for the predicted mean Y at a given X (confidence bands)
Confidence interval for the predicted individual Y at a given X (prediction intervals)
The t-test of a regression slope
The ANOVA method for testing zero slope
Chapter 17 Problems
Practice problems
Assignment problems
Interleaf 11 Meta-analysis
Review Problems 3
Chapter 18 Analyzing multiple factors
18.1 ANOVA and linear regression are linear models
Modeling with linear regression
Generalizing linear regression
Linear models
18.2 Analyzing experiments with blocking
Analyzing data from a randomized block design
Model formula
Fitting the model to data
18.3 Analyzing factorial designs
Model formula
Testing the factors
The importance of distinguishing fixed and random factors
18.4 Adjusting for the effects of a covariate
Testing interaction
Fitting a model without an interaction term
18.5 Assumptions of linear models
18.6 Summary
Chapter 18 Problems
Practice problems
Assignment problems
Interleaf 12 Using species as data points
Chapter 19 Computer-intensive methods
19.1 Hypothesis testing using simulation
19.2 Bootstrap standard errors and confidence intervals
Bootstrap standard error
Confidence intervals by bootstrapping
Bootstrapping with multiple groups
Assumptions and limitations of the bootstrap
19.3 Summary
Chapter 19 Problems
Practice problems
Assignment problems
Chapter 20 Likelihood
20.1 What is likelihood?
20.2 Two uses of likelihood in biology
Phylogeny estimation
Gene mapping
20.3 Maximum likelihood estimation
Probability model
The likelihood formula
The maximum likelihood estimate
Likelihood-based confidence intervals
20.4 Versatility of maximum likelihood estimation
Probability model
The likelihood formula
The maximum likelihood estimate
Bias
20.5 Log-likelihood ratio test
Likelihood ratio test statistic
Testing a population proportion
20.6 Summary
20.7 Quick formula summary
Likelihood
Likelihood-based confidence interval for a single parameter
Log-likelihood ratio test for a single parameter
Chapter 20 Problems
Practice problems
Assignment problems
Chapter 21 Survival analysis
21.1 Survival curves
Calculation summary
Confidence intervals
Median survival time
Assumptions
21.2 Comparing survival curves
Hazard ratio
Hazard ratio calculation
Logrank test
Assumptions
21.3 Summary
21.4 Quick formula summary
Hazard ratio
95% Confidence interval for the hazard ratio
Logrank test
Chapter 21 Problems
Practice problems
Assignment problems
Notes
Statistical tables
Using statistical tables
Statistical Table A: The χ² distribution
Statistical Table B: The standard normal (Z) distribution
Statistical Table C: Student’s t-distribution
Statistical Table D: The F-distribution
Statistical Table E: Mann–Whitney U-distribution
Statistical Table F: Tukey–Kramer q-distribution
Statistical Table G: Critical values for Spearman’s rank correlation
Literature cited
Answers to practice problems
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Chapter 21
Index