The Analysis of Biological Data

Author(s): Michael C. Whitlock and Dolph Schluter
Edition: 2
Publisher: W. H. Freeman and Company
Year: 2015

Language: English
Pages: 1058
City: New York, NY, USA

Cover......Page 1
Halftitle Page......Page 2
Title Page......Page 3
Copyright......Page 4
Dedication......Page 5
Contents in brief......Page 6
Contents......Page 8
Preface......Page 22
A word about the data......Page 24
Acknowledgments......Page 26
About the Authors......Page 28
1 Statistics and samples......Page 29
1.1 What is statistics?......Page 30
EXAMPLE 1.2 Raining cats......Page 32
Populations and samples......Page 33
Properties of good samples......Page 34
Random sampling......Page 35
How to take a random sample......Page 36
The sample of convenience......Page 37
Volunteer bias......Page 38
Data in the real world......Page 39
Categorical and numerical variables......Page 40
Explanatory and response variables......Page 41
1.4 Frequency distributions and probability distributions......Page 43
1.5 Types of studies......Page 45
1.6 Summary......Page 46
PRACTICE PROBLEMS......Page 47
ASSIGNMENT PROBLEMS......Page 51
1 INTERLEAF Biology and the history of statistics......Page 55
2 Displaying data......Page 57
How to draw a bad graph......Page 59
How to draw a good graph......Page 60
Showing categorical data: frequency table and bar graph......Page 63
A bar graph is usually better than a pie chart......Page 65
Showing numerical data: frequency table and histogram......Page 66
Describing the shape of a histogram......Page 69
How to draw a good histogram......Page 70
Other graphs for numerical data......Page 71
Showing association between categorical variables......Page 72
Showing association between numerical variables: scatter plot......Page 75
Showing association between a numerical and a categorical variable......Page 76
Line graph......Page 80
Maps......Page 81
Follow similar principles for display tables......Page 83
2.6 Summary......Page 86
PRACTICE PROBLEMS......Page 87
ASSIGNMENT PROBLEMS......Page 97
3 Describing data......Page 107
EXAMPLE 3.1 Gliding snakes......Page 109
Variance and standard deviation......Page 110
Coefficient of variation......Page 112
Calculating mean and standard deviation from a frequency table......Page 113
Effect of changing measurement scale......Page 114
EXAMPLE 3.2 I’d give my right arm for a female......Page 116
The interquartile range......Page 117
The box plot......Page 118
EXAMPLE 3.3 Disarming fish......Page 120
Mean versus median......Page 121
Standard deviation versus interquartile range......Page 123
Displaying cumulative relative frequencies......Page 124
The proportion is like a sample mean......Page 126
3.6 Summary......Page 128
Table of formulas for descriptive statistics......Page 129
PRACTICE PROBLEMS......Page 130
ASSIGNMENT PROBLEMS......Page 137
EXAMPLE 4.1 The length of human genes......Page 145
Estimating mean gene length with a random sample......Page 146
The sampling distribution of Ȳ......Page 147
The standard error of Ȳ from data......Page 150
4.3 Confidence intervals......Page 152
The 2SE rule of thumb......Page 153
4.4 Error bars......Page 155
4.5 Summary......Page 157
Standard error of the mean......Page 158
PRACTICE PROBLEMS......Page 159
ASSIGNMENT PROBLEMS......Page 162
2 INTERLEAF Pseudoreplication......Page 170
5 Probability......Page 172
5.1 The probability of an event......Page 173
5.2 Venn diagrams......Page 175
5.3 Mutually exclusive events......Page 176
Discrete probability distributions......Page 177
Continuous probability distributions......Page 178
The addition rule......Page 180
The probabilities of all possible mutually exclusive outcomes add to one......Page 181
The general addition rule......Page 182
5.6 Independence and the multiplication rule......Page 183
Multiplication rule......Page 184
Independence of more than two events......Page 185
EXAMPLE 5.7 Sex and birth order......Page 187
EXAMPLE 5.8 Is this meat taken?......Page 190
Conditional probability......Page 193
Sampling without replacement......Page 194
Bayes’ theorem......Page 195
5.10 Summary......Page 197
PRACTICE PROBLEMS......Page 198
ASSIGNMENT PROBLEMS......Page 204
6 Hypothesis testing......Page 210
Null hypothesis......Page 212
To reject or not to reject......Page 213
Stating the hypotheses......Page 214
The null distribution......Page 215
Quantifying uncertainty: the P-value......Page 217
Reporting the results......Page 219
Type I and Type II errors......Page 220
The test......Page 222
Interpreting a nonsignificant result......Page 224
6.5 One-sided tests......Page 225
6.6 Hypothesis testing versus confidence intervals......Page 228
6.7 Summary......Page 229
PRACTICE PROBLEMS......Page 231
ASSIGNMENT PROBLEMS......Page 234
3 INTERLEAF Why statistical significance is not the same as biological importance......Page 241
7 Analyzing proportions......Page 243
Formula for the binomial distribution......Page 245
Number of successes in a random sample......Page 246
Sampling distribution of the proportion......Page 248
EXAMPLE 7.2 Sex and the X......Page 250
Approximations for the binomial test......Page 253
Confidence intervals for proportions—the Agresti–Coull method......Page 254
Confidence intervals for proportions—the Wald method......Page 255
7.4 Deriving the binomial distribution......Page 256
7.5 Summary......Page 257
Binomial test......Page 258
PRACTICE PROBLEMS......Page 259
ASSIGNMENT PROBLEMS......Page 264
4 INTERLEAF Correlation does not require causation......Page 268
8 Fitting probability models to frequency data......Page 271
EXAMPLE 8.1 No weekend getaway......Page 273
Observed and expected frequencies......Page 275
The χ² test statistic......Page 276
The sampling distribution of χ² under the null hypothesis......Page 277
Calculating the P-value......Page 278
Critical values for the χ² distribution......Page 279
8.3 Assumptions of the χ² goodness-of-fit test......Page 282
EXAMPLE 8.4 Gene content of the human X chromosome......Page 283
EXAMPLE 8.5 Designer two-child families?......Page 285
8.6 Random in space or time: the Poisson distribution......Page 288
Testing randomness with the Poisson distribution......Page 289
Comparing the variance to the mean......Page 293
8.7 Summary......Page 294
Poisson distribution......Page 295
PRACTICE PROBLEMS......Page 296
ASSIGNMENT PROBLEMS......Page 300
5 INTERLEAF Making a plan......Page 307
9 Contingency analysis: associations between categorical variables......Page 309
9.1 Associating two categorical variables......Page 311
Odds......Page 312
Odds ratio......Page 313
Standard error and confidence interval for odds ratio......Page 314
Odds ratio vs. relative risk......Page 316
EXAMPLE 9.4 The gnarly worm gets the bird......Page 320
Hypotheses......Page 321
The χ² statistic......Page 322
A shortcut for calculating the expected frequencies......Page 323
Assumptions of the χ² contingency test......Page 324
Correction for continuity......Page 325
EXAMPLE 9.5 The feeding habits of vampire bats......Page 326
9.6 G-tests......Page 328
9.7 Summary......Page 329
The χ² contingency test......Page 330
G-test......Page 331
PRACTICE PROBLEMS......Page 332
ASSIGNMENT PROBLEMS......Page 338
Review Problems 1......Page 345
10 The normal distribution......Page 350
10.1 Bell-shaped curves and the normal distribution......Page 352
10.2 The formula for the normal distribution......Page 355
10.3 Properties of the normal distribution......Page 356
Using the standard normal table......Page 358
Using the standard normal to describe any normal distribution......Page 360
10.5 The normal distribution of sample means......Page 363
Calculating probabilities of sample means......Page 364
EXAMPLE 10.6 Young adults and the Spanish flu......Page 366
EXAMPLE 10.7 The only good bug is a dead bug......Page 369
10.8 Summary......Page 372
Normal approximation to the binomial distribution......Page 373
PRACTICE PROBLEMS......Page 374
ASSIGNMENT PROBLEMS......Page 380
6 INTERLEAF Controls in medical studies......Page 386
11 Inference for a normal population......Page 388
Student’s t-distribution......Page 389
Finding critical values of the t-distribution......Page 390
The 95% confidence interval for the mean......Page 393
The 99% confidence interval for the mean......Page 394
EXAMPLE 11.3 Human body temperature......Page 396
The effects of larger sample size: body temperature revisited......Page 399
11.4 Assumptions of the one-sample t-test......Page 401
Confidence limits for the variance......Page 402
Confidence limits for the standard deviation......Page 403
Assumptions......Page 404
11.6 Summary......Page 405
Confidence interval for variance......Page 406
PRACTICE PROBLEMS......Page 407
ASSIGNMENT PROBLEMS......Page 411
12 Comparing two means......Page 417
12.1 Paired sample versus two independent samples......Page 419
Estimating mean difference from paired data......Page 421
Paired t-test......Page 424
Assumptions......Page 426
EXAMPLE 12.3 Spike or be spiked......Page 427
Confidence interval for the difference between two means......Page 428
Two-sample t-test......Page 430
Assumptions......Page 431
A two-sample t-test when standard deviations are unequal......Page 432
EXAMPLE 12.4 So long; thanks to all the fish......Page 433
EXAMPLE 12.5 Mommy’s baby, Daddy’s maybe......Page 436
12.6 Interpreting overlap of confidence intervals......Page 438
Levene’s test for homogeneity of variances......Page 439
12.8 Summary......Page 441
Confidence interval for the difference between two means (two samples)......Page 442
Welch’s approximate t-test......Page 443
Levene’s test......Page 444
PRACTICE PROBLEMS......Page 446
ASSIGNMENT PROBLEMS......Page 456
7 INTERLEAF Which test should I use?......Page 465
13 Handling violations of assumptions......Page 468
Graphical methods......Page 470
Formal test of normality......Page 473
Violations of normality......Page 475
Unequal standard deviations......Page 476
Log transformation......Page 477
Arcsine transformation......Page 480
Confidence intervals with transformations......Page 481
A caveat: Avoid multiple testing with transformations......Page 482
Sign test......Page 483
The Wilcoxon signed-rank test......Page 487
EXAMPLE 13.5 Sexual cannibalism in sagebrush crickets......Page 488
Tied ranks......Page 491
Large samples and the normal approximation......Page 492
13.6 Assumptions of nonparametric tests......Page 493
13.7 Type I and Type II error rates of nonparametric methods......Page 494
13.8 Permutation tests......Page 495
Assumptions of permutation tests......Page 498
13.9 Summary......Page 499
Mann-Whitney U-test......Page 501
PRACTICE PROBLEMS......Page 502
ASSIGNMENT PROBLEMS......Page 514
Review Problems 2......Page 527
14 Designing experiments......Page 535
Confounding variables......Page 537
Experimental artifacts......Page 538
EXAMPLE 14.2 Reducing HIV transmission......Page 539
Design components......Page 540
Simultaneous control group......Page 541
Randomization......Page 542
Blinding......Page 543
Replication......Page 545
Blocking......Page 547
Extreme treatments......Page 550
EXAMPLE 14.5 Lethal combination......Page 552
Match and adjust......Page 554
Plan for precision......Page 556
Plan for power......Page 558
Plan for data loss......Page 559
14.8 Summary......Page 560
Planned sample size for a 95% confidence interval of the difference between two proportions......Page 561
Planned sample size for 2 × 2 contingency test of 80% power at α = 0.05......Page 562
Planned sample size for a two-sample t-test of 80% power at α = 0.05......Page 563
PRACTICE PROBLEMS......Page 564
ASSIGNMENT PROBLEMS......Page 568
8 INTERLEAF Data dredging......Page 572
15 Comparing means of more than two groups......Page 575
EXAMPLE 15.1 The knees who say night......Page 577
ANOVA in a nutshell......Page 578
ANOVA tables......Page 579
Partitioning the sum of squares......Page 580
Calculating the mean squares......Page 581
The variance ratio, F......Page 582
Variation explained: R²......Page 584
ANOVA with two groups......Page 585
Nonparametric alternatives to ANOVA......Page 586
Planned comparison between two means......Page 588
EXAMPLE 15.4 Wood wide web......Page 590
Testing all pairs of means using the Tukey-Kramer method......Page 591
Assumptions......Page 593
15.5 Fixed and random effects......Page 594
EXAMPLE 15.6 Walking-stick limbs......Page 595
ANOVA calculations......Page 596
Variance components......Page 597
Assumptions......Page 598
15.7 Summary......Page 599
Kruskal-Wallis test......Page 601
Tukey-Kramer test of all pairs of means......Page 602
Repeatability and variance components......Page 603
PRACTICE PROBLEMS......Page 604
ASSIGNMENT PROBLEMS......Page 613
9 INTERLEAF Experimental and statistical mistakes......Page 624
16 Correlation between numerical variables......Page 626
The correlation coefficient......Page 628
Standard error......Page 632
Approximate confidence interval......Page 633
EXAMPLE 16.2 What big inbreeding coefficients you have......Page 635
16.3 Assumptions......Page 638
16.4 The correlation coefficient depends on the range......Page 640
EXAMPLE 16.5 The miracles of memory......Page 641
Assumptions of Spearman’s correlation......Page 644
16.6 The effects of measurement error on correlation......Page 645
16.7 Summary......Page 646
Confidence interval (approximate) for a population correlation......Page 647
Spearman’s rank correlation test......Page 648
Correlation corrected for measurement error......Page 649
PRACTICE PROBLEMS......Page 650
ASSIGNMENT PROBLEMS......Page 659
10 INTERLEAF Publication bias......Page 668
17 Regression......Page 671
EXAMPLE 17.1 The lion’s nose......Page 673
The method of least squares......Page 674
Formula for the line......Page 675
Calculating the slope and intercept......Page 676
Predicted values......Page 677
Residuals......Page 678
Confidence interval for the slope......Page 679
Confidence intervals for predictions......Page 680
Extrapolation......Page 681
EXAMPLE 17.3 Prairie Home Campion......Page 683
The t-test of regression slope......Page 684
The ANOVA approach......Page 685
Using R² to measure the fit of the line to data......Page 686
17.4 Regression toward the mean......Page 687
Outliers......Page 689
Detecting non-normality and unequal variance......Page 691
17.6 Transformations......Page 693
17.7 The effects of measurement error on regression......Page 696
A curve with an asymptote......Page 697
Quadratic curves......Page 698
Formula-free curve fitting......Page 699
17.9 Logistic regression: fitting a binary response variable......Page 701
17.10 Summary......Page 705
Regression intercept......Page 707
Confidence interval for the predicted individual Y at a given X (prediction intervals)......Page 708
R squared (R²)......Page 709
PRACTICE PROBLEMS......Page 711
ASSIGNMENT PROBLEMS......Page 723
11 INTERLEAF Using species as data points......Page 739
Review Problems 3......Page 743
18 Multiple explanatory variables......Page 754
Modeling with linear regression......Page 756
Generalizing linear regression......Page 757
General linear models......Page 759
Analyzing data from a randomized block design......Page 761
Fitting the model to data......Page 762
18.3 Analyzing factorial designs......Page 764
EXAMPLE 18.3 Interaction zone......Page 765
Model formula......Page 766
Testing the factors......Page 767
The importance of distinguishing fixed and random factors......Page 768
EXAMPLE 18.4 Mole-rat layabouts......Page 769
Testing interaction......Page 770
Fitting a model without an interaction term......Page 771
18.5 Assumptions of general linear models......Page 773
18.6 Summary......Page 775
PRACTICE PROBLEMS......Page 777
ASSIGNMENT PROBLEMS......Page 783
19 Computer-intensive methods......Page 789
EXAMPLE 19.1 How did he know? The non-randomness of haphazard choice......Page 791
EXAMPLE 19.2 The language center in chimps’ brains......Page 795
Bootstrap standard error......Page 797
Confidence intervals by bootstrapping......Page 798
Bootstrapping with multiple groups......Page 799
Assumptions and limitations of the bootstrap......Page 800
19.3 Summary......Page 802
PRACTICE PROBLEMS......Page 803
ASSIGNMENT PROBLEMS......Page 810
20 Likelihood......Page 814
20.1 What is likelihood?......Page 816
Phylogeny estimation......Page 817
Gene mapping......Page 818
Probability model......Page 819
The likelihood formula......Page 820
The maximum likelihood estimate......Page 821
Likelihood-based confidence intervals......Page 823
Probability model......Page 825
The likelihood formula......Page 826
Bias......Page 827
Testing a population proportion......Page 829
20.6 Summary......Page 831
Log-likelihood ratio test for a single parameter......Page 832
PRACTICE PROBLEMS......Page 833
ASSIGNMENT PROBLEMS......Page 839
21 Meta-analysis: combining information from multiple studies......Page 844
Why repeat a study?......Page 846
EXAMPLE 21.2 Aspirin and myocardial infarction......Page 847
EXAMPLE 21.3 The Transylvania effect......Page 849
Define the question......Page 851
Review the literature......Page 852
Compute effect sizes......Page 853
Calculate confidence intervals and test hypotheses......Page 855
Look for associations......Page 856
21.5 File-drawer problem......Page 858
21.6 How to make your paper accessible to meta-analysis......Page 859
21.7 Summary......Page 860
Mantel-Haenszel test......Page 861
PRACTICE PROBLEMS......Page 862
ASSIGNMENT PROBLEMS......Page 864
Chapter 2......Page 865
Chapter 4......Page 866
Chapter 7......Page 867
Chapter 8......Page 868
Chapter 9......Page 869
Chapter 11......Page 870
Chapter 13......Page 871
Chapter 14......Page 872
Chapter 15......Page 873
Chapter 17......Page 874
Chapter 19......Page 875
Chapter 20......Page 876
Chapter 21......Page 877
Using statistical tables......Page 878
Statistical Table A: The χ² distribution......Page 879
Statistical Table B: The standard normal (Z) distribution......Page 883
Statistical Table C: Student’s t-distribution......Page 885
Statistical Table D: The F-distribution......Page 888
Statistical Table E: Mann-Whitney U-distribution......Page 895
Statistical Table F: Tukey-Kramer q-distribution......Page 897
Statistical Table G: Critical values for the Spearman’s rank correlation......Page 898
Literature Cited......Page 902
Chapter 1......Page 930
Chapter 2......Page 931
Chapter 3......Page 936
Chapter 4......Page 939
Chapter 5......Page 940
Chapter 6......Page 943
Chapter 7......Page 944
Chapter 8......Page 947
Chapter 9......Page 952
Review 1......Page 959
Chapter 10......Page 963
Chapter 11......Page 965
Chapter 12......Page 968
Chapter 13......Page 973
Review 2......Page 977
Chapter 14......Page 981
Chapter 15......Page 983
Chapter 16......Page 989
Chapter 17......Page 991
Review 3......Page 996
Chapter 18......Page 1001
Chapter 19......Page 1003
Chapter 20......Page 1004
Chapter 21......Page 1006
Chapter 5......Page 1008
Chapter 11......Page 1009
Chapter 16......Page 1010
Chapter 20......Page 1011
Chapter 21......Page 1012
A......Page 1013
B......Page 1014
C......Page 1016
D......Page 1017
E......Page 1018
F......Page 1019
G......Page 1020
H......Page 1021
I......Page 1022
L......Page 1023
M......Page 1024
N......Page 1027
P......Page 1028
R......Page 1031
S......Page 1033
T......Page 1035
U......Page 1036
W......Page 1037
Z......Page 1038
Inside Back Cover......Page 1039
Back Cover......Page 1040