The R Book

The high-level language of R is recognized as one of the most powerful and flexible statistical software environments, and is rapidly becoming the standard setting for quantitative analysis, statistics and graphics. R provides free access to unrivalled coverage and cutting-edge applications, enabling the user to apply numerous statistical methods ranging from simple regression to time series or multivariate analysis.

Building on the success of the author’s bestselling Statistics: An Introduction using R, The R Book is packed with worked examples, providing an all-inclusive guide to R, ideal for novice and more accomplished users alike. The book assumes no background in statistics or computing and introduces the advantages of the R environment, detailing its applications in a wide range of disciplines.

  • Provides the first comprehensive reference manual for the R language, including practical guidance and full coverage of the graphics facilities.
  • Introduces all the statistical models covered by R, beginning with simple classical tests such as chi-square and t-test.
  • Proceeds to examine more advanced methods, from regression and analysis of variance, through to generalized linear models, generalized mixed models, time series, spatial statistics, multivariate statistics and much more.

The R Book is aimed at undergraduates, postgraduates and professionals in science, engineering and medicine. It is also ideal for students and professionals in statistics, economics, geography and the social sciences.

Excerpts from Chapter 4 of The R Book

Chapter 4: Level Set Trees and Code

Learn how to make a volume plot and a barycenter plot, and calculate level set trees with the algorithm LeafsFirst, which is implemented in the function "leafsfirst". This function takes a piecewise constant function object as its argument.

The multimodal 2D example

We consider a 2D density with three modes. We first calculate a piecewise constant function object representing this density, and then calculate its level set tree.

N<-c(35,35)                      # size of the grid
pcf<-sim.data(N=N,type="mulmod") # piecewise constant function
lst.big<-leafsfirst(pcf)         # level set tree

We may make the volume plot directly with the command "plotvolu(lst.big)". However, it is faster first to prune the level set tree, and then plot the reduced level set tree. Function "treedisc" takes a level set tree as its first argument and the original piecewise constant function as its second argument; the third argument, "ngrid", gives the number of levels in the pruned level set tree. We try the number of levels ngrid=100.

lst<-treedisc(lst.big,pcf,ngrid=100)
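To make the idea behind a level set tree more concrete, the following base R sketch counts the connected components of the level sets of a function on a grid, for a handful of levels. It is only an illustration and does not use or reproduce "leafsfirst" or "treedisc": the three-bump test function, the helper names bump and count_components, the 4-neighbour notion of connectedness, and the chosen levels are all assumptions. The count should rise from one component at the lowest level to three at the highest, and each change in the count corresponds to a branching of the tree.

```r
# Hypothetical three-bump function on a 35 x 35 grid (not the output of sim.data)
n <- 35
x <- seq(-3, 3, length.out = n)
bump <- function(cx, cy, s = 0.8) {
  outer(x, x, function(u, v) exp(-((u - cx)^2 + (v - cy)^2) / (2 * s^2)))
}
f <- bump(-1.5, -1.5) + bump(1.5, -1.5) + bump(0, 1.5)

# Count the 4-connected components of a logical matrix by flood fill
count_components <- function(mask) {
  nr <- nrow(mask)
  nc <- ncol(mask)
  seen <- matrix(FALSE, nr, nc)
  count <- 0
  for (start in which(mask)) {
    if (seen[start]) next
    count <- count + 1
    stack <- start
    seen[start] <- TRUE
    while (length(stack) > 0) {
      cell <- stack[length(stack)]          # pop the last cell
      stack <- stack[-length(stack)]
      i <- (cell - 1) %% nr + 1             # row of the cell
      j <- (cell - 1) %/% nr + 1            # column of the cell
      nb <- rbind(c(i - 1, j), c(i + 1, j), c(i, j - 1), c(i, j + 1))
      ok <- nb[, 1] >= 1 & nb[, 1] <= nr & nb[, 2] >= 1 & nb[, 2] <= nc
      nb <- nb[ok, , drop = FALSE]
      for (k in seq_len(nrow(nb))) {
        idx <- (nb[k, 2] - 1) * nr + nb[k, 1]
        if (mask[idx] && !seen[idx]) {
          seen[idx] <- TRUE
          stack <- c(stack, idx)
        }
      }
    }
  }
  count
}

# Levels expressed as fractions of the maximum (illustrative choice)
levels <- max(f) * c(0.05, 0.15, 0.3, 0.5, 0.9)
components <- sapply(levels, function(l) count_components(f >= l))
print(components)
```

Looking at fewer levels means fewer level sets to examine, which gives some intuition for why a tree reduced to a fixed number of levels is cheaper to plot.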

Now we may make a volume plot with the function "plotvolu".

plotvolu(lst)  

We draw barycenter plots with the function "plotbary".

plotbary(lst,coordi=2)  # 2nd coordinate

Note: We may find the number and the location of the modes with the "modecent" function, which takes a level set tree as its argument. Function "locofmax" takes a piecewise constant function as its argument and calculates the location of the maximum.

modecent(lst)
locofmax(pcf)
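As a rough picture of what "number and location of the modes" and "location of the maximum" mean for a function stored on a grid, here is a base R sketch. It is not how "modecent" or "locofmax" are implemented: the test function, the helper names bump and local_maxima, and the rule that a mode is a cell strictly larger than its eight neighbours are assumptions made for illustration.

```r
# Hypothetical three-bump function on a 35 x 35 grid, with bump centres
# placed on grid nodes and unequal heights so that the maximum is unique
n <- 35
x <- seq(-3, 3, length.out = n)
bump <- function(ci, cj, s = 0.8) {
  outer(x, x, function(u, v) exp(-((u - x[ci])^2 + (v - x[cj])^2) / (2 * s^2)))
}
f <- bump(9, 9) + 0.8 * bump(27, 9) + 1.2 * bump(18, 27)

# Location of the maximum: grid indices first, then coordinates
top <- arrayInd(which.max(f), dim(f))
top_xy <- c(x[top[1, 1]], x[top[1, 2]])

# Modes: cells strictly larger than all eight neighbours
local_maxima <- function(f) {
  nr <- nrow(f)
  nc <- ncol(f)
  padded <- matrix(-Inf, nr + 2, nc + 2)    # -Inf border handles the edges
  padded[2:(nr + 1), 2:(nc + 1)] <- f
  is_max <- matrix(TRUE, nr, nc)
  for (di in -1:1) {
    for (dj in -1:1) {
      if (di == 0 && dj == 0) next
      shifted <- padded[(2:(nr + 1)) + di, (2:(nc + 1)) + dj]
      is_max <- is_max & (f > shifted)
    }
  }
  which(is_max, arr.ind = TRUE)
}

modes <- local_maxima(f)
print(top_xy)       # coordinates of the overall maximum
print(nrow(modes))  # number of modes found on the grid
```

The strict comparison means that a flat plateau would not be reported as a mode; a real implementation would need a deliberate rule for ties.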

The 3D tetrahedron example

We now consider a 3-dimensional example. The calculation is much more time-consuming this time.

N<-c(32,32,32)                       # the size of the grid
pcf<-sim.data(N=N,type="tetra3d")    # piecewise constant function
lst.big<-leafsfirst(pcf)             # level set tree
lst<-treedisc(lst.big,pcf,ngrid=200) # pruned level set tree

plotvolu(lst,modelabel=FALSE)        # volume plot
plotvolu(lst,cutlev=0.010,ptext=0.00045,colo=TRUE) # zooming

coordi<-1                                # coordinate, coordi = 1, 2, 3
plotbary(lst,coordi=coordi,ptext=0.0006) # barycenter plot

This time we have used the parameter "cutlev" to make a zoomed volume plot. When this parameter is given, only the part of the level set tree above the value "cutlev" is shown. Typically it is better to zoom in to the volume plot by cutting the tails of the volume function away, which is achieved with the parameter "xlim". We may use, for example, the following command to make a "vertically zoomed" volume plot.

plotvolu(lst,xlim=c(140,220),ptext=0.00045,
         colo=TRUE,modelabel=FALSE)

The additional parameters we have used are "modelabel", which suppresses the plotting of the mode labels when set to FALSE, "ptext", which lifts the mode labels by the given amount, and "colo", which colors the graph of the volume function to make a comparison with the barycenter plots easier.

The 4D pentahedron example

We consider the 4-dimensional example.

N<-c(16,16,16,16)                    # the size of the grid
pcf<-sim.data(N=N,type="penta4d")    # piecewise constant function
lst.big<-leafsfirst(pcf)             # level set tree
lst<-treedisc(lst.big,pcf,ngrid=100) # pruned level set tree

plotvolu(lst,modelabel=FALSE)        # volume plot
plotvolu(lst,cutlev=0.0008,ptext=0.00039,colo=TRUE) # zooming

coordi<-1                                # coordinate, coordi = 1, 2, 3, 4
plotbary(lst,coordi=coordi,ptext=0.0003) # barycenter plot
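One plausible contributor to the longer run times in higher dimensions is simply the number of grid cells, which is the product of the entries of N. The short sketch below computes that product for the three grids used in these examples; the variable names grids and cells are introduced here for illustration, and no claim is made about how the run time of "leafsfirst" actually scales with the cell count.

```r
# Grid sizes taken from the 2D, 3D and 4D examples
grids <- list("2D" = c(35, 35),
              "3D" = c(32, 32, 32),
              "4D" = c(16, 16, 16, 16))

# Number of grid cells = product of the grid size in each coordinate
cells <- sapply(grids, prod)
print(cells)
```

Even though the 4D grid has only 16 points per coordinate, it has roughly 50 times as many cells as the 35 x 35 grid.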

Author(s): Michael J. Crawley
Edition: 1
Publisher: Wiley
Year: 2007

Language: English
Pages: 949

Contents
Preface
Acknowledgements
Running R
Getting Help in R
Worked Examples of Functions
Contents of Libraries
Data Editor
Significance Stars
Linking to Other Computer Languages
Tidying Up
Screen prompt
Built-in Functions
Modulo and Integer Quotients
Rounding
Infinity and Things that Are Not a Number (NaN)
Missing values NA
Creating a Vector
Named Elements within Vectors
Vector Functions
Using with rather than attach
Subscripts and Indices
Working with Vectors and Logical Subscripts
Finding Closest Values
Trimming Vectors Using Negative Subscripts
Logical Arithmetic
Evaluation of combinations of TRUE and FALSE
Repeats
Generate Factor Levels
Generating Regular Sequences of Numbers
Sorting, Ranking and Ordering
The sample Function
Matrices
Arrays
Character Strings
Writing functions in R
Variance
Degrees of freedom
Variance Ratio Test
Using Variance
Error Bars
Loops and Repeats
The switch Function
Optional Arguments
Variable Numbers of Arguments
Returning Values from a Function
Flexible Handling of Arguments to Functions
Evaluating Functions with apply, sapply and lapply
Looking for runs of numbers within vectors
Saving Data Produced within R to Disc
Testing for Equality
Sets: union, intersect and setdiff
Pattern Matching
Testing and Coercing in R
Dates and Times in R
The scan Function
Common Errors when Using read.table
Separators and Decimal Points
Checking Files from the Command Line
Reading Data from Files with Non-standard Formats Using scan
The readLines Function
4 Dataframes
Subscripts and Indices
Sorting Dataframes
Using Logical Conditions to Select Rows from the Dataframe
Omitting Rows Containing Missing Values, NA
Complex Ordering with Mixed Directions
Creating a Dataframe from Another Kind of Object
Eliminating Duplicate Rows from a Dataframe
Dates in Dataframes
Using the match Function in Dataframes
Merging Two Dataframes
Adding Margins to a Dataframe
Summarizing the Contents of Dataframes
Plots with Two Variables
Plots for Single Samples
Plots with multiple variables
Special Plots
Summary
Summary Tables
Tables of Counts
Expanding a Table into a Dataframe
Converting from a Dataframe to a Table
Calculating tables of proportions
The scale function
The model.matrix function
Mathematical Functions
Continuous Probability Distributions
Discrete probability distributions
Matrix Algebra
Calculus
Differential equations
Single Samples
Two samples
9 Statistical Modelling
Maximum Likelihood
Types of Statistical Model
Steps Involved in Model Simplification
Model Formulae in R
Box–Cox Transformations
Model checking
Summary of Statistical Models in R
Optional arguments in model-fitting functions
Dataframes containing the same variable names
Akaike’s Information Criterion
Leverage
Misspecified Model
Model checking in R
Contrasts
10 Regression
Linear Regression
Polynomial Approximations to Elementary Functions
Polynomial Regression
Fitting a Mechanistic Model to Data
Linear Regression after Transformation
Prediction following Regression
Testing for Lack of Fit in a Regression with Replicated Data at Each Level of x
Bootstrap with Regression
Jackknife with regression
Jackknife after Bootstrap
Serial correlation in the residuals
Piecewise Regression
Robust Fitting of Linear Models
Model Simplification
The Multiple Regression Model
One-Way ANOVA
Factorial Experiments
Pseudoreplication: Nested Designs and Split Plots
ANOVA with aov or lm
Effect Sizes
Multiple Comparisons
Projections of Models
Multivariate Analysis of Variance
12 Analysis of Covariance
Analysis of Covariance in R
A More Complex ANCOVA: Two Factors and One Continuous Covariate
Contrasts and the Parameters of ANCOVA Models
Order matters in summary.aov
13 Generalized Linear Models
Error Structure
Link Function
Proportion Data and Binomial Errors
Count Data and Poisson Errors
Quasi-likelihood
Offsets
Residuals
Misspecified Link Function
Overdispersion
Bootstrapping a GLM
A Regression with Poisson Errors
Analysis of Deviance with Count Data
Analysis of Covariance with Count Data
Frequency Distributions
Overdispersion in Log-linear Models
Negative binomial errors
Use of lmer with Complex Nesting
A Two-Class Table of Counts
A Four-Class Table of Counts
Two-by-Two Contingency Tables
Using Log-linear Models for Simple Contingency Tables
The Danger of Contingency Tables
Quasi-Poisson and Negative Binomial Models Compared
A Contingency Table of Intermediate Complexity
Schoener’s Lizards: A Complex Contingency Table
Plot Methods for Contingency Tables
16 Proportion Data
Count Data on Proportions
Odds
Overdispersion and Hypothesis Testing
Applications
Converting Complex Contingency Tables to Proportions
Analysing Schoener’s Lizards as Proportion Data
Generalized mixed models lmer with proportion data
17 Binary Response Variables
Incidence functions
Graphical Tests of the Fit of the Logistic to Data
ANCOVA with a Binary Response Variable
Binary Response with Pseudoreplication
18 Generalized Additive Models
Non-parametric Smoothers
Generalized Additive Models
An example with strongly humped data
Generalized Additive Models with Binary Data
Three-Dimensional Graphic Output from gam
19 Mixed-Effects Models
Replication and Pseudoreplication
The lme and lmer Functions
Best Linear Unbiased Predictors
A Designed Experiment with Different Spatial Scales: Split Plots
Hierarchical Sampling and Variance Components Analysis
Model Simplification in Hierarchical Sampling
Mixed-Effects Models with Temporal Pseudoreplication
Time Series Analysis in Mixed-Effects Models
Random Effects in Designed Experiments
Regression in Mixed-Effects Models
Generalized Linear Mixed Models
Fixed Effects in Hierarchical Sampling
Error Plots from a Hierarchical Analysis
20 Non-linear Regression
Comparing Michaelis–Menten and Asymptotic Exponential
Generalized Additive Models
Grouped Data for Non-linear Estimation
Non-linear Time Series Models (Temporal Pseudoreplication)
Self-starting Functions
Self-starting four-parameter logistic
Bootstrapping a Family of Non-linear Regressions
21 Tree Models
Background
Regression Trees
Classification trees with categorical explanatory variables
Classification trees for replicated data
Testing for the existence of humps
Nicholson’s Blowflies
Moving Average
Seasonal Data
Built-in Time Series Functions
Testing for a Trend in the Time Series
Spectral Analysis
Multiple Time Series
Simulated Time Series
Time Series Models
Time series modelling on the Canadian lynx data
Principal Components Analysis
Factor Analysis
Cluster Analysis
Neural Networks
Point Processes
Nearest Neighbours
Tests for Spatial Randomness
Libraries for spatial statistics
Geostatistical data
Regression Models with Spatially Correlated Errors: Generalized Least Squares
A Monte Carlo Experiment
Background
The Exponential Distribution
Kaplan–Meier Survival Distributions
Age-Specific Hazard Models
Survival analysis in R
Parametric analysis
Cox’s Proportional Hazards
Models with Censoring
Temporal Dynamics: Chaotic Dynamics in Population Size
Temporal and Spatial Dynamics: a Simulated Random Walk in Two Dimensions
Spatial Simulation Models
Pattern Generation Resulting from Dynamic Interactions
Graphs for Publication
Shading
Logarithmic Axes
Different font families for text
Mathematical Symbols on Plots
Phase Planes
Fat Arrows
Trellis Plots
Three-Dimensional Plots
An Alphabetical Tour of the Graphics Parameters
References and Further Reading
Index