Scientific research often starts with data collection. However, many researchers pay insufficient attention to this first step in their research. The author, a researcher at Wageningen University and Research, has often had to conclude that the data collected by fellow researchers were suboptimal, or in some cases even unsuitable for their aim. One reason is that sampling is frequently overlooked in statistics courses. Another reason is the lack of practical textbooks on sampling. Numerous books have been published on the statistical analysis and modelling of data using R, but to date no book has been published in this series on how these data can best be collected. This book fills this gap. Spatial Sampling with R presents an overview of sampling designs for spatial sample surveys and monitoring. It shows how to implement the sampling designs and how to estimate (sub)population and space-time parameters in R.
Key features
Describes classical, basic sampling designs for spatial survey, as well as recently developed, advanced sampling designs and estimators
Presents probability sampling designs for estimating parameters for a (sub)population, as well as non-probability sampling designs for mapping
Gives a comprehensive overview of model-assisted estimators
Illustrates sampling designs with surveys of soil organic carbon, above-ground biomass, air temperature, and opium poppy
Explains integration of wall-to-wall data sets (e.g. remote sensing images) and sample data
Data and R code available on GitHub
Includes exercises, making the book suitable as a textbook for students
This book is aimed at researchers and practitioners of sample surveys, as well as students in environmental, ecological, or agricultural science, or any other science in which knowledge about a population of interest is collected through spatial sampling. It helps readers implement proper sampling designs, tailored to the problem at hand, so that the data collected are valuable and can be used to answer their research questions.
Author(s): Dick J. Brus
Series: The R Series
Publisher: CRC Press/Chapman & Hall
Year: 2022
Language: English
Pages: 548
City: Boca Raton
Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Preface
1. Introduction
1.1. Basic sampling concepts
1.1.1. Population parameters
1.1.2. Descriptive statistics vs. inference about a population
1.1.3. Random sampling vs. probability sampling
1.2. Design-based vs. model-based approach
1.3. Populations used in sampling experiments
1.3.1. Soil organic matter in Voorst, the Netherlands
1.3.2. Poppy fields in Kandahar, Afghanistan
1.3.3. Aboveground biomass in Eastern Amazonia, Brazil
1.3.4. Annual mean air temperature in Iberia
I. Probability sampling for estimating population parameters
2. Introduction to probability sampling
2.1. Horvitz-Thompson estimator
2.2. Hansen-Hurwitz estimator
2.3. Using models in design-based approach
3. Simple random sampling
3.1. Estimation of population parameters
3.1.1. Population proportion
3.1.2. Cumulative distribution function and quantiles
3.2. Sampling variance of estimator of population parameters
3.3. Confidence interval estimate
3.3.1. Confidence interval for a proportion
3.4. Simple random sampling of circular plots
3.4.1. Sampling from a finite set of fixed circles
3.4.2. Sampling from an infinite set of floating circles
4. Stratified simple random sampling
4.1. Estimation of population parameters
4.1.1. Population proportion, cumulative distribution function, and quantiles
4.1.2. Why should we stratify?
4.2. Confidence interval estimate
4.3. Allocation of sample size to strata
4.4. Cum-root-f stratification
4.5. Stratification with multiple covariates
4.6. Geographical stratification
4.7. Multiway stratification
4.8. Multivariate stratification
5. Systematic random sampling
5.1. Estimation of population parameters
5.2. Approximating the sampling variance of the estimator of the mean
6. Cluster random sampling
6.1. Estimation of population parameters
6.2. Clusters selected with probabilities proportional to size, without replacement
6.3. Simple random sampling of clusters
6.4. Stratified cluster random sampling
7. Two-stage cluster random sampling
7.1. Estimation of population parameters
7.2. Primary sampling units selected without replacement
7.3. Simple random sampling of primary sampling units
7.4. Stratified two-stage cluster random sampling
8. Sampling with probabilities proportional to size
8.1. Probability-proportional-to-size sampling with replacement
8.2. Probability-proportional-to-size sampling without replacement
8.2.1. Systematic pps sampling without replacement
8.2.2. The pivotal method
9. Balanced and well-spread sampling
9.1. Balanced sampling
9.1.1. Balanced sample vs. balanced sampling design
9.1.2. Unequal inclusion probabilities
9.1.3. Stratified random sampling
9.1.4. Multiway stratification
9.2. Well-spread sampling
9.2.1. Local pivotal method
9.2.2. Generalised random-tessellation stratified sampling
9.3. Balanced sampling with spreading
10. Model-assisted estimation
10.1. Generalised regression estimator
10.1.1. Simple and multiple regression estimators
10.1.2. Penalised least squares estimation
10.1.3. Regression estimator with stratified simple random sampling
10.1.3.1. Combined regression estimator
10.2. Ratio estimator
10.2.1. Ratio estimators with stratified simple random sampling
10.2.2. Poststratified estimator
10.3. Model-assisted estimation using machine learning techniques
10.3.1. Predicting with a regression tree
10.3.2. Predicting with a random forest
10.4. Big data and volunteer data
11. Two-phase random sampling
11.1. Two-phase random sampling for stratification
11.2. Two-phase random sampling for regression
12. Computing the required sample size
12.1. Standard error
12.2. Length of confidence interval
12.2.1. Length of confidence interval for a proportion
12.3. Statistical testing of hypothesis
12.3.1. Sample size for testing a proportion
12.4. Accounting for design effect
12.5. Bayesian sample size determination
12.5.1. Bayesian criteria for sample size computation
12.5.1.1. Average length criterion
12.5.1.2. Average coverage criterion
12.5.1.3. Worst outcome criterion
12.5.2. Mixed Bayesian-likelihood approach
12.5.3. Estimation of population mean
12.5.4. Estimation of a population proportion
13. Model-based optimisation of probability sampling designs
13.1. Model-based optimisation of sampling design type and sample size
13.1.1. Analytical approach
13.1.1.1. Bulking soil aliquots into a composite sample
13.1.2. Geostatistical simulation approach
13.1.3. Bayesian approach
13.2. Model-based optimisation of spatial strata
14. Sampling for estimating parameters of domains
14.1. Direct estimator for large domains
14.2. Model-assisted estimators for small domains
14.2.1. Regression estimator
14.2.2. Synthetic estimator
14.3. Model-based prediction
14.3.1. Random intercept model
14.3.2. Geostatistical model
14.4. Supplemental probability sampling of small domains
15. Repeated sample surveys for monitoring population parameters
15.1. Space-time designs
15.2. Space-time population parameters
15.3. Design-based generalised least squares estimation of spatial means
15.3.1. Current mean
15.3.2. Change of the spatial mean
15.3.3. Temporal trend of the spatial mean
15.3.4. Space-time mean
15.4. Case study: annual mean daily temperature in Iberia
15.4.1. Static-synchronous design
15.4.2. Independent synchronous design
15.4.3. Serially alternating design
15.4.4. Supplemented panel design
15.4.5. Rotating panel design
15.4.6. Sampling experiment
15.5. Space-time sampling with stratified random sampling in space
II. Sampling for mapping
16. Introduction to sampling for mapping
16.1. When is probability sampling not required?
16.2. Sampling for simultaneously mapping and estimating means
16.3. Broad overview of sampling designs for mapping
17. Regular grid and spatial coverage sampling
17.1. Regular grid sampling
17.2. Spatial coverage sampling
17.3. Spatial infill sampling
18. Covariate space coverage sampling
18.1. Covariate space infill sampling
18.2. Performance of covariate space coverage sampling in random forest prediction
19. Conditioned Latin hypercube sampling
19.1. Conditioned Latin hypercube infill sampling
19.2. Performance of conditioned Latin hypercube sampling in random forest prediction
20. Spatial response surface sampling
20.1. Increasing the sample size
20.2. Stratified spatial response surface sampling
20.3. Mapping
21. Introduction to kriging
21.1. Ordinary kriging
21.2. Block-kriging
21.3. Kriging with an external drift
21.4. Estimating the semivariogram
21.4.1. Method-of-moments
21.4.2. Maximum likelihood
21.5. Estimating the residual semivariogram
21.5.1. Iterative method-of-moments
21.5.2. Restricted maximum likelihood
22. Model-based optimisation of the grid spacing
22.1. Optimal grid spacing for ordinary kriging
22.2. Controlling the mean or a quantile of the ordinary kriging variance
22.3. Optimal grid spacing for block-kriging
22.4. Optimal grid spacing for kriging with an external drift
22.5. Bayesian approach
23. Model-based optimisation of the sampling pattern
23.1. Spatial simulated annealing
23.2. Optimising the sampling pattern for ordinary kriging
23.3. Optimising the sampling pattern for kriging with an external drift
23.4. Model-based infill sampling for ordinary kriging
23.5. Model-based infill sampling for kriging with an external drift
24. Sampling for estimating the semivariogram
24.1. Nested sampling
24.2. Independent sampling of pairs of points
24.3. Optimisation of sampling pattern for semivariogram estimation
24.3.1. Uncertainty about semivariogram parameters
24.3.2. Uncertainty about the kriging variance
24.4. Optimisation of sampling pattern for semivariogram estimation and mapping
24.5. A practical solution
25. Sampling for validation of maps
25.1. Map quality indices
25.1.1. Estimation of map quality indices
25.2. Real-world case study
25.2.1. Estimation of the population mean error and mean squared error
25.2.2. Estimation of the standard error of the estimator of the population mean error and mean squared error
25.2.3. Estimation of model efficiency coefficient
25.2.4. Statistical testing of hypothesis about population ME and MSE
26. Design-based, model-based, and model-assisted approach for sampling and inference
26.1. Two sources of randomness
26.2. Identically and independently distributed
26.3. Bias and variance
26.4. Effective sample size
26.5. Exploiting spatial structure in design-based approach
26.6. Model-assisted vs. model-dependent
A. Answers to exercises
Bibliography
Index