Spatial Data Science introduces fundamental aspects of spatial data that every data scientist should know before they start working with spatial data. These aspects include how geometries are represented, coordinate reference systems (projections, datums), the fact that the Earth is round and its consequences for analysis, and how attributes of geometries can relate to geometries. In the second part of the book, these concepts are illustrated with data science examples using the R language. In the third part, statistical modelling approaches are demonstrated using real-world data examples. After reading this book, the reader will be well equipped to avoid a number of major spatial data analysis errors.
The book gives a detailed explanation of the core spatial software packages for R: sf for simple feature access, and stars for raster and vector data cubes – array data with spatial and temporal dimensions. It also shows how geometrical operations change when going from a flat space to the surface of a sphere, which is what sf and stars use when coordinates are not projected (degrees longitude/latitude). Separate chapters detail a variety of plotting approaches for spatial maps using R, and different ways of handling very large vector or raster (imagery) datasets, locally, in databases, or in the cloud. The data used and all code examples are freely available online. The solutions to the exercises can be found at the site.
Data Science is concerned with finding answers to questions on the basis of available data, and communicating that effort. Besides showing the results, this communication involves sharing the data used, but also exposing the path that led to the answers in a comprehensive and reproducible way. It also acknowledges the fact that available data may not be sufficient to answer questions, and that any answers are conditional on the data collection or sampling protocols employed.
This book introduces and explains the concepts underlying spatial data: points, lines, polygons, rasters, coverages, geometry attributes, data cubes, reference systems, as well as higher-level concepts including how attributes relate to geometries and how this affects analysis. The relationship of attributes to geometries is known as support, and changing support also changes the characteristics of attributes. Some data generation processes are continuous in space, and may be observed everywhere. Others are discrete, observed in tessellated containers. In modern spatial data analysis, tessellated methods are often used for all data, extending across the legacy partition into point process, geostatistical and lattice models. It is support (and the understanding of support) that underlies the importance of spatial representation. The book aims at data scientists who want to get a grip on using spatial data in their analysis. To exemplify how to do things, it uses R. In future editions we hope to extend this with examples using Python and Julia.
Author(s): Edzer Pebesma, Roger Bivand
Series: The R Series
Publisher: CRC Press
Year: 2023
Language: English
Pages: 315
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of contents
Preface
I. Spatial Data
1. Getting Started
1.1. A first map
1.2. Coordinate reference systems
1.3. Raster and vector data
1.4. Raster types
1.5. Time series, arrays, data cubes
1.6. Support
1.7. Spatial data science software
1.7.1. GDAL
1.7.2. PROJ
1.7.3. GEOS and s2geometry
1.7.4. NetCDF, udunits2, liblwgeom
1.8. Exercises
2. Coordinates
2.1. Quantities, units, datum
2.2. Ellipsoidal coordinates
2.2.1. Spherical or ellipsoidal coordinates
2.2.2. Projected coordinates, distances
2.2.3. Bounded and unbounded spaces
2.3. Coordinate reference systems
2.4. PROJ and mapping accuracy
2.5. WKT-2
2.6. Exercises
3. Geometries
3.1. Simple feature geometries
3.1.1. The big seven
3.1.2. Simple and valid geometries, ring direction
3.1.3. Z and M coordinates
3.1.4. Empty geometries
3.1.5. Ten further geometry types
3.1.6. Text and binary encodings
3.2. Operations on geometries
3.2.1. Unary predicates
3.2.2. Binary predicates and DE-9IM
3.2.3. Unary measures
3.2.4. Binary measures
3.2.5. Unary transformers
3.2.6. Binary transformers
3.2.7. N-ary transformers
3.3. Precision
3.4. Coverages: tessellations and rasters
3.4.1. Topological models
3.4.2. Raster tessellations
3.5. Networks
3.6. Exercises
4. Spherical Geometries
4.1. Straight lines
4.2. Ring direction and full polygon
4.3. Bounding box, rectangle, and cap
4.4. Validity on the sphere
4.5. Exercises
5. Attributes and Support
5.1. Attribute-geometry relationships and support
5.2. Aggregating and summarising
5.3. Area-weighted interpolation
5.3.1. Spatially extensive and intensive variables
5.3.2. Dasymetric mapping
5.3.3. Support in file formats
5.4. Up- and Downscaling
5.5. Exercises
6. Data Cubes
6.1. A four-dimensional data cube
6.2. Dimensions, attributes, and support
6.2.1. Regular dimensions, GDAL’s geotransform
6.2.2. Support along cube dimensions
6.3. Operations on data cubes
6.3.1. Slicing a cube: filter
6.3.2. Applying functions to dimensions
6.3.3. Reducing dimensions
6.4. Aggregating raster to vector cubes
6.5. Switching dimension with attributes
6.6. Other dynamic spatial data
6.7. Exercises
II. R for Spatial Data Science
7. Introduction to sf and stars
7.1. Package sf
7.1.1. Creation
7.1.2. Reading and writing
7.1.3. Subsetting
7.1.4. Binary predicates
7.1.5. tidyverse
7.2. Spatial joins
7.2.1. Sampling, gridding, interpolating
7.3. Ellipsoidal coordinates
7.4. Package stars
7.4.1. Reading and writing raster data
7.4.2. Subsetting stars data cubes
7.4.3. Cropping
7.4.4. Redimensioning and combining stars objects
7.4.5. Extracting point samples, aggregating
7.4.6. Predictive models
7.4.7. Plotting raster data
7.4.8. Analysing raster data
7.4.9. Curvilinear rasters
7.4.10. GDAL utils
7.5. Vector data cube examples
7.5.1. Example: aggregating air quality time series
7.5.2. Example: Bristol origin-destination data cube
7.5.3. Tidy array data
7.5.4. File formats for vector data cubes
7.6. Raster-to-vector, vector-to-raster
7.6.1. Vector-to-raster
7.7. Coordinate transformations and conversions
7.7.1. st_crs
7.7.2. st_transform, sf_project
7.7.3. sf_proj_info
7.7.4. Datum grids, proj.db, cdn.proj.org, local cache
7.7.5. Transformation pipelines
7.7.6. Axis order and direction
7.8. Transforming and warping rasters
7.9. Exercises
8. Plotting spatial data
8.1. Every plot is a projection
8.1.1. What is a good projection for my data?
8.2. Plotting points, lines, polygons, grid cells
8.2.1. Colours
8.2.2. Colour breaks: classInt
8.2.3. Graticule and other navigation aids
8.3. Base plot
8.3.1. Adding to plots with legends
8.3.2. Projections in base plots
8.3.3. Colours and colour breaks
8.4. Maps with ggplot2
8.5. Maps with tmap
8.6. Interactive maps: leaflet, mapview, tmap
8.7. Exercises
9. Large data and cloud native
9.1. Vector data: sf
9.1.1. Reading from local disk
9.1.2. Reading from databases, dbplyr
9.1.3. Reading from online resources or web services
9.1.4. APIs, OpenStreetMap
9.1.5. GeoParquet and GeoArrow
9.2. Raster data: stars
9.2.1. stars proxy objects
9.2.2. Operations on proxy objects
9.2.3. Remote raster resources
9.3. Very large data cubes
9.3.1. Finding and processing assets
9.3.2. Cloud native storage: Zarr
9.3.3. APIs for data: GEE, openEO
9.4. Exercises
III. Models for Spatial Data
10. Statistical modelling of spatial data
10.1. Mapping with non-spatial regression and ML models
10.2. Support and statistical modelling
10.3. Time in predictive models
10.4. Design-based and model-based inference
10.5. Predictive models with coordinates
10.6. Exercises
11. Point Pattern Analysis
11.1. Observation window
11.2. Coordinate reference systems
11.3. Marked point patterns, points on linear networks
11.4. Spatial sampling and simulating a point process
11.5. Simulating points on the sphere
11.6. Exercises
12. Spatial Interpolation
12.1. A first dataset
12.2. Sample variogram
12.3. Fitting variogram models
12.4. Kriging interpolation
12.5. Areal means: block kriging
12.6. Conditional simulation
12.7. Trend models
12.7.1. A population grid
12.8. Exercises
13. Multivariate and Spatiotemporal Geostatistics
13.1. Preparing the air quality dataset
13.2. Multivariable geostatistics
13.3. Spatiotemporal geostatistics
13.3.1. A spatiotemporal variogram model
13.3.2. Irregular space time data
13.4. Exercises
14. Proximity and Areal Data
14.1. Representing proximity in spdep
14.2. Contiguous neighbours
14.3. Graph-based neighbours
14.4. Distance-based neighbours
14.5. Weights specification
14.6. Higher order neighbours
14.7. Exercises
15. Measures of Spatial Autocorrelation
15.1. Measures and process misspecification
15.2. Global measures
15.2.1. Join-count tests for categorical data
15.2.2. Moran’s I
15.3. Local measures
15.3.1. Local Moran’s I_i
15.3.2. Local Getis-Ord G_i
15.3.3. Local Geary’s C_i
15.3.4. The rgeoda package
15.4. Exercises
16. Spatial Regression
16.1. Markov random field and multilevel models
16.1.1. Boston house value dataset
16.2. Multilevel models of the Boston dataset
16.2.1. IID random effects with lme4
16.2.2. IID and CAR random effects with hglm
16.2.3. IID and ICAR random effects with R2BayesX
16.2.4. IID, ICAR and Leroux random effects with INLA
16.2.5. ICAR random effects with mgcv::gam()
16.2.6. Upper-level random effects: summary
16.3. Exercises
17. Spatial Econometrics Models
17.1. Spatial econometric models: definitions
17.2. Maximum likelihood estimation in spatialreg
17.2.1. Boston house value dataset examples
17.3. Impacts
17.4. Predictions
17.5. Exercises
A. Older R Spatial Packages
A.1. Retiring rgdal and rgeos
A.2. Links and differences between sf and sp
A.3. Migration code and packages
A.4. Package raster and terra
B. R Basics
B.1. Pipes
B.2. Data structures
B.2.1. Homogeneous vectors
B.2.2. Heterogeneous vectors: list
B.2.3. NULL and removing list elements
B.2.4. Attributes
B.2.5. The names attributes
B.2.6. Using structure
B.3. Dissecting a MULTIPOLYGON
References
Index
Index of functions