Springer, 2006. — 271 p. — ISBN: 0387329064, 978-0387329062.
Series: Statistics and Computing.
This book presents ways of visualizing large datasets, whether large in number of cases, in number of variables, or in both. All ideas are illustrated with displays from analyses of real datasets, and the importance of interpreting displays effectively is emphasized. Graphics should be drawn to convey information, and the book includes many insightful examples. New approaches to graphics are needed to visualize the information in large datasets, and most of the innovations described in this book are developments of standard graphics. The book is accessible to readers with some experience of drawing statistical graphics.
Contents:
Introduction.
Data Visualization.
Research Literature.
How Large Is a Large Dataset?
The Effects of Largeness:
Storage, Quality, Complexity, Speed, Analyses, Displays, Graphical Formats.
What Is in This Book.
Software.
What Is on the Website:
Files and Code for Figures, Links to Software, Datasets.
Contributing Authors.
Basics.
Statistical Graphics.
Plots for Categorical Data:
Barcharts and Spineplots for Univariate Categorical Data, Mosaic Plots for Multi-dimensional Categorical Data.
Plots for Continuous Data:
Dotplots, Boxplots, and Histograms; Scatterplots, Parallel Coordinates, and the Grand Tour.
Data on Mixed Scales.
Maps.
Contour Plots and Image Maps.
Time Series Plots.
Structure Plots.
Scaling Up Graphics.
Upscaling as a General Problem in Statistics.
Area Plots:
Histograms, Barcharts, Mosaic Plots.
Point Plots: Boxplots, Scatterplots, Parallel Coordinates.
From Areas to Points and Back:
α-Blending and Tonal Highlighting.
Modifying Plots.
Interacting with Graphics.
Interaction.
Interaction and Data Displays:
Querying, Selection and Linking, Selection Sequences, Varying Plot Characteristics, Interfaces and Interaction, Degrees of Linking, Warnings and Redmarking.
Interaction and Large Datasets:
Querying; Selection, Linking, and Highlighting; Varying Plot Characteristics for Large Datasets.
New Interactive Tasks:
Subsetting, Aggregation and Recoding, Transformations, Weighting, Managing Screen Layout.
Summary and Future Directions.
Applications.
Multivariate Categorical Data — Mosaic Plots.
Area-based Displays:
Weighted Displays and Weights in Datasets.
Displays and Techniques in One Dimension:
Sorting and Reordering, Grouping, Averaging, and Zooming.
Mosaic Plots:
Combinatorics of Mosaic Plots, Cases per Pixel and Pixels per Case, Calibrating the Eye, Gray-shading, Rescaling Binsizes, Rankings.
Rotating Plots.
Introduction:
Type of Data, Visual Methods for Continuous Variables, Scaling Up Multiple Views for Larger Datasets.
Beginning to Work with a Million Cases:
What Happens in GGobi, a Real-time System?
Reducing the Number of Cases, Density Estimation, Screen Real Estate Indexing.
Software System.
Application: Data Description, Viewing a Tour of the Data, Scatterplot Matrix.
Current and Future Developments:
Improving the Methods, Software, How Might These Tools Be Used?
Multivariate Continuous Data — Parallel Coordinates.
Interpolations and Inner Products.
Generalized Parallel Coordinate Geometry.
A New Family of Smooth Plots.
Examples: Automobile Data, Hyperspectral Data: Dealing with Massive Datasets.
Detecting Second-Order Structures.
Networks.
Layout Algorithms: Simple Tree Layout, Force Layout Methods, Individual Node Movement Algorithms.
Interactivity: Speed Considerations, Interaction and Layout.
NicheWorks.
Example: International Calling Fraud.
Languages for Description and Layouts: Defining a Graph, Graph Specification via VizML.
Trees.
Growing Trees for Large Datasets:
Scalability of the CART Growing Algorithm, Scalability of Pruning Methods, Statistical Tests and Large Datasets, Using Trees for Large Datasets in Practice.
Visualization of Large Trees: Hierarchical Plots, Sectioned Scatterplots, Recursive Plots.
Forests for Large Datasets.
Transactions.
Introduction and Background.
Mice and Elephant Plots and Random Sampling.
Biased Sampling: Windowed Biased Sampling, Box–Cox Biased Sampling.
Quantile Window Sampling.
Commonality of Flow Rates.
Graphics of a Large Dataset.
QuickStart Guide Data Visualization for Large Datasets.
Visualizing the InfoVis 2005 Contest Dataset:
Preliminaries, Variables, First Analyses, Multivariate Displays, Grouping and Selection, Special Features, Presenting Results.