This richly illustrated book describes the use of interactive and dynamic graphics as part of multidimensional data analysis. Chapter topics include clustering, supervised classification, and working with missing values. A variety of plots and interaction methods are used in each analysis, often starting with brushing linked low-dimensional views and working up to manual manipulation of tours of several variables. The book is augmented by a wealth of online material.
Author(s): Dianne Cook, Deborah F. Swayne, A. Buja, D. Temple Lang, H. Hofmann, H. Wickham, M. Lawrence
Series: Use R!
Edition: 1
Publisher: Springer
Year: 2007
Language: English
Pages: 208
Technical Notes
1.1 Data visualization: beyond the third dimension
1.2 Statistical data visualization: goals and history
1.3 Getting down to data
1.4 Getting real: process and caveats
1.5 Interactive investigation
2.1 Introduction
Real-valued variables
Real-valued variables
Categorical variables
Parallel coordinate plots for categorical or real-valued variables
Tours for real-valued variables
2.2.4 Plot arrangement
2.3.1 Brushing
Brushing as database query
Linking mechanisms
Persistent vs. transient: painting vs. brushing
2.3.3 Scaling
2.3.4 Subset selection
2.3.7 Dragging points
2.4 Tools available elsewhere
Exercises
3 Missing Values
3.1 Background
3.2.1 Shadow matrix
3.2.2 Getting started: missings in the “margins”
3.2.3 A limitation
3.3 Imputation
3.3.2 Random values
3.3.3 Multiple imputation
3.4 Recap
Exercises
4 Supervised Classification
4.1 Background
4.1.1 Classical multivariate statistics
4.1.2 Data mining
4.1.3 Studying the fit
4.2.1 Overview of Italian Olive Oils
4.2.2 Building classifiers to predict region
4.2.3 Separating the oils by area within each region
4.3.1 Linear discriminant analysis
4.3.2 Trees
4.3.3 Random forests
4.3.4 Neural networks
4.3.5 Support vector machine
4.3.6 Examining boundaries
Exercises
5 Cluster Analysis
5.1 Background
5.2 Purely graphics
5.3.1 Hierarchical algorithms
5.3.2 Model-based clustering
5.3.3 Self-organizing maps
5.3.4 Comparing methods
5.4 Characterizing clusters
5.5 Recap
Exercises
6.1 Inference
6.2 Longitudinal data
6.3 Network data
6.4 Multidimensional scaling
Exercises
7.1 Tips
7.2 Australian Crabs
7.3 Italian Olive Oils
7.5 PRIM7
7.6 Tropical Atmosphere-Ocean Array (TAO)
7.7 Primary Biliary Cirrhosis (PBC)
7.8 Spam
7.9 Wages
7.10 Rat Gene Expression
7.11 Arabidopsis Gene Expression
7.12 Music
7.14 Adjacent Transposition Graph
7.15 Florentine Families
7.16 Morse Code Confusion Rates
7.17 Personal Social Network
References