Correspondence Analysis in Practice

Drawing on the author’s 45 years of experience in multivariate analysis, Correspondence Analysis in Practice, Third Edition, shows how the versatile method of correspondence analysis (CA) can be used for data visualization in a wide variety of situations. CA and its variants, subset CA, multiple CA and joint CA, translate two-way and multi-way tables into more readable graphical forms ― ideal for applications in the social, environmental and health sciences, as well as marketing, economics, linguistics, archaeology, and more.
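The book's computational appendix (Appendix B) works in R, with the ca package doing most of the work. As a minimal sketch of what a basic analysis looks like, assuming the ca package is installed and using the small smoke contingency table that ships with it purely for illustration:

    # Minimal sketch: simple correspondence analysis with the "ca" package.
    # Assumes install.packages("ca") has been run; the bundled "smoke"
    # data set (staff groups x smoking categories) is used as an example.
    library(ca)

    data("smoke")                        # small two-way contingency table
    ca_fit <- ca(smoke)                  # correspondence analysis of the table

    summary(ca_fit)                      # principal inertias and contributions
    plot(ca_fit)                         # symmetric CA map (the default)
    plot(ca_fit, map = "rowprincipal")   # asymmetric map: rows in principal coordinates

The same package provides mjca() for multiple and joint correspondence analysis of multivariate categorical data, following the same fit/summary/plot pattern.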

Michael Greenacre is Professor of Statistics at the Universitat Pompeu Fabra, Barcelona, Spain, where he teaches, among other courses, Data Visualization. He has authored and co-edited nine books and 80 journal articles and book chapters, mostly on correspondence analysis, the latest being Visualization and Verbalization of Data in 2015. He has given short courses in fifteen countries to environmental scientists, sociologists, data scientists and marketing professionals, and has specialized in statistics in ecology and social science.

Author(s): Michael Greenacre
Series: Chapman & Hall/CRC Interdisciplinary Statistics
Edition: 3
Publisher: Chapman and Hall / CRC
Year: 2016

Language: English
Pages: 326
Tags: Probability & Statistics; Applied; Mathematics; Science & Mathematics; Statistics

Cover ... 1
Half Title ... 2
Title Page ... 6
Copyright Page ... 7
Dedication ... 8
Contents ... 10
Preface ... 12
1 Scatterplots and Maps ... 18
Contents ... 18
Continuous variables ... 19
Expressing data in relative amounts ... 19
Categorical variables ... 19
Ordering of categories ... 20
Distances between categories ... 20
Distance interpretation of scatterplots ... 20
Scatterplots as maps ... 20
Calibration of a direction in the map ... 21
Information-transforming nature of the display ... 21
Nominal and ordinal variables ... 22
Plotting more than one set of data ... 22
Interpreting absolute or relative frequencies ... 23
Describing and interpreting data, vs. modelling and statistical inference ... 24
Large data sets ... 24
SUMMARY: Scatterplots and maps ... 25
2 Profiles and the Profile Space ... 26
Contents ... 26
Average profile ... 27
Row profiles and column profiles ... 27
Symmetric treatment of rows and columns ... 28
Asymmetric consideration of the data table ... 28
Plotting the profiles in the profile space ... 28
Vertex points define the extremes of the profile space ... 29
Triangular (or ternary) coordinate system ... 29
Positioning a point in a triangular coordinate system ... 31
Geometry of profiles with more than three elements ... 31
Data on a ratio scale ... 32
Data on a common scale ... 32
SUMMARY: Profiles and the Profile Space ... 33
3 Masses and Centroids ... 34
Contents ... 34
Points as weighted averages ... 35
Profile values are weights assigned to the vertices ... 36
Each profile point is a weighted average, or centroid, of the vertices ... 36
Average profile is also a weighted average of the profiles themselves ... 37
Interpretation in the profile space ... 38
Merging rows or columns ... 39
Distributionally equivalent rows or columns ... 40
Changing the masses ... 40
SUMMARY: Masses and Centroids ... 41
4 Chi-Square Distance and Inertia ... 42
Contents ... 42
Hypothesis of independence or homogeneity for a contingency table ... 42
Chi-square (χ²) statistic to test the homogeneity hypothesis ... 43
Calculating the χ² statistic ... 44
Alternative expression of the χ² statistic in terms of profiles and masses ... 44
(Total) inertia is the χ² statistic divided by sample size ... 45
Euclidean, or Pythagorean, distance ... 45
Chi-square distance: An example of a weighted Euclidean distance ... 46
Geometric interpretation of inertia ... 46
Minimum and maximum inertia ... 46
Inertia of rows is equal to inertia of columns ... 47
SUMMARY: Chi-Square Distance and Inertia ... 49
5 Plotting Chi-Square Distances ... 50
Contents ... 50
Difference between χ²-distance and ordinary Euclidean distance ... 50
Transforming the coordinates before plotting ... 51
Effect of the transformation in practice ... 51
Geometric interpretation of the inertia and χ² statistic ... 53
Principle of distributional equivalence ... 54
χ²-distances make the contributions of categories more similar ... 55
Weighted Euclidean distance ... 56
Theoretical justification of χ²-distance ... 56
SUMMARY: Plotting Chi-Square Distances ... 57
6 Reduction of Dimensionality ... 58
Contents ... 58
Comparison of age group (row) profiles ... 59
Identifying lower-dimensional subspaces ... 60
Projecting pro?les onto subspaces ... 60
Measuring quality of display ... 60
Approximation of interprofile distances ... 61
Joint interpretation of profiles and vertices ... 62
Definition of closeness of points to a subspace ... 63
Formal definition of criterion optimized in CA ... 64
Singular value decomposition (SVD) ... 64
Finding the optimal subspace is not regression ... 64
SUMMARY: Reduction of Dimensionality ... 65
7 Optimal Scaling ... 66
Contents ... 66
Computation of overall mean using integer scale ... 67
Computation of group means using integer scale ... 67
Computation of variance using integer scale ... 68
Calculating scores with unknown scale values ... 68
Maximizing variance gives optimal scale ... 69
Optimal scale values from the best-fitting dimension of CA ... 69
Interpretation of optimal scale ... 70
Identification conditions for an optimal scale ... 70
Any linear transformation of the scale is still optimal ... 70
Optimal scale is not unique ... 71
A criterion based on row-to-column distances ... 71
SUMMARY: Optimal Scaling ... 73
8 Symmetry of Row and Column Analyses ... 74
Contents ... 74
Summary of row analysis ... 74
Column analysis — profile values have symmetric interpretation ... 75
Column analysis — same total inertia ... 75
Column analysis — same dimensionality ... 75
Column analysis — same low-dimensional approximation ... 76
Column analysis — same coordinate values, rescaled ... 76
Principal axes and principal inertias ... 77
Scaling factor is the square root of the principal inertia ... 77
Correlation interpretation of the principal inertia ... 78
Graph of the correlation ... 79
Principal coordinates and standard coordinates ... 79
Maximizing squared correlation with the average ... 80
Minimizing loss of homogeneity within variables ... 80
SUMMARY: Symmetry of Row and Column Analyses ... 81
9 Two-Dimensional Displays ... 82
Contents ... 82
Row analysis ... 83
Interpretation of row profiles and column vertices ... 83
Nesting of principal axes ... 84
Verifying the profile–vertex interpretation ... 85
Asymmetric maps ... 85
Symmetric map ... 87
Verification of interpoint chi-squared distances in symmetric map ... 88
Danger in interpreting row-to-column distances in a symmetric map ... 89
SUMMARY: Two-Dimensional Displays ... 89
10 Three More Examples ... 90
Contents ... 90
Decomposition of inertia ... 91
Asymmetric map of row profiles ... 91
Symmetric map ... 92
Dimensional interpretation of maps ... 93
Asymmetric CA map of species abundance data ... 95
One of the lowest inertias one can get, but with a significant structure ... 95
Importance of preserving a unit aspect ratio in maps ... 96
SUMMARY: Three More Examples ... 96
11 Contributions to Inertia ... 98
Contents ... 98
Row and column inertias ... 99
Large and small contributions ... 99
Cell contributions to inertia ... 99
Decomposition along principal axes ... 100
Components of each principal inertia ... 100
Complete decomposition of inertia over profiles and principal axes ... 101
Components of each profile’s inertia ... 102
Algebra of inertia decomposition ... 102
Relative contributions as squared angle cosines ... 103
Relative contributions as squared correlations ... 103
Quality of display in a subspace ... 104
Analogy with factor analysis ... 104
SUMMARY: Contributions to Inertia ... 105
12 Supplementary Points ... 106
Contents ... 106
First case — a point inherently different from the rest ... 107
Second case — an outlier of low mass ... 108
Third case — displaying groups or partitions of points ... 110
Positioning a supplementary point relative to the vertices ... 110
Contributions of supplementary points ... 111
Vertices are supplementary points ... 111
Categorical supplementary variables and dummy variables ... 112
Continuous supplementary variables ... 112
SUMMARY: Supplementary Points ... 113
13 Correspondence Analysis Biplots ... 114
Contents ... 114
Relationship between scalar product and projection ... 115
For fixed reference vector, scalar products are proportional to projections ... 115
A simple exact biplot ... 116
Some special patterns in biplots ... 117
Rank and dimensionality ... 117
Biplots give optimal approximations of real data ... 117
The CA model ... 117
Biplot of contingency ratios ... 118
Biplot from row profile point of view ... 118
Interpretation of the biplot ... 120
Calibration of biplots ... 120
Overall quality of display ... 120
SUMMARY: Correspondence Analysis Biplots ... 121
14 Transition and Regression Relationships ... 122
Contents ... 122
Coordinates on first axis of scientific funding example ... 122
Regression between coordinates ... 123
The profile–vertex relationship ... 123
Principal coordinates are conditional means in regression ... 124
Simultaneous linear regressions ... 125
Transition equations between rows and columns ... 125
Regression between coordinates using transition equations ... 126
Recall the CA bilinear model ... 126
Weighted regression ... 127
Correlations in weighted regression recover the relative contributions ... 128
Reciprocal averaging and alternating least squares ... 128
Contribution coordinates as regression coefficients ... 128
SUMMARY: Transition and Regression Relationships ... 129
15 Clustering Rows and Columns ... 130
Contents ... 130
Partitioning the rows or the columns ... 130
Between- and within-groups inertia ... 131
Calculating the inertia within each group ... 132
Clustering algorithm ... 133
Tree representations of the clusterings ... 134
Decomposition of inertia (or χ²) ... 135
Deciding on the partition ... 135
Testing hypotheses on clusters of rows or columns ... 136
Multiple comparisons ... 136
Multiple comparisons for contingency tables ... 136
Cut-off χ² value for significant clustering ... 136
Ward clustering ... 137
SUMMARY: Clustering Rows and Columns ... 137
16 Multiway Tables ... 138
Contents ... 138
Introducing a third variable in the health self-assessment data ... 138
Interaction between variables ... 138
Interactive coding ... 139
CA of the interactively coded cross-tabulation ... 139
Basic CA map of countries by responses ... 141
Introducing gender interactively ... 142
Introducing age group and gender ... 143
Arch (“horseshoe”) pattern in the map ... 144
SUMMARY: Multiway Tables ... 145
17 Stacked Tables ... 146
Contents ... 146
Stacking as an alternative to interactive coding ... 147
CA of stacked tables ... 147
Limitations in interpreting analysis of stacked tables ... 149
Decomposition of inertia in stacked tables ... 149
Stacking tables row- and columnwise ... 149
CA of row- and columnwise stacked tables ... 150
Partitioning of the inertia over all subtables ... 151
Only “between” associations displayed, not “within” ... 153
SUMMARY: Stacked Tables ... 153
18 Multiple Correspondence Analysis ... 154
Contents ... 154
MCA definition number 1: CA of the indicator matrix ... 155
Inertia of indicator matrix ... 157
Burt matrix ... 157
MCA definition number 2: CA of the Burt matrix ... 158
Comparison of MCA based on indicator and Burt matrices ... 158
Inertia of the Burt matrix ... 159
Positioning supplementary categories in the map ... 160
Interpretation of supplementary points ... 161
SUMMARY: Multiple Correspondence Analysis ... 161
19 Joint Correspondence Analysis ... 162
Contents ... 162
MCA gives bad fit because the total inertia is inflated ... 162
Ignoring the diagonal blocks — joint CA ... 163
Results of JCA ... 163
JCA results are not nested ... 165
Adjusting the results of MCA to fit the off-diagonal tables ... 165
A simple adjustment of the MCA solution ... 166
Adjusted inertia = average inertia in off-diagonal blocks ... 166
Adjusting each principal inertia ... 166
Adjusted percentages of inertia ... 167
Supplementary points in adjusted MCA and JCA ... 168
20 Scaling Properties of MCA ... 170
Contents ... 170
Category quantification as a goal ... 171
MCA as a principal component analysis of the indicator matrix ... 171
Maximizing inter-item correlation ... 172
MCA of scientific attitudes example ... 172
Individual squared correlations ... 173
Loss of homogeneity ... 174
Geometry of loss function in homogeneity analysis ... 175
Reliability and Cronbach’s alpha ... 176
The adjustment threshold rediscovered ... 177
SUMMARY: Scaling Properties of MCA ... 177
21 Subset Correspondence Analysis ... 178
Contents ... 178
Subset analysis keeps original margins fixed ... 179
Subset CA of consonants, contribution biplot ... 179
Subset CA of the vowels, contribution biplot ... 179
Subset MCA ... 181
Subset analysis on an indicator matrix ... 182
Supplementary points in subset CA ... 183
Supplementary points in subset MCA ... 184
SUMMARY: Subset Correspondence Analysis ... 185
22 Compositional Data Analysis ... 186
Contents ... 186
Compositional data ... 186
Subcompositional coherence ... 186
Ratios and log-ratios are subcompositionally coherent ... 187
Log-ratio distances between samples and between parts ... 188
Weighted log-ratio distances between samples ... 188
Log-ratio analysis ... 189
Interpretation of links as estimated log-ratios ... 190
Diagnosing power models ... 191
Correspondence analysis and log-ratio analysis ... 192
SUMMARY: Compositional Data Analysis ... 193
23 Analysis of Matched Matrices ... 194
Contents ... 194
Matched matrices ... 194
Between- and within-groups inertia ... 195
One analysis that splits the “between” and “within” inertias ... 195
Display of the sum and difference components ... 196
Interpretation of the difference map ... 197
Analysing all effects in one analysis ... 199
Visualizing the effects ... 200
SUMMARY: Analysis of Matched Matrices ... 201
24 Analysis of Square Tables ... 202
Contents ... 202
CA of square table ... 203
Diagonal of table dominates the CA ... 203
Symmetry and skew-symmetry in a square table ... 205
CA of the symmetric part ... 205
CA of the skew-symmetric part ... 206
CA of symmetric and skew-symmetric parts in one step ... 206
Visualization of the symmetric and skew-symmetric parts ... 207
SUMMARY: Analysis of Square Tables ... 209
25 Correspondence Analysis of Networks ... 210
Contents ... 210
Network concepts and terminology ... 210
Square symmetric tables revisited: direct and inverse axes ... 212
Fitting off-diagonal elements ... 212
CA of an adjacency matrix ... 213
The Laplacian matrix ... 213
A family of analyses of a symmetric matrix ... 215
Multidimensional scaling of a network ... 216
CA can perform MDS ... 216
SUMMARY: Correspondence Analysis of Networks ... 217
26 Data Recoding ... 218
Contents ... 218
Rating scales ... 218
Doubling of ratings ... 219
The counting paradigm ... 220
CA map of doubled ratings ... 220
Correlations interpreted by alignment of variables ... 221
Positions of rows and supplementary points ... 221
Preference data ... 222
Recoding continuous data by ranks and doubling ... 223
Other recoding schemes for continuous data ... 224
SUMMARY: Data Recoding ... 225
27 Canonical Correspondence Analysis ... 226
Contents ... 226
Supplementary continuous variables ... 226
Representing explanatory variables as supplementary variables ... 227
Dimensions as functions of explanatory variables ... 228
Constraining the dimensions of CA ... 229
Constrained and unconstrained spaces in CCA ... 229
Decomposition of inertia in CCA ... 229
The CCA triplot ... 231
Categorical explanatory variables ... 232
Weighted averages of explanatory variables for each species ... 232
Partial CCA ... 233
SUMMARY: Canonical Correspondence Analysis ... 233
28 Co-Inertia and Co-Correspondence Analysis ... 234
Contents ... 234
Co-inertia analysis ... 234
Some special cases of co-inertia analysis ... 235
Centroid discriminant analysis for CA ... 236
Co-correspondence analysis ... 239
SUMMARY: Co-Inertia and Co-Correspondence Analysis ... 241
29 Aspects of Stability and Inference ... 242
Contents ... 242
Information-transforming versus statistical inference ... 242
Stability of CA ... 243
Sampling variability of the CA solution ... 243
Bootstrapping the data ... 243
Multinomial sampling ... 244
Partial bootstrap of CA map, with convex hulls ... 244
Peeling the convex hull ... 244
The delta method ... 246
Testing hypotheses — theoretical approach ... 247
Testing hypotheses — Monte Carlo simulation ... 247
A permutation test ... 248
SUMMARY: Aspects of Stability and Inference ... 249
30 Permutation Tests ... 250
Contents ... 250
A simple univariate example ... 250
Permutation test for difference in means ... 251
Permutation test in multidimensional space ... 252
Permutation test for bivariate correlation ... 253
Permutation tests for bivariate categorical data ... 254
Permutation or bootstrap tests for multivariate categorical data ... 254
Permutation tests for CCA ... 255
Permutation test for matched matrices ... 256
Permutation tests for co-inertia analysis ... 256
SUMMARY: Permutation Tests ... 257
Appendix A: Theory of Correspondence Analysis ... 258
Contents ... 258
Computational algorithm ... 259
A note on the singular value decomposition (SVD) ... 260
The bilinear CA model ... 261
Transition equations between rows and columns ... 261
Supplementary points ... 262
Total inertia and χ²-distances ... 262
Contributions of points to principal inertias ... 263
Contributions of principal axes to point inertias (squared correlations) ... 263
Ward clustering of row or column profiles ... 263
Stacked tables ... 264
Multiple CA ... 264
Joint CA ... 264
Percentage of inertia explained in JCA ... 264
Contributions in JCA ... 265
Adjusted inertias in MCA ... 266
Subset CA, MCA and JCA ... 266
Log-ratio analysis ... 267
Analysis of matched matrices ... 267
Analysis of square asymmetric tables ... 268
Analysis of square symmetric matrices ... 268
Canonical correspondence analysis (CCA) ... 269
Co-inertia analysis and co-correspondence analysis ... 271
Appendix B: Computation of Correspondence Analysis ... 272
Contents ... 272
The R program ... 272
Entering data into R ... 273
Some examples of R code ... 274
Three-dimensional graphics ... 274
Chi-square statistic, inertia and distances ... 275
Computing χ²-distances between all profiles, using dist ... 276
Plotting the computed CA coordinates ... 277
The ca package ... 278
Numerical results of CA: inertias and contributions ... 279
Supplementary profiles ... 280
Supplementary continuous variables ... 281
Options in ca package ... 281
Output of ca function ... 282
Subset analysis ... 282
Visualization options in the ca package ... 283
MCA in ca package ... 285
Preparation of multivariate categorical data ... 286
Extracting stacked table from Burt matrix ... 288
Data preparation for MCA ... 289
Listwise deletion of missing values ... 289
MCA of indicator matrix ... 289
MCA of Burt matrix ... 290
Adjusted MCA solution ... 291
Joint correspondence analysis ... 292
Subset MCA ... 293
Analysis of matched matrices ... 293
Canonical correspondence analysis (CCA) ... 295
Inference using resampling ... 298
Permutation testing and bootstrapping ... 298
Permutation testing in vegan ... 299
Weighted Ward clustering ... 299
Graphical options ... 300
LaTeX graphics ... 300
Excel graphics ... 301
R graphics ... 301
Appendix C: Glossary of Terms ... 302
Appendix D: Bibliography of Correspondence Analysis ... 308
Appendix E: Epilogue ... 312
Index ... 322