Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.

This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.

Real-life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.

Key Features:
• Presents a comprehensive guide to clustering techniques, with focus on the practical aspects of cluster analysis.
• Provides a thorough revision of the fourth edition, including new developments in clustering longitudinal data and examples from bioinformatics and gene studies.
• Updates the chapter on mixture models to include recent developments and presents a new chapter on mixture modeling for structured data.

Practitioners and researchers working in cluster analysis and data analysis will benefit from this book.
Author(s): Brian S. Everitt, Dr Sabine Landau, Dr Morven Leese, Dr Daniel Stahl
Series: Wiley Series in Probability and Statistics
Edition: 5th
Publisher: Wiley
Year: 2011
Language: English
Pages: 348
Tags: Computer Science and Engineering; Artificial Intelligence; Data Mining
Title
Contents
Preface
Acknowledgement
1.1 Introduction
1.2 Reasons for classifying
1.3 Numerical methods of classification – cluster analysis
1.4 What is a cluster?
1.5.2 Astronomy
1.5.3 Psychiatry
1.5.4 Weather classification
1.5.6 Bioinformatics and genetics
1.6 Summary
2.1 Introduction
2.2.2 Scatterplots
2.2.3 Density estimation
2.2.4 Scatterplot matrices
2.3.1 Principal components analysis of multivariate data
2.3.2 Exploratory projection pursuit
2.3.3 Multidimensional scaling
2.4 Three-dimensional plots and trellis graphics
2.5 Summary
3.1 Introduction
3.2.1 Similarity measures for binary data
3.2.2 Similarity measures for categorical data with more than two levels
3.3 Dissimilarity and distance measures for continuous data
3.4 Similarity measures for data containing both continuous and categorical variables
3.5 Proximity measures for structured data
3.6.2 Inter-group proximity based on group summaries for continuous data
3.6.3 Inter-group proximity based on group summaries for categorical data
3.7 Weighting variables
3.8 Standardization
3.9 Choice of proximity measure
3.10 Summary
4.1 Introduction
4.2.1 Illustrative examples of agglomerative methods
4.2.2 The standard agglomerative methods
4.2.3 Recurrence formula for agglomerative methods
4.2.4 Problems of agglomerative hierarchical methods
4.2.5 Empirical studies of hierarchical agglomerative methods
4.3.1 Monothetic divisive methods
4.3.2 Polythetic divisive methods
4.4.1 Dendrograms and other tree representations
4.4.2 Comparing dendrograms and measuring their distortion
4.4.3 Mathematical properties of hierarchical methods
4.4.4 Choice of partition – the problem of the number of groups
4.4.5 Hierarchical algorithms
4.4.6 Methods for large data sets
4.5.1 Dolphin whistles – agglomerative clustering
4.5.3 Globalization of cities – polythetic divisive method
4.5.4 Women’s life histories – divisive clustering of sequence data
4.5.5 Composition of mammals’ milk – exemplars, dendrogram seriation and choice of partition
4.6 Summary
5.1 Introduction
5.2 Clustering criteria derived from the dissimilarity matrix
5.3 Clustering criteria derived from continuous data
5.3.1 Minimization of trace(W)
5.3.4 Properties of the clustering criteria
5.3.5 Alternative criteria for clusters of different shapes and sizes
5.4 Optimization algorithms
5.4.1 Numerical example
5.4.2 More on k-means
5.5 Choosing the number of clusters
5.6.1 Survey of student attitudes towards video games
5.6.2 Air pollution indicators for US cities
5.6.3 Aesthetic judgement of painters
5.6.4 Classification of ‘nonspecific’ back pain
5.7 Summary
6.1 Introduction
6.2 Finite mixture densities
6.2.1 Maximum likelihood estimation
6.2.2 Maximum likelihood estimation of mixtures of multivariate normal densities
6.2.3 Problems with maximum likelihood estimation of finite mixture models using the EM algorithm
6.3.1 Mixtures of multivariate t-distributions
6.3.2 Mixtures for categorical data – latent class analysis
6.3.3 Mixture models for mixed-mode data
6.4 Bayesian analysis of mixtures
6.4.1 Choosing a prior distribution
6.4.2 Label switching
6.5.1 Log-likelihood ratio test statistics
6.5.2 Information criteria
6.5.3 Bayes factors
6.5.4 Markov chain Monte Carlo methods
6.6 Dimension reduction – variable selection in finite mixture modelling
6.8 Software for finite mixture modelling
6.9.1 Finite mixture densities with univariate Gaussian components
6.9.2 Finite mixture densities with multivariate Gaussian components
6.9.3 Applications of latent class analysis
6.9.4 Application of a mixture model with different component densities
6.10 Summary
7.1 Introduction
7.2 Finite mixture models for structured data
7.3 Finite mixtures of factor models
7.4 Finite mixtures of longitudinal models
7.5.1 Application of finite mixture factor analysis to the ‘categorical versus dimensional representation’ debate
7.5.2 Application of finite mixture confirmatory factor analysis to cluster genes using replicated microarray experiments
7.5.3 Application of finite mixture exploratory factor analysis to cluster Italian wines
7.5.4 Application of growth mixture modelling to identify distinct developmental trajectories
7.5.5 Application of growth mixture modelling to identify trajectories of perinatal depressive symptomatology
7.6 Summary
8.1 Introduction
8.2.1 Mode analysis
8.2.2 Nearest-neighbour clustering procedures
8.3 Density-based spatial clustering of applications with noise
8.4.1 Clumping and related techniques
8.4.2 Additive clustering
8.4.3 Application of MAPCLUS to data on social relations in a monastery
8.4.4 Pyramids
8.4.5 Application of pyramid clustering to gene sequences of yeasts
8.5 Simultaneous clustering of objects and variables
8.5.1 Hierarchical classes
8.5.3 The error variance technique
8.6 Clustering with constraints
8.6.1 Contiguity constraints
8.7 Fuzzy clustering
8.7.1 Methods for fuzzy cluster analysis
8.7.3 Application of fuzzy cluster analysis to Roman glass composition
8.8 Clustering and artificial neural networks
8.8.1 Components of a neural network
8.8.2 The Kohonen self-organizing map
8.8.3 Application of neural nets to brainstorming sessions
8.9 Summary
9.1 Introduction
9.2 Using clustering techniques in practice
9.3 Testing for absence of structure
9.4.1 Comparing partitions
9.4.2 Comparing dendrograms
9.5 Internal cluster quality, influence and robustness
9.5.1 Internal cluster quality
9.5.2 Robustness – split-sample validation and consensus trees
9.5.3 Influence of individual points
9.6 Displaying cluster solutions graphically
9.7 Illustrative examples
9.7.2 Scotch whisky tasting – cophenetic matrices for comparing clusterings
9.7.3 Chemical compounds in the pharmaceutical industry
9.7.4 Evaluating clustering algorithms for gene expression data
9.8 Summary
Bibliography
Index