Author(s): Paolo Giudici, Silvia Figini
Edition: 2
Year: 2009
Language: English
Pages: 258
Tags: Computer science and computer engineering;Artificial intelligence;Data mining;
Applied Data Mining for Business and Industry
Contents
1 Introduction
Part I Methodology
2.1 Statistical units and statistical variables
2.2 Data matrices and their transformations
2.3 Complex data structures
2.4 Summary
3.1.1 Measures of location
3.1.2 Measures of variability
3.1.3 Measures of heterogeneity
3.1.4 Measures of concentration
3.1.5 Measures of asymmetry
3.1.6 Measures of kurtosis
3.2 Bivariate exploratory analysis of quantitative data
3.3 Multivariate exploratory analysis of quantitative data
3.4 Multivariate exploratory analysis of qualitative data
3.4.1 Independence and association
3.4.2 Distance measures
3.4.3 Dependency measures
3.4.4 Model-based measures
3.5 Reduction of dimensionality
3.5.1 Interpretation of the principal components
3.6 Further reading
4 Model specification
4.1 Measures of distance
4.1.1 Euclidean distance
4.1.2 Similarity measures
4.1.3 Multidimensional scaling
4.2 Cluster analysis
4.2.1 Hierarchical methods
4.2.2 Evaluation of hierarchical methods
4.2.3 Non-hierarchical methods
4.3.1 Bivariate linear regression
4.3.2 Properties of the residuals
4.3.3 Goodness of fit
4.3.4 Multiple linear regression
4.4 Logistic regression
4.4.1 Interpretation of logistic regression
4.4.2 Discriminant analysis
4.5 Tree models
4.5.1 Division criteria
4.5.2 Pruning
4.6 Neural networks
4.6.1 Architecture of a neural network
4.6.2 The multilayer perceptron
4.6.3 Kohonen networks
4.7 Nearest-neighbour models
4.8.1 Association rules
4.9 Uncertainty measures and inference
4.9.1 Probability
4.9.2 Statistical models
4.9.3 Statistical inference
4.10 Non-parametric modelling
4.11 The normal linear model
4.11.1 Main inferential results
4.12 Generalised linear models
4.12.1 The exponential family
4.12.2 Definition of generalised linear models
4.12.3 The logistic regression model
4.13.1 Construction of a log-linear model
4.13.2 Interpretation of a log-linear model
4.13.3 Graphical log-linear models
4.13.4 Log-linear model comparison
4.14 Graphical models
4.14.1 Symmetric graphical models
4.14.2 Recursive graphical models
4.14.3 Graphical models and neural networks
4.15 Survival analysis models
4.16 Further reading
5 Model evaluation
5.1.1 Distance between statistical models
5.1.2 Discrepancy of a statistical model
5.1.3 Kullback–Leibler discrepancy
5.2 Criteria based on scoring functions
5.3 Bayesian criteria
5.4 Computational criteria
5.5 Criteria based on loss functions
5.6 Further reading
Part II Business case studies
6.2 Description of the data
6.4 Model building
6.4.1 Cluster analysis
6.4.2 Kohonen networks
6.5 Model comparison
6.6 Summary report
7.1 Objectives of the analysis
7.2 Description of the data
7.3 Exploratory data analysis
7.4.1 Log-linear models
7.4.2 Association rules
7.5 Model comparison
7.6 Summary report
8.1 Objectives of the analysis
8.3 Exploratory data analysis
8.4 Model building
8.5 Summary
9.2 Description of the data
9.3 Exploratory data analysis
9.4 Model building
9.5 Model comparison
9.6 Summary report
10.1 Objectives of the analysis
10.3 Exploratory data analysis
10.4 Model specification
10.5 Model comparison
10.6 Summary report
11.1 Objectives of the analysis
11.2 Description of the data
11.3 Exploratory data analysis
11.4 Model specification
11.5 Model comparison
11.6 Summary report
12.1 Context and objectives of the analysis
12.2 Exploratory data analysis
12.3 Model building
12.4 Model comparison
12.5 Summary conclusions
References
Index