Statistical Pattern Recognition

Statistical pattern recognition is a very active area of research that has seen many advances in recent years. New and emerging applications - such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition - require robust and efficient pattern recognition techniques. Statistical decision making and estimation are regarded as fundamental to the study of pattern recognition.

Statistical Pattern Recognition, Second Edition has been fully updated with new methods, applications and references. It provides a comprehensive introduction to this vibrant area - with material drawn from engineering, statistics, computer science and the social sciences - and covers many application areas, such as database design, artificial neural networks, and decision support systems.

* Provides a self-contained introduction to statistical pattern recognition.
* Each technique described is illustrated by real examples.
* Covers Bayesian methods, neural networks, support vector machines, and unsupervised classification.
* Each section concludes with a description of the applications that have been addressed and with further developments of the theory.
* Includes background material on dissimilarity, parameter estimation, data, linear algebra and probability.
* Features a variety of exercises, from 'open-book' questions to more lengthy projects.

The book is aimed primarily at senior undergraduate and graduate students studying statistical pattern recognition, pattern processing, neural networks, and data mining, in both statistics and engineering departments. It is also an excellent reference for technical professionals working in advanced information development environments.

Author(s): Andrew R. Webb
Edition: 2
Publisher: Wiley
Year: 2002

Language: English
Pages: 514

Cover......Page 1
Contents......Page 7
Preface......Page 15
Notation......Page 17
1.1.1 Introduction......Page 19
1.1.2 The basic model......Page 20
1.2 Stages in a pattern recognition problem......Page 21
1.3 Issues......Page 22
1.4 Supervised versus unsupervised......Page 23
1.5.1 Elementary decision theory......Page 24
1.5.2 Discriminant functions......Page 37
1.6 Multiple regression......Page 43
1.7 Outline of book......Page 45
1.8 Notes and references......Page 46
Exercises......Page 48
2.1 Introduction......Page 51
2.2.1 Linear and quadratic discriminant functions......Page 52
2.2.2 Regularised discriminant analysis......Page 55
2.2.3 Example application study......Page 56
2.2.5 Summary......Page 58
2.3.1 Maximum likelihood estimation via EM......Page 59
2.3.2 Mixture models for discrimination......Page 63
2.3.3 How many components?......Page 64
2.3.4 Example application study......Page 65
2.3.6 Summary......Page 67
2.4.1 Bayesian learning methods......Page 68
2.4.2 Markov chain Monte Carlo......Page 73
2.4.3 Bayesian approaches to discrimination......Page 88
2.4.4 Example application study......Page 90
2.5 Application studies......Page 93
2.8 Notes and references......Page 95
Exercises......Page 96
3.1 Introduction......Page 99
3.2 Histogram method......Page 100
3.2.1 Data-adaptive histograms......Page 101
3.2.2 Independence assumption......Page 102
3.2.4 Maximum weight dependence trees......Page 103
3.2.5 Bayesian networks......Page 106
3.2.7 Further developments......Page 109
3.2.8 Summary......Page 110
3.3.1 k-nearest-neighbour decision rule......Page 111
3.3.3 Algorithms......Page 113
3.3.4 Editing techniques......Page 116
3.3.5 Choice of distance metric......Page 119
3.3.6 Example application study......Page 120
3.3.7 Further developments......Page 121
3.3.8 Summary......Page 122
3.4 Expansion by basis functions......Page 123
3.5 Kernel methods......Page 124
3.5.1 Choice of smoothing parameter......Page 129
3.5.2 Choice of kernel......Page 131
3.5.3 Example application study......Page 132
3.5.5 Summary......Page 133
3.6 Application studies......Page 134
3.7 Summary and discussion......Page 137
3.9 Notes and references......Page 138
Exercises......Page 139
4.1 Introduction......Page 141
4.2.2 Perceptron criterion......Page 142
4.2.3 Fisher’s criterion......Page 146
4.2.4 Least mean squared error procedures......Page 148
4.2.5 Support vector machines......Page 152
4.2.6 Example application study......Page 159
4.2.8 Summary......Page 160
4.3.1 General ideas......Page 162
4.3.3 Fisher’s criterion – linear discriminant analysis......Page 163
4.3.4 Least mean squared error procedures......Page 166
4.3.5 Optimal scaling......Page 170
4.3.7 Multiclass support vector machines......Page 173
4.3.9 Further developments......Page 174
4.4.1 Two-group case......Page 176
4.4.2 Maximum likelihood estimation......Page 177
4.4.3 Multiclass logistic discrimination......Page 179
4.4.4 Example application study......Page 180
4.5 Application studies......Page 181
4.6 Summary and discussion......Page 182
Exercises......Page 183
5.1 Introduction......Page 187
5.2.1 Least squares error measure......Page 189
5.2.2 Maximum likelihood......Page 193
5.2.3 Entropy......Page 194
5.3.1 Introduction......Page 195
5.3.2 Motivation......Page 196
5.3.3 Specifying the model......Page 199
5.3.6 Example application study......Page 205
5.3.8 Summary......Page 207
5.4 Nonlinear support vector machines......Page 208
5.4.1 Types of kernel......Page 209
5.4.3 Support vector machines for regression......Page 210
5.4.4 Example application study......Page 213
5.4.5 Further developments......Page 214
5.5 Application studies......Page 215
5.7 Recommendations......Page 217
Exercises......Page 218
6.1 Introduction......Page 221
6.2.1 Introduction......Page 222
6.2.3 Determining the multilayer perceptron weights......Page 223
6.2.4 Properties......Page 230
6.2.5 Example application study......Page 231
6.2.6 Further developments......Page 232
6.3.1 Introduction......Page 234
6.3.2 Projection pursuit for discrimination......Page 236
6.3.3 Example application study......Page 237
6.3.5 Summary......Page 238
6.5 Summary and discussion......Page 239
6.6 Recommendations......Page 240
Exercises......Page 241
7.2.1 Introduction......Page 243
7.2.2 Classifier tree construction......Page 246
7.2.3 Other issues......Page 255
7.2.5 Further developments......Page 257
7.2.6 Summary......Page 258
7.3.2 Recursive partitioning model......Page 259
7.3.3 Example application study......Page 262
7.4 Application studies......Page 263
7.6 Recommendations......Page 265
Exercises......Page 266
8.1 Introduction......Page 269
8.2.1 Discriminability......Page 270
8.2.2 Reliability......Page 276
8.2.3 ROC curves for two-class rules......Page 278
8.2.4 Example application study......Page 281
8.2.5 Further developments......Page 282
8.2.6 Summary......Page 283
8.3.1 Which technique is best?......Page 284
8.3.3 Comparing rules when misclassification costs are uncertain......Page 285
8.3.4 Example application study......Page 287
8.3.5 Further developments......Page 288
8.4.1 Introduction......Page 289
8.4.2 Motivation......Page 290
8.4.3 Characteristics of a combination scheme......Page 293
8.4.4 Data fusion......Page 296
8.4.5 Classifier combination methods......Page 302
8.4.6 Example application study......Page 315
8.4.8 Summary......Page 316
8.6 Summary and discussion......Page 317
8.8 Notes and references......Page 318
Exercises......Page 319
9.1 Introduction......Page 323
9.2 Feature selection......Page 325
9.2.1 Feature selection criteria......Page 326
9.2.2 Search algorithms for feature selection......Page 329
9.2.3 Suboptimal search algorithms......Page 332
9.2.5 Further developments......Page 335
9.3 Linear feature extraction......Page 336
9.3.1 Principal components analysis......Page 337
9.3.2 Karhunen–Loève transformation......Page 347
9.3.3 Factor analysis......Page 353
9.3.4 Example application study......Page 360
9.3.5 Further developments......Page 361
9.4 Multidimensional scaling......Page 362
9.4.1 Classical scaling......Page 363
9.4.2 Metric multidimensional scaling......Page 364
9.4.3 Ordinal scaling......Page 365
9.4.4 Algorithms......Page 368
9.4.5 Multidimensional scaling for feature extraction......Page 369
9.4.6 Example application study......Page 370
9.4.8 Summary......Page 371
9.5 Application studies......Page 372
9.7 Recommendations......Page 373
9.8 Notes and references......Page 374
Exercises......Page 375
10.1 Introduction......Page 379
10.2 Hierarchical methods......Page 380
10.2.1 Single-link method......Page 382
10.2.2 Complete-link method......Page 385
10.2.4 General agglomerative algorithm......Page 386
10.2.5 Properties of a hierarchical classification......Page 387
10.2.7 Summary......Page 388
10.3 Quick partitions......Page 389
10.4.1 Model description......Page 390
10.5 Sum-of-squares methods......Page 392
10.5.1 Clustering criteria......Page 393
10.5.2 Clustering algorithms......Page 394
10.5.3 Vector quantisation......Page 400
10.5.4 Example application study......Page 412
10.5.6 Summary......Page 413
10.6.1 Introduction......Page 414
10.6.3 Choosing the number of clusters......Page 415
10.6.4 Identifying genuine clusters......Page 417
10.7 Application studies......Page 418
10.8 Summary and discussion......Page 420
10.9 Recommendations......Page 422
10.10 Notes and references......Page 423
Exercises......Page 424
11.1 Model selection......Page 427
11.1.2 Cross-validation......Page 428
11.1.4 Akaike’s information criterion......Page 429
11.2 Learning with unreliable classification......Page 430
11.3 Missing data......Page 431
11.4 Outlier detection and robust procedures......Page 432
11.5 Mixed continuous and discrete variables......Page 433
11.6.1 Bounds on the expected risk......Page 434
11.6.2 The Vapnik–Chervonenkis dimension......Page 435
A.1.1 Numeric variables......Page 437
A.1.3 Binary variables......Page 441
A.1.4 Summary......Page 442
A.2.2 Methods based on probabilistic distance......Page 443
A.2.3 Probabilistic dependence......Page 446
A.3 Discussion......Page 447
B.1.1 Properties of estimators......Page 449
B.1.2 Maximum likelihood......Page 451
B.1.4 Bayesian estimates......Page 452
C.1 Basic properties and definitions......Page 455
C.2 Notes and references......Page 459
D.2 Formulating the problem......Page 461
D.3 Data collection......Page 462
D.4 Initial examination of data......Page 464
D.6 Notes and references......Page 466
E.1 Definitions and terminology......Page 467
E.2 Normal distribution......Page 472
E.3 Probability distributions......Page 473
References......Page 477
Index......Page 509