Statistical Pattern Recognition

Statistical pattern recognition is an active area of study and research that has seen many advances in recent years. New and emerging applications - such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition - require robust and efficient pattern recognition techniques. Statistical decision making and estimation are regarded as fundamental to the study of pattern recognition.

Statistical Pattern Recognition, Second Edition has been fully updated with new methods, applications and references. It provides a comprehensive introduction to this vibrant area - with material drawn from engineering, statistics, computer science and the social sciences - and covers many application areas, such as database design, artificial neural networks, and decision support systems.

* Provides a self-contained introduction to statistical pattern recognition.
* Each technique described is illustrated by real examples.
* Covers Bayesian methods, neural networks, support vector machines, and unsupervised classification.
* Each section concludes with a description of the applications that have been addressed and with further developments of the theory.
* Includes background material on dissimilarity, parameter estimation, data, linear algebra and probability.
* Features a variety of exercises, from 'open-book' questions to more lengthy projects.

The book is aimed primarily at senior undergraduate and graduate students studying statistical pattern recognition, pattern processing, neural networks, and data mining, in both statistics and engineering departments. It is also an excellent source of reference for technical professionals working in advanced information development environments.

For further information on the techniques and applications discussed in this book please visit www.statistical-pattern-recognition.net

Author(s): Andrew R. Webb
Edition: 2
Publisher: Wiley
Year: 2002

Language: English
Pages: 515

Statistical Pattern Recognition
Copyright
Contents
Preface
Notation
1.1 Statistical pattern recognition
1.1.2 The basic model
1.2 Stages in a pattern recognition problem
1.3 Issues
1.4 Supervised versus unsupervised
1.5.1 Elementary decision theory
1.5.2 Discriminant functions
1.6 Multiple regression
1.7 Outline of book
1.8 Notes and references
Exercises
2.1 Introduction
2.2.1 Linear and quadratic discriminant functions
2.2.2 Regularised discriminant analysis
2.2.3 Example application study
2.2.5 Summary
2.3.1 Maximum likelihood estimation via EM
2.3.2 Mixture models for discrimination
2.3.3 How many components?
2.3.4 Example application study
2.3.6 Summary
2.4.1 Bayesian learning methods
2.4.2 Markov chain Monte Carlo
2.4.3 Bayesian approaches to discrimination
2.4.4 Example application study
2.5 Application studies
2.8 Notes and references
Exercises
3.1 Introduction
3.2 Histogram method
3.2.1 Data-adaptive histograms
3.2.2 Independence assumption
3.2.4 Maximum weight dependence trees
3.2.5 Bayesian networks
3.2.7 Further developments
3.2.8 Summary
3.3.1 k-nearest-neighbour decision rule
3.3.3 Algorithms
3.3.4 Editing techniques
3.3.5 Choice of distance metric
3.3.6 Example application study
3.3.7 Further developments
3.3.8 Summary
3.4 Expansion by basis functions
3.5 Kernel methods
3.5.1 Choice of smoothing parameter
3.5.2 Choice of kernel
3.5.3 Example application study
3.5.5 Summary
3.6 Application studies
3.7 Summary and discussion
3.9 Notes and references
Exercises
4.1 Introduction
4.2.2 Perceptron criterion
4.2.3 Fisher's criterion
4.2.4 Least mean squared error procedures
4.2.5 Support vector machines
4.2.6 Example application study
4.2.8 Summary
4.3.1 General ideas
4.3.3 Fisher's criterion - linear discriminant analysis
4.3.4 Least mean squared error procedures
4.3.5 Optimal scaling
4.3.7 Multiclass support vector machines
4.3.9 Further developments
4.4.1 Two-group case
4.4.2 Maximum likelihood estimation
4.4.3 Multiclass logistic discrimination
4.4.4 Example application study
4.5 Application studies
4.6 Summary and discussion
Exercises
5.1 Introduction
5.2.1 Least squares error measure
5.2.2 Maximum likelihood
5.2.3 Entropy
5.3.1 Introduction
5.3.2 Motivation
5.3.3 Specifying the model
5.3.6 Example application study
5.3.8 Summary
5.4 Nonlinear support vector machines
5.4.1 Types of kernel
5.4.3 Support vector machines for regression
5.4.4 Example application study
5.4.5 Further developments
5.5 Application studies
5.7 Recommendations
Exercises
6.1 Introduction
6.2.1 Introduction
6.2.3 Determining the multilayer perceptron weights
6.2.4 Properties
6.2.5 Example application study
6.2.6 Further developments
6.3.1 Introduction
6.3.2 Projection pursuit for discrimination
6.3.3 Example application study
6.3.5 Summary
6.5 Summary and discussion
6.6 Recommendations
Exercises
7.2.1 Introduction
7.2.2 Classifier tree construction
7.2.3 Other issues
7.2.5 Further developments
7.2.6 Summary
7.3.2 Recursive partitioning model
7.3.3 Example application study
7.4 Application studies
7.6 Recommendations
Exercises
8.1 Introduction
8.2.1 Discriminability
8.2.2 Reliability
8.2.3 ROC curves for two-class rules
8.2.4 Example application study
8.2.5 Further developments
8.2.6 Summary
8.3.1 Which technique is best?
8.3.3 Comparing rules when misclassification costs are uncertain
8.3.4 Example application study
8.3.5 Further developments
8.4.1 Introduction
8.4.2 Motivation
8.4.3 Characteristics of a combination scheme
8.4.4 Data fusion
8.4.5 Classifier combination methods
8.4.6 Example application study
8.4.8 Summary
8.6 Summary and discussion
8.8 Notes and references
Exercises
9.1 Introduction
9.2 Feature selection
9.2.1 Feature selection criteria
9.2.2 Search algorithms for feature selection
9.2.3 Suboptimal search algorithms
9.2.5 Further developments
9.3 Linear feature extraction
9.3.1 Principal components analysis
9.3.2 Karhunen-Loève transformation
9.3.3 Factor analysis
9.3.4 Example application study
9.3.5 Further developments
9.4 Multidimensional scaling
9.4.1 Classical scaling
9.4.2 Metric multidimensional scaling
9.4.3 Ordinal scaling
9.4.4 Algorithms
9.4.5 Multidimensional scaling for feature extraction
9.4.6 Example application study
9.4.8 Summary
9.5 Application studies
9.7 Recommendations
9.8 Notes and references
Exercises
10.1 Introduction
10.2 Hierarchical methods
10.2.1 Single-link method
10.2.2 Complete-link method
10.2.4 General agglomerative algorithm
10.2.5 Properties of a hierarchical classification
10.2.7 Summary
10.3 Quick partitions
10.4.1 Model description
10.5 Sum-of-squares methods
10.5.1 Clustering criteria
10.5.2 Clustering algorithms
10.5.3 Vector quantisation
10.5.4 Example application study
10.5.6 Summary
10.6.1 Introduction
10.6.3 Choosing the number of clusters
10.6.4 Identifying genuine clusters
10.7 Application studies
10.8 Summary and discussion
10.9 Recommendations
10.10 Notes and references
Exercises
11.1 Model selection
11.1.2 Cross-validation
11.1.4 Akaike's information criterion
11.2 Learning with unreliable classification
11.3 Missing data
11.4 Outlier detection and robust procedures
11.5 Mixed continuous and discrete variables
11.6.1 Bounds on the expected risk
11.6.2 The Vapnik-Chervonenkis dimension
A.1.1 Numeric variables
A.1.3 Binary variables
A.1.4 Summary
A.2.2 Methods based on probabilistic distance
A.2.3 Probabilistic dependence
A.3 Discussion
B.1.1 Properties of estimators
B.1.2 Maximum likelihood
B.1.4 Bayesian estimates
C.1 Basic properties and definitions
C.2 Notes and references
D.2 Formulating the problem
D.3 Data collection
D.4 Initial examination of data
D.6 Notes and references
E.1 Definitions and terminology
E.2 Normal distribution
E.3 Probability distributions
References
Index