Pattern Recognition Algorithms for Data Mining addresses data mining from a pattern recognition perspective. This unique book presents real-life data sets from various domains, such as geographic information systems, remote sensing imagery, and population census, to demonstrate the use of new methodologies. Classical approaches are covered along with granular computing methods that integrate fuzzy sets, artificial neural networks, and genetic algorithms for efficient knowledge discovery. The authors then compare the granular computing and rough-fuzzy approaches with the more classical methods and demonstrate why they are more efficient.
Author(s): Sankar K. Pal, Pabitra Mitra
Series: Chapman & Hall/CRC Computer Science & Data Analysis
Edition: 1
Publisher: Chapman and Hall/CRC
Year: 2004
Language: English
Pages: 218
Pattern Recognition Algorithms for Data Mining: Scalability, Knowledge Discovery, and Soft Granular Computing......Page 1
Contents......Page 4
Foreword......Page 9
Preface......Page 14
List of Tables......Page 17
List of Figures......Page 19
1.1 Introduction......Page 22
1.2 Pattern Recognition in Brief......Page 24
1.2.2 Feature selection/extraction......Page 25
1.2.3 Classification......Page 26
1.3 Knowledge Discovery in Databases (KDD)......Page 28
1.4.1 Data mining tasks......Page 31
1.4.3 Applications of data mining......Page 33
1.5.1 Database perspective......Page 35
1.5.3 Pattern recognition perspective......Page 36
1.5.4 Research issues and challenges......Page 37
1.6.1 Data reduction......Page 38
1.6.2 Dimensionality reduction......Page 39
1.6.4 Data partitioning......Page 40
1.6.6 Efficient search algorithms......Page 41
1.7 Significance of Soft Computing in KDD......Page 42
1.8 Scope of the Book......Page 43
2.1 Introduction......Page 49
2.2.1 Condensed nearest neighbor rule......Page 52
2.2.2 Learning vector quantization......Page 53
2.3 Multiscale Representation of Data......Page 54
2.4 Nearest Neighbor Density Estimate......Page 57
2.5 Multiscale Data Condensation Algorithm......Page 58
2.6 Experimental Results and Comparisons......Page 60
2.6.2 Test of statistical significance......Page 61
2.6.3 Classification: Forest cover data......Page 67
2.6.4 Clustering: Satellite image data......Page 68
2.6.5 Rule generation: Census data......Page 69
2.7 Summary......Page 72
3.1 Introduction......Page 79
3.2 Feature Extraction......Page 80
3.3 Feature Selection......Page 82
3.3.1 Filter approach......Page 83
3.4 Feature Selection Using Feature Similarity (FSFS)......Page 84
3.4.1 Feature similarity measures......Page 85
3.4.1.2 Least square regression error (e)......Page 86
3.4.1.3 Maximal information compression index (λ2)......Page 87
3.4.2 Feature selection through clustering......Page 88
3.5.1 Supervised indices......Page 91
3.5.2 Unsupervised indices......Page 92
3.5.3 Representation entropy......Page 93
3.6.1 Comparison: Classification and clustering performance......Page 94
3.6.2 Redundancy reduction: Quantitative study......Page 99
3.6.3 Effect of cluster size......Page 100
3.7 Summary......Page 102
4.1 Introduction......Page 103
4.2 Support Vector Machine......Page 106
4.3 Incremental Support Vector Learning with Multiple Points......Page 108
4.4 Statistical Query Model of Learning......Page 109
4.4.2 Confidence factor of support vector set......Page 110
4.5 Learning Support Vectors with Statistical Queries......Page 111
4.6.1 Classification accuracy and training time......Page 114
4.6.3 Margin distribution......Page 117
4.7 Summary......Page 121
5.1 Introduction......Page 123
5.2 Soft Granular Computing......Page 125
5.3 Rough Sets......Page 126
5.3.2 Indiscernibility and set approximation......Page 127
5.3.3 Reducts......Page 128
5.3.4 Dependency rule generation......Page 130
5.4 Linguistic Representation of Patterns and Fuzzy Granulation......Page 131
5.5 Rough-fuzzy Case Generation Methodology......Page 134
5.5.1 Thresholding and rule generation......Page 135
5.5.2 Mapping dependency rules to cases......Page 137
5.5.3 Case retrieval......Page 138
5.6 Experimental Results and Comparison......Page 140
5.7 Summary......Page 141
6.1 Introduction......Page 143
6.2 Clustering Methodologies......Page 144
6.3.2 BIRCH: Balanced iterative reducing and clustering using hierarchies......Page 146
6.3.3 DBSCAN: Density-based spatial clustering of applications with noise......Page 147
6.3.4 STING: Statistical information grid......Page 148
6.4 CEMMiSTRI: Clustering using EM, Minimal Spanning Tree and Rough-fuzzy Initialization......Page 149
6.4.1 Mixture model estimation via EM algorithm......Page 150
6.4.2 Rough set initialization of mixture parameters......Page 151
6.4.3 Mapping reducts to mixture parameters......Page 152
6.4.4 Graph-theoretic clustering of Gaussian components......Page 153
6.5 Experimental Results and Comparison......Page 155
6.6 Multispectral Image Segmentation......Page 159
6.6.4 Experimental results and comparison......Page 161
6.7 Summary......Page 167
7.1 Introduction......Page 168
7.2 Self-Organizing Maps (SOM)......Page 169
7.2.1 Learning......Page 170
7.3 Incorporation of Rough Sets in SOM (RSOM)......Page 171
7.3.2 Mapping rough set rules to network weights......Page 172
7.4.1 Extraction methodology......Page 173
7.4.2 Evaluation indices......Page 174
7.5 Experimental Results and Comparison......Page 175
7.5.1 Clustering and quantization error......Page 176
7.5.2 Performance of rules......Page 181
7.6 Summary......Page 182
8.1 Introduction......Page 183
8.2 Ensemble Classifiers......Page 185
8.3.1.1 Apriori......Page 188
8.3.1.4 Dynamic itemset counting......Page 190
8.4 Classification Rules......Page 191
8.5.1.2 Output representation......Page 193
8.5.2 Rough set knowledge encoding......Page 194
8.6.1 Algorithm......Page 196
8.6.1.1 Steps......Page 197
8.6.2.1 Chromosomal representation......Page 200
8.6.2.4 Choice of fitness function......Page 201
8.7.1 Rule extraction methodology......Page 202
8.7.2 Quantitative measures......Page 206
8.8 Experimental Results and Comparison......Page 207
8.8.1 Classification......Page 208
8.8.2 Rule extraction......Page 210
8.8.2.1 Rules for staging of cervical cancer with binary feature inputs......Page 215
8.9 Summary......Page 217