This book constitutes the refereed proceedings of the 6th Industrial Conference on Data Mining, ICDM 2006, held in Leipzig, Germany in July 2006.
The 45 revised full papers presented were carefully reviewed and selected from 156 submissions. The papers are organized in topical sections on data mining in medicine, Web mining and logfile analysis, theoretical aspects of data mining, data mining in marketing, mining signals and images, and aspects of data mining, that is, applications such as intrusion detection, knowledge management, manufacturing process control, time-series mining, and criminal investigations.
Author(s): Petra Perner
Series: Lecture Notes in Artificial Intelligence 4065
Edition: 1
Publisher: Springer
Year: 2006
Language: English
Pages: 601
Front matter......Page 1
Introduction......Page 10
Case-Based Reasoning and Prototypicality Measures......Page 11
Diagnosis of Dysmorphic Syndromes......Page 12
Application of Adaptation Rules......Page 14
Results......Page 15
Conclusion......Page 16
References......Page 17
Feature Selection in Tumor Classification......Page 19
Classifier Aggregation for Tumor Classification......Page 20
SMA Version of the DDP-Based Feature Selection Technique......Page 21
OVA Scheme for the DDP-Based Feature Selection Technique......Page 23
Evaluation Techniques......Page 25
Best Averaged Accuracy......Page 26
Size-Averaged Accuracy......Page 27
Class Accuracy......Page 28
Discussion......Page 29
References......Page 31
Introduction......Page 33
Numerical Representation of Biological Sequences......Page 34
Linear Prediction Coefficients......Page 37
LPC Cepstral Coefficients......Page 38
Spectral Distortion Measures......Page 39
LPC Cepstral Distortion......Page 41
Phylogenetic Study of DNA Sequences......Page 42
Database Searching of Similar Sequences......Page 43
Conclusions......Page 44
Introduction......Page 47
Multispecies Gene Entropy......Page 48
Self-Organizing Map Principles......Page 49
Gene Entropy Estimation......Page 50
Gene Entropy Estimation......Page 51
A Species Phylogeny Inference Problem......Page 52
Applications of Gene Entropy in the Phylogenetics......Page 55
Reconstructing Species Tree by K-L Distance Based Gene Concatenation Under Gene Clustering......Page 58
References......Page 60
Introduction......Page 62
Related Works......Page 63
The Unified Approach to Quantify Interestingness of Association Rules......Page 64
Deviation at Intermediate Level......Page 65
Interestingness of Discovered Knowledge......Page 66
Implementation and Experimentation......Page 67
Experiment I......Page 68
References......Page 70
Introduction......Page 73
ConceptMiner System......Page 74
Syntactic Analysis......Page 75
Concept Mapping and Association......Page 76
UMLS Project......Page 77
RelationshipMiner System......Page 78
Evaluation......Page 81
Conclusion......Page 82
References......Page 83
Introduction......Page 85
The Problem of the Rule Extraction from Neural Networks......Page 86
The Basic Concepts of GEX......Page 88
Evolutionary Algorithm......Page 91
The Set of Rules......Page 93
Experimental Studies......Page 94
The Ability of GEX to Describe Classification Made by the Neural Network......Page 95
The Ability of GEX to Acquire New Knowledge......Page 96
Conclusion......Page 98
Introduction and Motivation......Page 100
Current Techniques in Intrusion Detection......Page 102
Available Data and Stream Preprocessing......Page 103
HTTPHunting: An IBR Approach for Diagnosing HTTP Traffic......Page 107
Model Overview......Page 108
System Evaluation......Page 109
Experiments and Results......Page 110
Conclusions and Future Work......Page 111
References......Page 112
Introduction and Motivation......Page 115
Machine Learning Approaches......Page 117
Case-Based and Memory-Based Reasoning Approaches......Page 118
SpamHunting IBR System......Page 119
Benchmark Corpus......Page 120
Preprocessing and Feature Selection......Page 121
Benchmark of the Different Configurations......Page 122
Statistical Analysis of Benchmarking Results......Page 124
Improving SpamHunting Feature Selection Model......Page 125
Conclusions and Further Work......Page 126
References......Page 128
Introduction......Page 130
Different Types of Web Robots......Page 131
Common Detection Methods Based on Log File Characteristics......Page 132
Practical Evaluation of Currently Applied Methods......Page 134
Proposal for a New Robot Discovery Technique......Page 136
Conclusion......Page 138
Introduction......Page 140
The Log File......Page 141
Related Work......Page 142
Software Requirements......Page 144
System Overview......Page 145
Creation of the Initial Database......Page 146
Data Parsing......Page 149
Analysis Obstacles......Page 151
References......Page 153
Introduction......Page 155
The Proposed System......Page 158
Content Extractor......Page 160
Taxonomy Builder......Page 161
Recommendation Manager......Page 163
Conclusions and Future Work......Page 166
References......Page 167
Introduction......Page 170
The Challenge......Page 171
The Mining Plans Selection System......Page 173
Mining Problems Description......Page 175
Presenting a Mining Solution......Page 177
Describing a Mining Experience......Page 179
Knowledge Representation......Page 180
Conclusions and Future Work......Page 182
References......Page 183
Introduction......Page 185
Formal Concept Analysis (FCA)......Page 186
Query Evaluation......Page 188
Architecture......Page 189
From a Web Search Engine Retrieval to a Context......Page 190
Search Preferences and Strategies......Page 191
Stepwise Context Size Reduction Via a Cross Table......Page 193
Visualization of the Concept Hierarchy Using a Line Diagram......Page 194
Conclusion and Discussion......Page 196
Future Work......Page 197
Introduction......Page 200
Basic Concept of the General Concept Lattice and Its Incremental Construction......Page 201
Pruning Based Incremental Construction of Concept Lattice......Page 203
Pruning Based Incremental Construction Algorithm of Concept Lattice Entries......Page 205
Experiment Analysis......Page 207
References......Page 209
Introduction......Page 211
Association Rules......Page 212
Conventional GNP......Page 214
Association Rule Mining Using GNP......Page 215
Extraction of Association Rules......Page 216
Genetic Operators......Page 217
Use of Acquired Information......Page 218
aGNP for Association Rule Mining......Page 219
Extraction of Association Rules......Page 220
Conditions......Page 221
Simulation Results......Page 222
Conclusions......Page 224
Introduction......Page 226
Classification and Regression Trees......Page 228
Ordinal Classification......Page 229
Monotone Classification Trees......Page 231
Ordinal Classification with Monotonicity Constraints......Page 232
Conclusions......Page 233
References......Page 234
Introduction......Page 235
Algorithm......Page 238
Validation of the Principle......Page 239
A First Example......Page 240
Differing Variances......Page 241
Real World Data......Page 242
Summary......Page 245
Data Preparation and Feature Ranking......Page 248
When Features Are Dynamic......Page 250
Evaluation of Dynamic Features......Page 251
Evaluation of Sets of Prototypes......Page 253
The Optimization Heuristic......Page 254
Application......Page 255
Conclusion and Further Work......Page 257
Classification with Neural Networks......Page 259
Multi SOM Principle......Page 260
M-SOMs for Classification......Page 262
Neural Gas......Page 263
Multi-Neural Gas Principle......Page 264
Quality Measures......Page 265
Results of Experiments......Page 267
Quality Measures for Faulty Clustering......Page 269
Conclusions......Page 271
References......Page 272
Introduction......Page 273
Discrete Gradient Method......Page 275
Evolutionary Discrete Gradient Method......Page 280
Test Problems......Page 281
Results......Page 283
Convergence......Page 285
References......Page 286
Introduction......Page 288
Enterprise Customer Management System......Page 289
Data-Mining Application Analysis......Page 290
References......Page 292
Introduction......Page 293
Related Work......Page 294
Evaluation Criterion......Page 295
Search Strategy......Page 296
Experiments......Page 297
Experimentation Details......Page 298
Experimental Results......Page 299
Conclusions and Future Work......Page 304
References......Page 305
Introduction......Page 306
When Everybody Tastes Everything: Using Explicit Rankings......Page 308
In a General Case: Using Ranking Functions......Page 309
The Clustering Algorithm......Page 311
The Preference Function of Groups......Page 312
Experimental Results......Page 313
Implications for Lamb Markets and Breeders......Page 314
Conclusions......Page 316
References......Page 317
Introduction......Page 319
MultiNomial Logit......Page 320
Random Feature Selection in MNL (rfs_MNL)......Page 321
Predictive Model Evaluation: wPCC and AUC......Page 323
MultiNomial Logit (MNL)......Page 325
Random Feature Selection in MNL (rfs_MNL)......Page 326
MNL with Expert Feature Selection (efs_MNL)......Page 327
Predictive Model Evaluation on Test Data......Page 328
Feature Importance in NPTB Model......Page 329
Conclusion......Page 330
References......Page 331
Introduction......Page 333
Data Understanding and Data Preparation......Page 334
Setting-Up of Service Profile of Families: Association Rules......Page 336
Setting-Up of Expenditure Profile of Families: Factor and Cluster Analysis......Page 337
Services Usage and Family Characteristics: Correspondence Analysis......Page 339
Expenditures and Family Characteristics: Decision Tree......Page 340
Three-Ways Analysis: Canonical Correspondence Analysis......Page 341
Services Usage and Family Characteristics......Page 342
Three-Ways Analysis......Page 343
Conclusion......Page 344
References......Page 345
Introduction......Page 346
Subgroup Discovery......Page 347
Multiobjective Genetic Algorithms......Page 349
A Multiobjective Evolutionary Approach to Obtain Descriptive Fuzzy Rules......Page 350
Chromosome Representation......Page 351
Fitness Assignment......Page 352
A Case Study in Marketing: Knowledge Discovery in Trade Fairs......Page 353
Results of the Experimentation on the Marketing Dataset......Page 354
References......Page 357
Introduction......Page 359
Scatter Search Algorithm......Page 360
Subset Generation Method......Page 361
Diversification Generation Method (DGM)......Page 362
Improvement Method......Page 364
Reference Set......Page 366
Solution Combination Method......Page 367
Fitness Function......Page 368
Testing Methodology......Page 369
Original Size (OSsize) and Refset Size......Page 370
Comparative Study......Page 371
Conclusion......Page 372
Introduction......Page 374
Support Vector Machine Classification......Page 376
The Expected Generalization Error Bounds......Page 377
Description of NSGA-II......Page 378
Genetic Operator......Page 379
Stopping Criterion......Page 380
Experiment Result and Evaluation......Page 381
Conclusion and Future Work......Page 384
Introduction......Page 386
Decision Model Analysis......Page 387
Experimental Setup......Page 391
Experimental Results......Page 393
Conclusions......Page 395
Introduction......Page 398
Background......Page 399
Collection and Preparation of Plant Samples......Page 400
Measurement of Plant Spectra......Page 401
Classification Models......Page 402
Individual Species Classification......Page 404
Distinguishing Salt Tolerant from Salt Sensitive Species......Page 407
Analysis of Variance and Discriminant Analysis......Page 408
Conclusion and Future Work......Page 410
Introduction......Page 413
The Proposed Model......Page 414
Recommendation......Page 417
Maximum Likelihood Estimation......Page 418
The Data Set......Page 419
Evaluation of Rating's Prediction Accuracy......Page 420
Evaluation in a CBIR System......Page 421
Conclusion......Page 423
Introduction......Page 425
Hidden Markov Model and Objective Function for Re-Estimation of Its Parameter......Page 427
Detail Methodology of Variable Initialization Approaches to EM Algorithm (VIA-EM)......Page 428
First Step (Creation of Multiple Initial Models)......Page 429
Second Step (Creation of Several HMM Models and Model Selection Using Binary Search)......Page 432
Settings......Page 434
Discussion......Page 437
References......Page 438
Introduction......Page 440
The Videos......Page 442
Simulating the Colour Blindness......Page 443
Colour Space Conversion......Page 444
Measuring Colour Change Between Video Frames......Page 445
Colour Contrast......Page 446
Colour Change Between Video Frames......Page 448
Conclusions......Page 450
Introduction......Page 453
Related Work......Page 454
Feature Extraction......Page 455
Feature Selection......Page 456
Experiment Settings......Page 458
Instrument Tone Classification......Page 459
Solo Instrument Detection......Page 461
Discussion......Page 464
Conclusion......Page 465
Introduction......Page 468
Imaging Systems......Page 469
Classes......Page 470
Image Pre-processing......Page 472
Analysis of the Crystallisation Drop......Page 476
Classification......Page 477
Results and Discussion......Page 478
Conclusions......Page 480
References......Page 481
Introduction......Page 483
Problem Statement......Page 484
Equivalence Class......Page 485
The Frequent Itemset Enumeration Tree......Page 487
Building the Frequent Itemset Enumeration Tree......Page 489
Maintenance of FIET......Page 490
Synthetic Datasets......Page 496
Conclusion......Page 499
References......Page 500
Introduction......Page 501
Problem Statements......Page 503
Evaluation of Single Sequences......Page 504
A Sequence Search Algorithm......Page 506
Simulation Results......Page 508
Applying Key Sequences in Probabilistic Reasoning......Page 509
References......Page 512
Introduction......Page 515
Data Alignment......Page 516
Alignment Procedure......Page 517
Conclusions and Future Work......Page 518
Introduction......Page 520
Project Layout......Page 521
System Architecture......Page 522
Entity Extraction and Table Transformation......Page 523
Multi-dimensional Transformation......Page 524
New Distance Measure......Page 526
Visualization......Page 529
Experimental Results......Page 531
Conclusion and Future Directions......Page 532
Introduction......Page 535
Definition of Signature......Page 537
Elements of a Signature......Page 538
Anomaly Detection......Page 539
Distance Between Feature Variables......Page 540
Pseudo-algorithm......Page 541
Detecting Anomalies......Page 542
Evaluating Alarms......Page 543
Related Work......Page 544
Conclusions and Future Work......Page 545
Introduction and Motivation......Page 548
Integration of Agent Technology, Case-Based-Reasoning, Experience-Factory, and Software Product-Lines......Page 550
Vision......Page 553
Introduction......Page 557
Density Estimation for Exploring Data......Page 558
Properties of Density Estimates......Page 559
Feature Selection: A Nonparametric Smoothing Approach......Page 561
Kernel Density Estimation and Unsupervised Classification......Page 563
Application......Page 565
Conclusion......Page 568
Introduction......Page 570
Hybrid Learning......Page 573
Individual (Genome) Presentation......Page 574
Evolution Strategies Learning Procedure......Page 576
Case Study: Coil-Spring Manufacturing Process......Page 577
Comparison of Accuracy Rate......Page 579
Decision Network......Page 580
Select an Appropriate Maintenance Action Using AHP......Page 582
Conclusion......Page 583
References......Page 584
Introduction......Page 585
Graph Matching Preliminaries......Page 588
Multidimensional Scaling (MDS)......Page 589
Combining Graph Matching and MDS for Network Behaviour Visualisation......Page 590
Experimental Results......Page 592
Conclusions, Discussion and Future Work......Page 597
Back matter......Page 600