This book constitutes the refereed proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2007, held in Warsaw, Poland, in September 2007, co-located with ECML 2007, the 18th European Conference on Machine Learning.
The 28 revised full papers and 35 revised short papers, presented together with abstracts of 4 invited talks, were carefully reviewed and selected from 592 papers submitted to both ECML and PKDD. The papers present original results on leading-edge subjects of knowledge discovery from conventional and complex data and address all current issues in the area.
Author(s): Joost N. Kok
Series: Lecture Notes in Artificial Intelligence 4702
Edition: 1
Publisher: Springer
Year: 2007
Language: English
Pages: 659
Front matter......Page 1
Learning, Information Extraction and the Web......Page 21
Putting Things in Order: On the Fundamental Role of Ranking in Classification and Probability Estimation......Page 22
Mining Queries......Page 24
Adventures in Personalized Information Access......Page 25
Introduction......Page 26
Experiment Databases......Page 27
Dataset......Page 29
Populating the Database......Page 30
A Relational Experiment Database......Page 32
Querying and Mining......Page 33
Conclusions......Page 36
References......Page 37
Introduction......Page 38
Motivation......Page 39
Extraction from the Web......Page 40
Extraction from Wikipedia......Page 42
Datasets......Page 43
Results......Page 45
Related Work......Page 47
References......Page 48
Introduction......Page 50
The Visual Content Context Flexible Mixture Model......Page 52
Model Selection and Parameter Estimation Using MML......Page 53
Estimation of Parameters......Page 56
Data Set......Page 57
Second Experiment: Comparison with State-of-the-Art......Page 58
References......Page 60
Introduction......Page 62
Area Under the Curve and Classification......Page 63
Polynomial Approximation of AUC and Soft-AUC......Page 64
Training a Linear Classifier with the Approximation......Page 66
Experimental Evaluation......Page 69
References......Page 73
Introduction......Page 74
A Pipeline to Identify Relevant and Novel Proteins......Page 75
Learning a Classifier to Determine Relevance......Page 77
Discussion......Page 84
References......Page 85
Introduction......Page 87
Related Work......Page 88
Graph Representation of Web Documents......Page 89
Preliminary Terminology......Page 90
Web Document Classification Algorithm......Page 91
Experiments and Results......Page 93
References......Page 97
Introduction......Page 99
CSI Mixture Models......Page 100
Dirichlet Mixture Priors......Page 102
Prior Parameter Derivation......Page 103
Feature Ranking......Page 105
L-Lactate Dehydrogenase Family......Page 106
Nucleotidyl Cyclase Family......Page 107
Discussion......Page 108
References......Page 109
Introduction and Motivation......Page 111
Finding Overlapping Clusters......Page 112
Results......Page 116
Related Work......Page 119
Conclusions......Page 121
References......Page 122
Introduction......Page 123
Randomization Procedure......Page 124
Accuracy of Association Rule......Page 125
Accuracy Analysis of Measures......Page 126
Variances of Derived Measures......Page 127
Interquantile Ranges of Derived Measures......Page 128
Measures Derived from the Randomized Data Without p......Page 129
Related Work......Page 131
References......Page 132
Introduction......Page 135
Design and Definitions......Page 136
Related Work......Page 137
Online Data Stream Symbolization (ODSS)......Page 138
Experimental Scenario......Page 141
Data Collection and Preparation......Page 142
Experimental Method and Results......Page 143
References......Page 145
Introduction......Page 147
Related Work......Page 149
Social Network Features......Page 150
Defining a Node Set......Page 151
Operation on a Node Set......Page 152
Aggregation of Values......Page 153
Datasets and Task......Page 154
Experimental Results......Page 155
Discussion......Page 157
References......Page 158
Introduction......Page 160
KD-Trees......Page 161
Metric Trees......Page 162
Cover Trees......Page 163
Empirical Comparison of the Data Structures......Page 164
Results......Page 166
Conclusions......Page 170
References......Page 171
Introduction......Page 172
The Same-Site Assumption......Page 173
Bridging the Gap Between Site-Dependent and Site-Independent Approaches......Page 174
Template Detection Using the Crawl-Sample Dataset......Page 176
Application to Cross-Domain Keyword Extraction......Page 178
Impact of Template-Blocks on Keyword Extraction......Page 179
Experiments......Page 180
Conclusions......Page 182
References......Page 183
Introduction......Page 184
Classical Variable Precision Rough Set Approach......Page 185
Dominance-Based Rough Set Approach (DRSA)......Page 186
Statistical Model of Variable Consistency in DRSA......Page 187
Isotonic Regression......Page 188
Minimal Reassignment Problem......Page 189
Connection Between IRP and MRP......Page 190
Summary of the Statistical Model for DRSA......Page 191
Decision-Theoretical View......Page 192
Conclusions......Page 194
References......Page 195
Introduction......Page 196
Anti-learnable Signature of a 0-Sum Game......Page 197
The i.i.d. Learning Curves......Page 199
IID Anti-learning Theorem......Page 200
Examples of Anti-learning in Natural Data......Page 203
Discussion......Page 204
Conclusions......Page 206
References......Page 207
Introduction......Page 208
Our Motivation: Naïve Bayes Classifier......Page 209
Related Work on Discretization......Page 210
Fayyad and Irani's Recursive Discretization: ent-mdl......Page 212
Theoretical and Empirical Properties of ent-mdl......Page 213
Simple Bayesian Methods in Discretization: ent-bay......Page 214
Using a Test Statistic to Decide on the Splits......Page 215
Empirical Evaluation......Page 216
Conclusions......Page 217
References......Page 218
Introduction......Page 220
Markov Logic......Page 221
Voted Perceptron......Page 222
Per-weight Learning Rates......Page 223
Diagonal Newton......Page 224
Scaled Conjugate Gradient......Page 225
Datasets......Page 226
Results......Page 228
Conclusion......Page 230
References......Page 231
Introduction......Page 232
Case Study: Cognitive State Classification Using Functional Magnetic Resonance Images......Page 233
Gaussian Naive Bayes......Page 234
Standard Hierarchical Bayesian Model......Page 235
Feature Sharing Empirical Bayesian Model......Page 236
Case Study of Feature Sharing with fMRI Data......Page 237
Feature Sharing Empirical Bayesian Model for fMRI......Page 238
Experimental Results......Page 240
Conclusion and Future Work......Page 242
References......Page 243
Introduction......Page 244
Related Work......Page 245
The Basic Learning Model......Page 246
Domain Adaptation......Page 247
Distance Function......Page 248
Training Algorithm......Page 249
Experiments......Page 250
Overall Improvement with Domain Adaptation......Page 252
Comparison with Other Methods......Page 253
Conclusion......Page 254
References......Page 255
Introduction......Page 256
Traffic and Incident Data......Page 257
Performance Metrics......Page 258
Model Learning and Features......Page 259
SVM Detector......Page 260
A Model of Incident Sequences......Page 261
Inference for Realignment......Page 262
Experimental Evaluation......Page 263
Detection and Alignment Model Specifics......Page 264
Experimental Results......Page 265
References......Page 266
Introduction......Page 268
Our Contribution......Page 269
Related Work......Page 270
Definition of Informativeness......Page 271
Informativeness Implementation......Page 272
LI-KNN Classification......Page 274
GI-KNN Classification......Page 275
Algorithm and Analysis......Page 276
Learning the Weight Vector......Page 277
Experiments......Page 278
UCI Benchmark Corpus......Page 279
Application to Text Categorization......Page 280
Object Recognition on COIL-20......Page 281
Discussion......Page 282
References......Page 283
Introduction......Page 285
Basic Definitions......Page 286
Finding Outlying Items......Page 287
Testing the Significance of Outlying Items......Page 288
The swap-pairs Algorithm......Page 289
Theoretical Questions and Convergence Diagnostics......Page 291
Data Sets......Page 292
Results......Page 293
References......Page 296
Relevance Filtering......Page 297
Optimality of Marginal Feature Relevance......Page 298
Permutation Testing......Page 300
Search Procedure......Page 301
Experiments......Page 302
Discussion......Page 304
Conclusions......Page 306
References......Page 307
Introduction......Page 308
Algorithm......Page 309
TCM-kNN......Page 310
Algorithm......Page 311
ROC-kNN......Page 312
Experimental Setup......Page 313
Quality Assessment of TCM-kNN and ROC-kNN......Page 314
Discussion......Page 315
Conclusions......Page 318
References......Page 319
Introduction......Page 320
Motivating Example......Page 322
Schema Theory......Page 323
Data Specification......Page 324
DM Algorithm and Model Specification......Page 325
Schema Unification......Page 326
Generic Evaluation......Page 328
Conclusion and Future Work......Page 330
References......Page 331
Introduction......Page 332
Formalization of the Task......Page 333
Existing Methods for Sequence Labeling: Local Output Dependencies......Page 334
Existing Methods for Sequence Labeling: Long-Term Output Dependencies......Page 335
Local Classifier......Page 336
Relaxation Labeling......Page 337
Learning the Constraints......Page 338
Tasks and Corpora......Page 339
Results......Page 341
References......Page 342
Introduction......Page 344
Related Work......Page 346
Refinement......Page 347
Bridged Refinements......Page 349
Performance......Page 350
Parameter Sensitivity......Page 352
Conclusion and Future Work......Page 353
References......Page 354
Introduction......Page 356
Visual Cluster Analysis......Page 357
Star Coordinates......Page 358
HOV^3......Page 359
Multiple HOV^3 Projection (M-HOV^3)......Page 360
The Enhanced Separation Feature of M-HOV^3......Page 361
Predictive Cluster Exploration by M-HOV^3......Page 363
Predictive Cluster Exploration by HOV^3 with Statistical Measurements......Page 364
Predictive Cluster Validation by HOV^3......Page 366
Related Work......Page 367
References......Page 368
Introduction......Page 370
Related Work and Motivation......Page 371
Construction......Page 372
Graph Clustering......Page 373
Dimensionality......Page 374
Experimental Setting......Page 375
Conclusion and Future Work......Page 376
References......Page 377
Introduction......Page 378
Image Division and Tagging......Page 380
Choosing the Sub-Image Classification Threshold......Page 381
Classifier......Page 382
Results on the Full Dataset......Page 383
Conclusions......Page 384
References......Page 385
Introduction......Page 386
Biased-Box Sampling......Page 387
Experimental Results......Page 390
Conclusions......Page 392
References......Page 393
Introduction......Page 394
Probabilistic Framework to Estimate Players' Strengths......Page 395
Expectation Propagation......Page 396
Accuracy......Page 398
Confidence......Page 399
Conclusions......Page 400
References......Page 401
Introduction......Page 402
Preliminaries......Page 403
The General Problem......Page 404
Applications......Page 406
Mining Closed Frequent Connected Subgraphs......Page 407
Closed Frequent Subpath Mining......Page 408
References......Page 409
Introduction......Page 410
Problem Definition......Page 411
Relational EPs Discovery......Page 413
Experimental Results......Page 415
References......Page 417
Introduction......Page 418
A Closer Look at QUEST......Page 420
An Alternative Proposal......Page 423
References......Page 425
Introduction......Page 426
Generating New Functions......Page 427
Non-linear Scaling......Page 428
Morphological Scaling......Page 429
Experiments......Page 431
References......Page 433
Introduction......Page 434
Warping Time for Feature Extraction......Page 435
Time Series Dataset Generator Description......Page 437
Experimental Results......Page 438
Conclusion......Page 440
References......Page 441
Related Works......Page 442
Our Approach......Page 443
Ants......Page 444
Cluster Agents......Page 445
Results......Page 446
Conclusion......Page 448
References......Page 449
Introduction......Page 450
Classification Rules with Aggregation Predicates......Page 451
Integrating Single-feature Aggregation Functions into ILP Rules......Page 452
Learning Rules with Aggregation Predicates......Page 453
Extending TupleID Propagation for Aggregation......Page 454
Experimental Results......Page 455
References......Page 457
Introduction......Page 458
Problem Statement......Page 459
Measuring Surprise as an Interestingness Measure......Page 460
The MINI Algorithm for Mining Informative Non-redundant Itemsets......Page 461
Experiments and Results......Page 463
References......Page 465
Motivation......Page 466
General Description......Page 467
Experimental Evaluation......Page 471
Conclusions and Future Issues......Page 472
References......Page 473
Introduction......Page 474
Classification Models......Page 475
Hidden Database Probing......Page 476
HW-DB Classification Based on Link Structure......Page 477
Experiment......Page 478
Determining the Number of the Feature Terms for Form Filling-Out......Page 479
Conclusions......Page 480
References......Page 481
Introduction......Page 482
The SESP Pruning Approach......Page 483
Subgraph Evaluation......Page 484
Experimental Results......Page 486
References......Page 489
Introduction......Page 491
Problem Definition and Complexity......Page 492
Series-Parallel Graphs......Page 494
General Graphs......Page 495
Experiments......Page 496
Conclusions......Page 497
References......Page 498
Introduction......Page 499
Selection of the Clustering Algorithm......Page 500
Extraction of Clusters......Page 503
Evaluating the Stability......Page 504
Example and Conclusion......Page 505
References......Page 506
Introduction......Page 507
Related Work......Page 508
R-U Confidentiality Map......Page 509
Utility......Page 510
Risk......Page 511
Optimization of RR for Binary Data......Page 512
Rules and Constraint......Page 513
Algorithm......Page 514
Experiments......Page 515
References......Page 516
Introduction......Page 518
Dialect Word Data......Page 519
Subsequent Analysis......Page 521
Discussion......Page 523
References......Page 525
Introduction......Page 526
Recommending Tags: Problem Definition and State of the Art......Page 527
Collaborative Filtering......Page 528
Evaluation......Page 529
Results......Page 531
References......Page 533
Introduction......Page 535
Related Work......Page 536
Partitioned Data-Based PPCF Using NBC......Page 537
Overhead Costs and Privacy Analysis......Page 539
Experiments......Page 540
References......Page 542
Introduction......Page 543
Multi-party PPDM as Games......Page 544
Case Study: Multi-party Secure Sum Computation......Page 545
Achieving Nash Equilibrium with No-Colluding Nodes......Page 547
Experimental Results......Page 549
References......Page 550
Introduction......Page 552
Clustering XML Documents......Page 553
Multilevel Clustering of XML Document Structure......Page 554
Multilevel Conditional Fuzzy C-Means......Page 555
Partition Consistency Verification......Page 556
Dataset Reduction Verification......Page 557
References......Page 559
Introduction......Page 560
Fraud Detection......Page 561
Fraud Auditing Objective......Page 562
Algorithm......Page 563
Experiment 1......Page 565
Experiment 2......Page 566
References......Page 567
Introduction......Page 568
Skin Color Segmentation TSL Space Color......Page 569
Testing......Page 570
Statistical Skin Color Selection......Page 571
Fuzzy ART Real-Time Testing......Page 572
Conclusions......Page 574
References......Page 575
Introduction......Page 576
Preliminaries......Page 577
Algorithm-1: An Exact Algorithm for Prototype Selection......Page 579
The General Scenario......Page 580
Experimental Results......Page 581
Conclusions and Future Work......Page 583
References......Page 584
Introduction......Page 585
Continuous Wavelet Transform......Page 586
Relating the CWT Coefficients in a DBN......Page 587
Learning, Classification and Confidence......Page 589
Experiments......Page 590
Conclusion......Page 591
References......Page 592
Introduction......Page 593
Robust GTM in the Presence of Measurement Errors......Page 594
Structured Variational EM Solution......Page 595
Deriving Interpretations from the Model......Page 596
Experiments and Results......Page 597
References......Page 600
Introduction......Page 601
Centroid Classifier......Page 602
Proposed Technique......Page 603
Comparison with Other Methods......Page 606
Training Margin and Performance vs. MaxIteration......Page 607
References......Page 608
Introduction......Page 609
CoreWar......Page 610
Warrior Representation......Page 611
Warrior Categorization......Page 613
References......Page 616
Introduction......Page 617
Relevance Functions......Page 618
Cost and Benefit Surfaces......Page 619
An Illustrative Application......Page 621
Conclusions......Page 623
References......Page 624
Introduction......Page 625
Related Work......Page 626
Multi-label Lazy Associative Classification......Page 627
Independent Classifiers......Page 628
Correlated Classifiers......Page 629
Experimental Evaluation......Page 630
References......Page 632
Introduction......Page 633
Comparing Trajectories......Page 634
Spanning-Tree Visualization......Page 635
Extensions for Non-metric Distances......Page 636
Experimental Results......Page 637
Conclusions......Page 639
References......Page 640
Introduction......Page 641
Inherent Ordering Among the Measures......Page 643
Empirical Evaluation......Page 646
Related Work......Page 647
References......Page 648
Introduction......Page 649
On Semantic Classification......Page 650
Dates......Page 651
Feature Extraction and Construction......Page 653
Experiments......Page 654
References......Page 656
Back matter......Page 658