Advances in Knowledge Discovery and Data Mining: 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26-28, 2004. Proceedings

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) has been held every year since 1997. This year, the eighth in the series (PAKDD 2004) was held at Carlton Crest Hotel, Sydney, Australia, 26–28 May 2004. PAKDD is a leading international conference in the area of data mining. It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition and automatic scientific discovery, data visualization, causal induction, and knowledge-based systems.

The selection process this year was extremely competitive. We received 238 research papers from 23 countries, which is the highest in the history of PAKDD, and reflects the recognition of and interest in this conference. Each submitted research paper was reviewed by three members of the program committee. Following this independent review, there were discussions among the reviewers, and when necessary, additional reviews from other experts were requested. A total of 50 papers were selected as full papers (21%), and another 31 were selected as short papers (13%), yielding a combined acceptance rate of approximately 34%.

The conference accommodated both research papers presenting original investigation results and industrial papers reporting real data mining applications and system development experience. The conference also included three tutorials on key technologies of knowledge discovery and data mining, and one workshop focusing on specific new challenges and emerging issues of knowledge discovery and data mining. The PAKDD 2004 program was further enhanced with keynote speeches by two outstanding researchers in the area of knowledge discovery and data mining: Philip Yu, Manager of Software Tools and Techniques, IBM T.J.

Author(s): Philip S. Yu (auth.), Honghua Dai, Ramakrishnan Srikant, Chengqi Zhang (eds.)
Series: Lecture Notes in Computer Science 3056: Lecture Notes in Artificial Intelligence
Edition: 1
Publisher: Springer-Verlag Berlin Heidelberg
Year: 2004

Language: English
Pages: 716
Tags: Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Multimedia Information Systems; Probability and Statistics in Computer Science; Business Information Systems

Front Matter
Mining of Evolving Data Streams with Privacy Preservation....Pages 1-1
Data Mining Grand Challenges....Pages 2-2
Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms....Pages 3-12
Spectral Energy Minimization for Semi-supervised Learning....Pages 13-21
Discriminative Methods for Multi-labeled Classification....Pages 22-30
Subspace Clustering of High Dimensional Spatial Data with Noises....Pages 31-40
Constraint-Based Graph Clustering through Node Sequencing and Partitioning....Pages 41-51
Mining Expressive Process Models by Clustering Workflow Traces....Pages 52-62
CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees....Pages 63-73
Secure Association Rule Sharing....Pages 74-85
Self-Similar Mining of Time Association Rules....Pages 86-95
ParaDualMiner: An Efficient Parallel Implementation of the DualMiner Algorithm....Pages 96-105
A Novel Distributed Collaborative Filtering Algorithm and Its Implementation on P2P Overlay Network....Pages 106-115
An Efficient Algorithm for Dense Regions Discovery from Large-Scale Data Streams....Pages 116-120
Blind Data Linkage Using n-gram Similarity Comparisons....Pages 121-126
Condensed Representation of Emerging Patterns....Pages 127-132
Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents....Pages 133-144
Mining Term Association Rules for Heuristic Query Construction....Pages 145-154
FP-Bonsai: The Art of Growing and Pruning Small FP-Trees....Pages 155-160
Mining Negative Rules Using GRD....Pages 161-165
Applying Association Rules for Interesting Recommendations Using Rule Templates....Pages 166-170
Feature Extraction and Classification System for Nonlinear and Online Data....Pages 171-180
A Metric Approach to Building Decision Trees Based on Goodman-Kruskal Association Index....Pages 181-190
DRC-BK: Mining Classification Rules with Help of SVM....Pages 191-195
A New Data Mining Method Using Organizational Coevolutionary Mechanism....Pages 196-200
Noise Tolerant Classification by Chi Emerging Patterns....Pages 201-206
The Application of Emerging Patterns for Improving the Quality of Rare-Class Classification....Pages 207-211
Finding Negative Event-Oriented Patterns in Long Temporal Sequences....Pages 212-221
OBE: Outlier by Example....Pages 222-234
Temporal Sequence Associations for Rare Events....Pages 235-239
Summarization of Spacecraft Telemetry Data by Extracting Significant Temporal Patterns....Pages 240-244
An Extended Negative Selection Algorithm for Anomaly Detection....Pages 245-254
Adaptive Clustering for Network Intrusion Detection....Pages 255-259
Ensembling MML Causal Discovery....Pages 260-271
Logistic Regression and Boosting for Labeled Bags of Instances....Pages 272-281
Fast and Light Boosting for Adaptive Mining of Data Streams....Pages 282-292
Compact Dual Ensembles for Active Learning....Pages 293-297
On the Size of Training Set and the Benefit from Ensemble....Pages 298-307
Identifying Markov Blankets Using Lasso Estimation....Pages 308-318
Selective Augmented Bayesian Network Classifiers Based on Rough Set Theory....Pages 319-328
Using Self-Consistent Naive-Bayes to Detect Masquerades....Pages 329-340
DB-Subdue: Database Approach to Graph Mining....Pages 341-350
Finding Frequent Structural Features among Words in Tree-Structured Documents....Pages 351-360
Exploring Potential of Leave-One-Out Estimator for Calibration of SVM in Text Mining....Pages 361-372
Classifying Text Streams in the Presence of Concept Drifts....Pages 373-383
Using Cluster-Based Sampling to Select Initial Training Set for Active Learning in Text Classification....Pages 384-388
Spectral Analysis of Text Collection for Similarity-Based Clustering....Pages 389-393
Clustering Multi-represented Objects with Noise....Pages 394-403
Providing Diversity in K-Nearest Neighbor Query Results....Pages 404-413
Cluster Structure of K-means Clustering via Principal Component Analysis....Pages 414-418
Combining Clustering with Moving Sequential Pattern Mining: A Novel and Efficient Technique....Pages 419-423
An Alternative Methodology for Mining Seasonal Pattern Using Self-Organizing Map....Pages 424-430
ISM: Item Selection for Marketing with Cross-Selling Considerations....Pages 431-440
Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining....Pages 441-451
Mining Association Rules from Structural Deltas of Historical XML Documents....Pages 452-457
Data Mining Proxy: Serving Large Number of Users for Efficient Frequent Itemset Mining....Pages 458-463
Formal Approach and Automated Tool for Translating ER Schemata into OWL Ontologies....Pages 464-475
Separating Structure from Interestingness....Pages 476-485
Exploiting Recurring Usage Patterns to Enhance Filesystem and Memory Subsystem Performance....Pages 486-496
Automatic Text Extraction for Content-Based Image Indexing....Pages 497-507
Peculiarity Oriented Analysis in Multi-people Tracking Images....Pages 508-518
AutoSplit: Fast and Scalable Discovery of Hidden Variables in Stream and Multimedia Databases....Pages 519-528
Semantic Sequence Kin: A Method of Document Copy Detection....Pages 529-538
Extracting Citation Metadata from Online Publication Lists Using BLAST....Pages 539-548
Mining of Web-Page Visiting Patterns with Continuous-Time Markov Models....Pages 549-558
Discovering Ordered Tree Patterns from XML Queries....Pages 559-563
Predicting Web Requests Efficiently Using a Probability Model....Pages 564-568
CCMine: Efficient Mining of Confidence-Closed Correlated Patterns....Pages 569-579
A Conditional Probability Distribution-Based Dissimilarity Measure for Categorial Data....Pages 580-589
Learning Hidden Markov Model Topology Based on KL Divergence for Information Extraction....Pages 590-594
A Non-parametric Wavelet Feature Extractor for Time Series Classification....Pages 595-603
Rules Discovery from Cross-Sectional Short-Length Time Series....Pages 604-614
Constraint-Based Mining of Formal Concepts in Transactional Data....Pages 615-624
Towards Optimizing Conjunctive Inductive Queries....Pages 625-637
Febrl – A Parallel Open Source Data Linkage System....Pages 638-647
A General Coding Method for Error-Correcting Output Codes....Pages 648-652
Discovering Partial Periodic Patterns in Discrete Data Sequences....Pages 653-658
Conceptual Mining of Large Administrative Health Data....Pages 659-669
A Semi-automatic System for Tagging Specialized Corpora....Pages 670-681
A Tree-Based Approach to the Discovery of Diagnostic Biomarkers for Ovarian Cancer....Pages 682-691
A Novel Parameter-Less Clustering Method for Mining Gene Expression Data....Pages 692-698
Extracting and Explaining Biological Knowledge in Microarray Data....Pages 699-703
Further Applications of a Particle Visualization Framework....Pages 704-710
Back Matter