This book is about inductive databases and constraint-based data mining, emerging research topics lying at the intersection of data mining and database research. The aim of the book is to provide an overview of the state of the art in this novel and exciting research area. Of special interest are the recent methods for constraint-based mining of global models for prediction and clustering, the unification of pattern mining approaches through constraint programming, the clarification of the relationship between mining local patterns and global models, and the proposed integrative frameworks and approaches for inductive databases. On the application side, applications to practically relevant problems from bioinformatics are presented.

Inductive databases (IDBs) represent a database view on data mining and knowledge discovery. IDBs contain not only data, but also generalizations (patterns and models) valid in the data. In an IDB, ordinary queries can be used to access and manipulate data, while inductive queries can be used to generate (mine), manipulate, and apply patterns and models. In the IDB framework, patterns and models become "first-class citizens" and knowledge discovery in databases (KDD) becomes an extended querying process in which both the data and the patterns/models that hold in the data are queried.
Author(s): Sašo Džeroski (auth.), Sašo Džeroski, Bart Goethals, Panče Panov (eds.)
Edition: 1
Publisher: Springer-Verlag New York
Year: 2010
Language: English
Pages: 456
Tags: Database Management; Data Mining and Knowledge Discovery; Artificial Intelligence (incl. Robotics); Computational Biology/Bioinformatics
Front Matter....Pages 1-15
Front Matter....Pages 1-1
Inductive Databases and Constraint-based Data Mining: Introduction and Overview....Pages 3-26
Representing Entities in the OntoDM Data Mining Ontology....Pages 27-58
A Practical Comparative Study of Data Mining Query Languages....Pages 59-77
A Theory of Inductive Query Answering....Pages 79-103
Front Matter....Pages 105-105
Generalizing Itemset Mining in a Constraint Programming Setting....Pages 107-126
From Local Patterns to Classification Models....Pages 127-154
Constrained Predictive Clustering....Pages 155-175
Finding Segmentations of Sequences....Pages 177-197
Mining Constrained Cross-Graph Cliques in Dynamic Networks....Pages 199-228
Probabilistic Inductive Querying Using ProbLog....Pages 229-262
Front Matter....Pages 263-263
Inductive Querying with Virtual Mining Views....Pages 265-287
SINDBAD and SiQL: Overview, Applications and Future Developments....Pages 289-309
Patterns on Queries....Pages 311-334
Experiment Databases....Pages 335-361
Front Matter....Pages 363-363
Predicting Gene Function using Predictive Clustering Trees....Pages 365-387
Analyzing Gene Expression Data with Predictive Clustering Trees....Pages 389-406
Using a Solver Over the String Pattern Domain to Analyze Gene Promoter Sequences....Pages 407-423
Inductive Queries for a Drug Designing Robot Scientist....Pages 425-451
Back Matter....Pages 454-457