This volume contains the papers presented at the inaugural workshop on Data Mining and Bioinformatics at the 32nd International Conference on Very Large Data Bases (VLDB). The purpose of this workshop was to begin bringing together researchers from the database, data mining, and bioinformatics areas so that the successes of each field can be leveraged by the others. We also hope to expose the richness, complexity, and challenges of this area, which involves mining very large, complex biological data that will only grow in size and complexity as genome-scale high-throughput techniques become more routine. The problems are sufficiently different from traditional data mining problems (outside of the life sciences) that novel approaches must be taken to data mining in this area. The workshop was held in Seoul, Korea, on September 11, 2006.

We received 30 submissions in response to the call for papers. Each submission was assigned to at least three members of the Program Committee. The Program Committee discussed the submissions electronically, judging them on their importance, originality, clarity, relevance, and appropriateness to the expected audience. The Program Committee selected 15 papers for presentation. These papers are in the areas of microarray data analysis, bioinformatics system and text retrieval, application of gene expression data, and sequence analysis. Because of the format of the workshop and the high number of submissions, many good papers could not be included.
Author(s): Simon Mercer (auth.), Mehmet M. Dalkilic, Sun Kim, Jiong Yang (eds.)
Series: Lecture Notes in Computer Science 4316 : Lecture Notes in Bioinformatics
Edition: 1
Publisher: Springer-Verlag Berlin Heidelberg
Year: 2006
Language: English
Pages: 198
Tags: Data Mining and Knowledge Discovery; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Computational Biology/Bioinformatics; Probability and Statistics in Computer Science; Health Informatics
Front Matter
Bioinformatics at Microsoft Research....Page 1
A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications....Pages 2-13
Subspace Clustering of Microarray Data Based on Domain Transformation....Pages 14-28
Bayesian Hierarchical Models for Serial Analysis of Gene Expression....Pages 29-39
Applying Gaussian Distribution-Dependent Criteria to Decision Trees for High-Dimensional Microarray Data....Pages 40-49
A Biological Text Retrieval System Based on Background Knowledge and User Feedback....Pages 50-64
Automatic Annotation of Protein Functional Class from Sparse and Imbalanced Data Sets....Pages 65-77
Bioinformatics Data Source Integration Based on Semantic Relationships Across Species....Pages 78-93
An Efficient Storage Model for the SBML Documents Using Object Databases....Pages 94-105
Identification of Phenotype-Defining Gene Signatures Using the Gene-Pair Matrix Based Clustering....Pages 106-119
TP+Close: Mining Frequent Closed Patterns in Gene Expression Datasets....Pages 120-130
Exploring Essential Attributes for Detecting MicroRNA Precursors from Background Sequences....Pages 131-145
A Gene Structure Prediction Program Using Duration HMM....Pages 146-157
An Approximate de Bruijn Graph Approach to Multiple Local Alignment and Motif Discovery in Protein Sequences....Pages 158-169
Discovering Consensus Patterns in Biological Databases....Pages 170-184
Comparison of Modularization Methods in Application to Different Biological Networks....Pages 185-195
Back Matter