Database Systems for Advanced Applications: 15th International Conference, DASFAA 2010, Tsukuba, Japan, April 1-4, 2010, Proceedings, Part I (Lecture Notes in Computer Science, 5981)

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

This two volume set LNCS 5981 and LNCS 5982 constitutes the refereed proceedings of the 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010, held in Tsukuba, Japan, in April 2010. The 39 revised full papers and 16 revised short papers presented together with 3 invited keynote papers, 22 demonstration papers, 6 industrial papers, and 2 keynote talks were carefully reviewed and selected from 285 submissions. The papers of the first volume are organized in topical sections on P2P-based technologies, data mining technologies, XML search and matching, graphs, spatialdatabases, XML technologies, time series and streams, advanced data mining, query processing, Web, sensor networks and communications, information management, as well as communities and Web graphs. The second volume contains contributions related to trajectories and moving objects, skyline queries, privacy and security, data streams, similarity search and event processing, storage and advanced topics, industrial, demo papers, and tutorials and panels.

Author(s): Hiroyuki Kitagawa (editor), Yoshiharu Ishikawa (editor), Wenjie Li (editor), Chiemi Watanabe (editor)
Publisher: Springer
Year: 2010

Language: English
Pages: 674

Title Page
Preface
Organization
Table of Contents – Part I
Keynote Talks
Knowledge on the Web: Robust and Scalable Harvesting of Entity-Relationship Facts
Cloud Data Management @ Yahoo!
P2P-Based Technologies
Distributed Cache Indexing for Efficient Subspace Skyline Computation in P2P Networks
Introduction
Preliminaries
RelatedWork
Notations and Definitions
Distributed Caching Mechanism
Indexing Cached Results on P2P Overlay
Optimizing Cache Utilization
An Experimental Study
Cache-Hit Probability
Number of Involved Nodes
Message Communication Cost
Response Time
Effect of Updates
Conclusion
References
iDISQUE: Tuning High-Dimensional Similarity Queries in DHT Networks
Introduction
Related Work
Preliminaries
Locality Sensitive Hashing
The iDistance Indexing Scheme
Overview
The iDISQUE Framework
A Naive Indexing Scheme
Locality-Preserving Index Scheme
Locality-PreservingMapping
Index Construction
Query Processing in iDISQUE
Basic Query Processing Scheme
Multi-probing in iDISQUE
Load-Balancing
Experimental Results
Experiment Setup
Comparative Study - Experiment 1
Performance Tuning - Experiment 2
Load-Balancing - Experiment 3
Scalability - Experiment 4
Conclusion
References
Adaptive Ensemble Classification in P2P Networks
Introduction
Related Work
Approach
Training and Clustering Phase
Prediction Phase
Complexity Analysis
Experiments and Analysis
Experiment Setup
Accuracy
Disjoint Data Distribution
Skewed Class Distribution
Parameter Sensitivity
Conclusions
References
Data Mining Technologies
Mining Rare Association Rules in the Datasets with Widely Varying Items’ Frequencies
Introduction
Related Work
Proposed Approach
Basic Idea
Algorithm
Relation between the Frequent Patterns Generated in DifferentModels
Experimental Results
Experiment 1
Experiment 2
Conclusions and Future Work
References
CAMLS: A Constraint-Based Apriori Algorithm for Mining Long Sequences
Introduction
Characterization of Domain Class
Definitions
Constraints
The Algorithm
Event-Wise Phase
Sequence-Wise Phase
Example
Experimental Results
Discussion
References
PGP-mc: Towards a Multicore Parallel Approach for Mining Gradual Patterns
Introduction
Gradual Patterns
Related Works
PGP-mc: Parallel Gradual Pattern Extraction
Experimental Results and Discussion
Scalability
Conclusion and Perspectives
References
Generalised Rule Mining
Introduction
Related Work
Generalised Rule Mining (GRM) Framework
Additional Motivational Examples
Probabilistic Association Rule Mining (PARM)
Various Correlated Rule Mining Methods
Generalised Rule Mining (GRM) Algorithm
Experiments
References
XML Search and Matching
An Effective Object-Level XML Keyword Search
Introduction
Motivation
Our Approach
Related Work
DataModel
Object Level Matching Semantics
ISO Matching Semantics
IRO Matching Semantics
Separation of ISO and IRO Results Display
Relevance Oriented Result Ranking
Ranking for ISO
Ranking for IRO
Index Construction
Algorithms
Experimental Evaluation
Effectiveness of ISO and IRO Matching Semantics
Efficiency and Scalability Test
Effectiveness of the Ranking Schemes
Conclusion and Future Work
References
Effectively Inferring the Search-for Node Type in XML Keyword Search
Introduction
Background
Notations
Overview of XReal
Analysis of XReal’s Weaknesses
Inferring Search-for Node
Preliminary Definitions
Dynamic Reduction Factor
Algorithm
Experiments
Experimental Setup
Results of Inferring Search-for Node
Other Related Work
Conclusion
References
Matching Top-k Answers of Twig Patterns inProbabilistic XML
Introduction
Background and Problem Definition
Probabilistic XML Model
Problem Statement
Encoding Scheme for Probabilistic XML
Required Properties of Encoding for PXML
PEDewey: Encoding PXML
ProTJFast Algorithm Based on Document Order
Data Structures and Notations
ProTJFast
PTopKTwig Algorithms Based on Probability Order
Data Structure and Notations
Algorithm PTopKTwig
Enhanced Lower Bounds
Experiments
Experimental Setup
Performance Study
Conclusions
References
Graphs
NOVA: A Novel and Efficient Framework for Finding Subgraph Isomorphism Mappings in Large Graphs
Introduction
Definitions
Framework
Index
Index
Compression
Query
Cost Model
Eager Verification
Correctness and Completeness
Experiment
Real Data
Synthetic Dataset
Related Work
Conclusion
References
Efficiently Answering Probability Threshold-Based SP Queries over Uncertain Graphs
Introduction
Related Work
Problem Definition
Probability Calculation of Shortest Path
Basic Calculation Algorithm
Probabilistic Pruning Rules
Optimization of Basic Calculation Algorithm
Isomorphic Graphs Reduction
Performance Evaluation
Conclusions
References
Discovering Burst Areas in Fast Evolving Graphs
Introduction
Problem Statement
Discovering Burst Areas
Haar Wavelet Decomposition
Bounding Burst Scores of r-Hop Neighborhood Subgraphs
Incremental Computation of Multiple Hop Sizes
Top-k Burst Area Discovery
Experimental Evaluation
Data Sets
Effectiveness
Efficiency
Related Work
Conclusions
References
Spatial Databases
Answering Top-k Similar Region Queries*
Introduction
Related Work
Preliminaries
Spatial Vector Space Model
Proposed Approach
Quadtree Structure
Region Search Algorithm
Experiment Studies
Settings
Effectiveness Study
Efficiency Study
Conclusion
References
Efficient Approximate Visibility Query in Large Dynamic Environments
Introduction
Related Work
Memory-Based Approaches
Disk-Based Approaches
Problem Definition
Solution Overview
Index Construction
Index Maintenance
Viewpoint Filtering
Object Filtering
Performance Evaluation
Experimental Methodology
Query Accuracy
Query Response Time
Update Cost
Conclusion and Future Work
References
The Objects Interaction Matrix for Modeling Cardinal Directions in Spatial Databases*
Introduction
Related Work
The Objects Interaction Matrix Model
The Tiling Phase: Representing Interactions of Objects with the Objects Interaction Grid and Matrix
The Interpretation Phase: Assigning Semantics to the Objects Interaction Matrix
Comparison to Past Approaches
Defining Directional Predicates within Databases
Conclusions and Future Work
References
Efficient Algorithms to Monitor Continuous Constrained k Nearest Neighbor Queries
Introduction
Background Information
Preliminaries
Related Work
Motivation
Grid-Tree Based Algorithm
The Conceptual Grid-Tree
Initial Computation
Continuous Monitoring
Remarks
ArcTrip Based Algorithm
ArcTrip
Initial Computation
Continuous Monitoring
Remarks
Experiments
Conclusion
References
XML Technologies
Chasing Tree Patterns under Recursive DTDs
Introduction
Preliminaries
DTD, XTree and Tree Patterns
TP Containment and Boolean Patterns
New Constraints Derivable from a Recursive DTD
Finding the New Constraints Implied by DTD
Chasing TPs with Constraints
Chase1
Chase2
Conclusion
References
Efficient Label Encoding for Range-Based Dynamic XML Labeling Schemes
Preliminary
Range-Based Labeling Schemes
Dynamic Formats
ST Encoding Technique
ST-Binary (STB)
ST-Quaternary (STQ)
ST-Vector (STV)
Comparison with Insertion-Based Approach
Encoding Table Compression
Tree Partitioning (TP)
Experiments and Results
Encoding Time
Memory Usage and Encoding Table Compression
Label Size and Query Performance
Conclusion
References
An Efficient Parallel PathStack Algorithm for Processing XML Twig Queries on Multi-core Systems
Introduction
Related Work
Preliminaries
P-PathStack: Parallel PathStack Algorithm
Even Partition
Work Balance
Elements Skip
P-PathStack: Parallel PathStack Algorithm
Experimental Study
Performance of P-PathStack
Even Partition
Conclusion and Future Work
References
Keyword Search on Hybrid XML-Relational Databases Using XRjoin
Introduction
Keyword Search on a Hybrid XML-RDB System
Data Model
Our Approach of Keyword Search
XRjoin
Definition of XRjoin
Behavior of XRjoin
Concluding Remarks
References
Efficient Database-Driven Evaluation of Security Clearance for Federated Access Control of Dynamic XML Documents
Introduction
The DIFF Approach
Relational Schema
Effects of the Changes
The DIFF Algorithm
Experimental Results
Conclusions and Future Work
References
Time Series and Streams
Speeding Up Complex Video Copy Detection Queries
Introduction
Video Copy Detection
Query Processing
Generic Filter
Dual MinDist
Completeness
Speedup Using Approximations
Experiments
Conclusion
References
Efficient Skyline Maintenance for Streaming Data with Partially-Ordered Domains*
Introduction
Overviewof STARS Approach
STARS+ Approach
Dominating Tuple (DT) Optimization
Empty Cell (EC) Optimization
Geometric Arrangement (Minmax) Optimization
SkyGrid Approach
Experimental Evaluation
Experiment Settings
Evaluating STARS+ Optimizations
Evaluating Overall Performance
Conclusion
References
A Simple, Yet Effective and Efficient, Sliding Window Sampling Algorithm*
Introduction
Background and Related Work
Reservoir Algorithm
Sampling with Updates
Sampling Sliding Window
Biased Reservoir Sampling
FIFO Sampling Algorithm
Probability Analysis
Optimal Selection Probability
Performance Evaluation
Comparison of Analytical Bias Functions
Empirical Performance Evaluation: Setup
Empirical Performance Evaluation: Synthetic Data
Empirical Performance Evaluation: Real Data
Conclusions
References
Detecting Leaders from Correlated Time Series
Introduction
Preliminaries
Leadership Discovery
Lagged Correlation Computation
Graph Construction
Leader Extraction
The Overall Algorithm
Real-Time Correlation Update
Experimental Results
Related Work
Conclusions
References
Advanced Data Mining
Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces
Introduction
Literature Review
Methodology
Ensemble Construction
HeDES Framework
Outlier Score Function
COMBINE Functions
Experimental Evaluation
Conclusions and Future Work
References
Mining Diversity on Networks
Introduction
Related Work
Diversity Definition
Terminology and Representation
A Simple Diversity Example
Diversity: General Definition
Examples and Analysis
Top-K Diversity Ranking Algorithm
Experimental Results
Results on Synthetic Network
Results on DBLP Network
Results on Network of American Football Games
Performance Comparison
Discussion
Conclusion
References
Mining Regular Patterns in Data Streams
Introduction
Related Work
Problem Definition
RPS-Tree: Design, Construction, and Mining
Design of an RPS-Tree
Construction of an RPS-Tree
Updating the RPS-Tree
Mining the RPS-Tree
Experimental Analyses
Memory Efficiency
Runtime Efficiency
Window Size
Discussions and Conclusions
References
Query Processing
k-ARQ: k-Anonymous Ranking Queries*
Introduction
Preliminaries
Problem Definition
Baseline Approach I: Perfect Recall
Baseline Approach II: Mondrian
Hardness analysis
ProposedSolution
Greedy Algorithm by Deletion
Experimental Results
Related Work
References
QSQL: Incorporating Logic-Based Retrieval Conditions into SQL
Introduction
The Quantum Query Language CQQL
Related Work
Integration of CQQL Concepts into SQL
Selection
Projection
Union, Intersection and Difference
Join and Grouping
Implementation and Experiments
Summary and Outlook
References
k-Selection Query over Uncertain Data
Introduction
Preliminaries
RelatedWork
Problem Formulation
Analysisfork-Selection
Expected Best Score (EBS)
Query Recursion
Bounding Property
Query Processing Algorithms
Dynamic Programming (DP) Algorithm
Bounding and Pruning Heuristics
Bounding and Pruning (BP) Algorithm
Performance Evaluation
Conclusion
References
Web
Analysis of Implicit Relations on Wikipedia: Measuring Strength through Mining Elucidatory Objects
Introduction
Related Work
Distance, Connectivity, Co-citation
Cohesion
Explanation of a Relation
Our Method Using Generalized Flow
GeneralizedMaximum Flow
Gain Function forWikipedia
Summary of Our Method
Experiments and Evaluation
Dataset and Environment
Evaluation of Rankings
Estimation of Gain Function
Case Studies of Elucidatory Objects
Conclusion
References
Summarizing and Extracting Online Public Opinion from Blog Search Results
Introduction
Blog Search Results Sentiment Representation
Characteristics of Blog Search Results
BSRs Sentiment Representation
Aggregate Opinions in BSRs Based on Spectral Clustering
Sentiment Similarity Computing
Spectral Clustering for BSRs
Opinion Ranking and Keywords Extraction
Experiments
Experiment Setup
Experiment Results
Related Work
Search Results Clustering
Opinion Mining
Conclusion and Future Work
References
Cloud as Virtual Databases: Bridging Private Databases and Web Services
Introduction
Related Work
Web Services and Functions as Virtual Tables
Virtual Tables Using Table-Valued Functions
Web Services as Virtual Tables
Functions as Virtual Tables
Applications
Joining Local Databases and Web Services
Web Trigger
Conclusions
References
Temporal Top-k Search in Social Tagging Sites Using Multiple Social Networks
Introduction
DataModel
Scoring Functions
Multiple Social Network Components
User Classification
Temporal Scoring Functions
Temporal Ranking Algorithm
Experimental Evaluation
Conclusions
References
Sensor Networks and Communications
Air-Indexing on Error Prone Communication Channels
Introduction
Related Work
RepAir Approach
Scheduling in RepAir
Query Processing in RepAir
Discussion
Experiments
Setup
Fault Tolerance
Scalability
Variation of Query Size and Scenarios
Parameter Discussion for RepAir
Updates and Further Applications
Conclusion
References
Content-Based Multipath Routing for Sensor Networks
Introduction
Background
Query Model
In-Network Aggregation Query Processing
Related Work
Content-Based Multipath Routing
Minimum Mergeable Distance
Distributed MD Computation
The Aggregate Forwarding Algorithm
Performance Evaluation
Conclusions
References
Evaluating Continuous Probabilistic Queries OverImprecise Sensor Data
Introduction
Related Work
Continuous Probabilistic Queries
System Model
Evaluation of CPQ
The Probabilistic Filter Protocol
Protocol Design
Tolerant Probabilistic Filters
Probabilistic Tolerance
Protocol Design
Experimental Evaluation
Experimental Setup
Experimental Result
Conclusions
References
Information Management
BPMN Process Views Construction
Introduction
Formal Model of BPMN Processes
Process View Construction
Preliminaries
Process View Consistency Rules
Constructing an Aggregate
Prototype
Related Work and Discussion
Conclusion and Future Work
References
Active Duplicate Detection
Introduction
Related Work
Duplicate Principle
Active Duplicate Detection
Duplicate Principle Tree
Initial Solution and Refinement
Experiments
Experiment Setup and Evaluation Metrics
Precision vs. Recall
Conclusion
References
FlexTable: Using a Dynamic Relation Model to Store RDF Data
Introduction
Preliminary
Schema Evolution
Similarity Measurement
Lattice-Based Algorithm(LBA)
Control Parameter
Modification of Physical Storage
Experiment and Analysis
Analysis of Triples Import
Analysis of Storage Cost
Analysis of Query Performance
State of the Art
TripleStore
VertPart
Conclusion
References
Communities and Web Graphs
An MDL Approach to Efficiently Discover Communities in Bipartite Network*
Introduction
Related Works
An MDL Criterion
Terminologies
A Two-Part Coding
Problem Formulation
A Split-Refine Greedy Algorithm
Sketch of the Algorithm
Split the Submatrix
Reassign the Rows and Columns to Community
Computational Complexity
Performance Study
The Synthetic Datasets
Evaluation and Results
Scalability
The Southern Women Data Set
Conclusions
References
Fires on the Web: Towards Efficient Exploring Historical Web Graphs
Introduction
Preliminary
Forest Fire Model
Discovery of the Historical Snapshot Graph Problem
The Bottom-Up Approach
Implementation Details
The Leap Search Approach
Implementation Details
Performance Evaluation
Efficiency of Our Solutions
Pruning Efficiency of Our Solutions
Effectiveness of Our Solutions
Conclusion
References
Identifying Community Structures in Networks with Seed Expansion
Introduction
Preliminaries
Network Model and Community
Definitions
Our Algorithm
The Change Value on Modularity Function Q
Expansion Step and Transmissive Probability
The Bound of Expansion Process
Experimental Evaluation
Selecting Seeds
The Experimental Analysis of Five Datasets
Related Work
Conclusion
References
Dynamic Agglomerative-Divisive Clustering of Clickthrough Data for Collaborative Web Search
Introduction
Related Work
BB’s Graph-Based Clustering
CubeSVD
M-LSA
Divisive-Agglomerative Clustering
Community Clickthrough Model
Dynamic Agglomerative-Divisive Clustering
Agglomerative Phase
Divisive Phase
Experimental Results
Experimental Setup
Performance Comparison
Conclusions
References
Author Index