Data integration is a critical problem in our increasingly interconnected but inevitably heterogeneous world. There are numerous data sources available in organizational databases and on public information systems like the World Wide Web. Not surprisingly, the sources often use different vocabularies and different data structures, being created, as they are, by different people, at different times, for different purposes. The goal of data integration is to provide programmatic and human users with integrated access to multiple, heterogeneous data sources, giving each user the illusion of a single, homogeneous database designed for his or her specific need. The good news is that, in many cases, the data integration process can be automated. This book is an introduction to the problem of data integration and a rigorous account of one of the leading approaches to solving this problem, viz., the relational logic approach. Relational logic provides a theoretical framework for discussing data integration. Moreover, in many important cases, it provides algorithms for solving the problem in a computationally practical way. In many respects, relational logic does for data integration what relational algebra did for database theory several decades ago. A companion web site provides interactive demonstrations of the algorithms.
Table of Contents: Preface / Interactive Edition / Introduction / Basic Concepts / Query Folding / Query Planning / Master Schema Management / Appendix / References / Index / Author Biography
Author(s): Michael Genesereth
Series: Synthesis Lectures on Artificial Intelligence and Machine Learning
Publisher: Morgan and Claypool Publishers
Year: 2010
Language: English
Pages: 110
Tags: Computer Science and Engineering; Artificial Intelligence
Data Integration: The Relational Logic Approach
Synthesis Lectures on Artificial Intelligence and Machine Learning
Keywords
Contents
Preface
Interactive Edition
1.1 DATA INTEGRATION
1.2 HETEROGENEITY
1.3 DIRECT MAPPING
1.4 SOURCE-BASED INTEGRATION
1.5 MODEL-CENTRIC INTEGRATION
1.6 READING GUIDE
2.1 RELATIONAL DATABASES
2.2 SENTENTIAL DATABASES
2.3 DATALOG PROGRAMS
2.4 OPEN DATALOG PROGRAMS
2.5 DATABASE QUERIES
2.6 DATABASE CONSTRAINTS
2.8 FUNCTIONAL DATALOG PROGRAMS
2.10 ENHANCED DATALOG PROGRAMS
3.2 PROBLEM DEFINITION
3.3 INVERSE METHOD
3.4 CONJUNCTIVE SOURCE DEFINITIONS
3.5 DISJUNCTIVE SOURCE DEFINITIONS
Bucket Algorithm
Unijoin Algorithm
3.7 DECIDABILITY
4.2 OPTIMIZATION
4.3 SOURCING
4.4 EXECUTION PLANNING
5.1 INTRODUCTION
5.2 REIFICATION
5.3 AUXILIARY TABLES
5.4 CONSTRAINT FOLDING
A.2 Sentential Representation
A.3 Linked Lists
A.4 Unification
A.5 Storage
A.6 Local Evaluation
A.7 Query Folding
A.8 Sourcing
References
Index
Author Biography