Documents usually have both content and structure. The content is the text of the document, whereas the structure is how the document is logically organized. An increasingly common way to encode structure is with a markup language, and the most widely used markup language for this purpose is now the eXtensible Markup Language (XML). XML can be used to provide focused access to documents, that is, to return XML elements such as sections and paragraphs, instead of whole documents, in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, because users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This book provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML retrieval were seen from 2002 as a result of INEX, the Initiative for the Evaluation of XML Retrieval. INEX, also described in this book, provided test sets for evaluating XML retrieval effectiveness. Many of the developments and results described in this book were investigated within INEX.
Table of Contents: Introduction / Basic XML Concepts / Historical Perspectives / Query Languages / Indexing Strategies / Ranking Strategies / Presentation Strategies / Evaluating XML Retrieval Effectiveness / Conclusions
Author(s): Mounia Lalmas
Series: Synthesis Lectures on Information Concepts, Retrieval, and Services
Publisher: Morgan and Claypool Publishers
Year: 2009
Language: English
Pages: 112
Acknowledgments......Page 12
Introduction......Page 14
Element......Page 16
Well-Formed XML Document......Page 17
Document Type Declaration......Page 19
XML Schema......Page 20
XML Documents as Trees......Page 21
Structured Document Retrieval......Page 24
Structured Text Retrieval......Page 25
Data- vs Document-Centric XML Documents......Page 26
Content-Oriented XML Retrieval......Page 28
Focused Retrieval......Page 29
Structural Constraints......Page 30
Content-and-Structure......Page 32
XPath......Page 34
NEXI......Page 36
XQuery......Page 37
XQuery Full-Text......Page 38
Discussion......Page 40
Indexing Strategies......Page 42
Element-Based Indexing......Page 43
Leaf-Only Indexing......Page 44
Selective Indexing......Page 45
Distributed Indexing......Page 46
Structure Indexing......Page 47
Discussion......Page 48
Element Scoring......Page 50
Contextualization......Page 51
Propagation......Page 52
Aggregation......Page 54
Merging......Page 55
Processing Structural Constraints......Page 56
Discussion......Page 58
Presentation Strategies......Page 60
Dealing with Overlaps......Page 61
Presenting Elements in Context......Page 63
Entry Points......Page 64
Discussion......Page 66
Document Collections......Page 68
Topics......Page 69
Relevance Assessments......Page 72
Retrieval Tasks......Page 77
Measures......Page 81
Discussion......Page 84
XML "element" Retrieval......Page 86
Beyond XML "element" Retrieval......Page 89
Beyond XML Retrieval......Page 91
Bibliography......Page 94
Biography......Page 112