Searching for information is no longer limited to the user's native language; it increasingly extends to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one must translate either the query or the documents from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, and the problems that remain open. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and even outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers can also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR.
Author(s): Jian-yun Nie
Year: 2010
Language: English
Pages: 142
Tags: Computer science and engineering; Artificial intelligence; Computational linguistics
Cross-Language Information Retrieval......Page 1
Synthesis Lectures in Human Language Technologies......Page 2
Keywords......Page 6
Dedication......Page 7
Contents......Page 9
Preface......Page 13
Acknowledgement......Page 15
1.1 General IR problems......Page 17
1.2 General IR approaches......Page 18
1.2.1.1 Boolean Models......Page 19
1.2.1.2 Vector Space Model......Page 20
1.2.1.3 Probabilistic Models......Page 21
1.2.1.4 Statistical Language Models......Page 22
1.2.2 Query Expansion......Page 24
1.2.3 System Evaluation......Page 26
1.3.1.2 Decompounding......Page 28
1.3.2.1 Chinese and Word Segmentation......Page 30
1.3.3 Other Languages......Page 33
1.4 The problems of cross-language information retrieval......Page 34
1.4.1 Query Translation vs. Document Translation......Page 35
1.4.2 Using Pivot Language and Interlingua......Page 36
1.5 Approaches to translation in CLIR......Page 37
1.6 The need for cross-language and multilingual IR......Page 39
1.7 The history of CLIR......Page 40
2.1 Machine translation......Page 45
2.1.1 Rule-Based MT......Page 46
2.1.2 Statistical MT......Page 48
2.2 Basic utilization of MT in CLIR......Page 53
2.2.1 Rule-Based MT......Page 55
2.2.3 Unknown Word......Page 57
2.3 Open the box of MT......Page 60
2.4 Dictionary-based Translation for CLIR......Page 61
2.4.1 Basic Approaches......Page 62
2.4.2 The Term Weighting Problem......Page 63
2.4.3 Coverage of the Dictionary......Page 65
2.4.5 Selection of Translation Words......Page 66
2.4.6.1 Phrase-Based and Structured Query Translation......Page 69
2.4.6.2 Using Multilingual Thesauri......Page 70
3.1 Parallel corpora......Page 73
3.2 Paragraph/sentence alignment......Page 76
3.3 Utilization of translation models in CLIR......Page 79
3.4 Embedding translation models into CLIR models......Page 86
3.5.1 Exploiting a Parallel Corpus by Pseudo-Relevance Feedback......Page 91
3.5.2 Using Latent Semantic Indexing (LSI)......Page 92
3.5.3 Using Comparable Corpora......Page 94
3.6 Discussions on CLIR methods and resources......Page 96
3.7.1 Mining for Parallel Texts......Page 97
3.7.2 Transliteration......Page 101
3.7.3 Mining Translations Using Hyperlinks......Page 104
3.7.4 Mining Translations from Monolingual Web Pages......Page 106
4.1 Pre- and post-translation expansion......Page 111
4.2 Fuzzy matching......Page 112
4.3 Combining translations......Page 113
4.4 Transitive translation......Page 114
4.5 Integrating monolingual and translingual relations......Page 116
4.6 Discussions......Page 119
5.1 What has been achieved?......Page 121
5.2.1 Parallel Between Query Expansion and Query Translation......Page 122
5.2.2 Inspiring Query Translation from Query Expansion: An Example......Page 125
References......Page 129
Author Biography......Page 141