This book presents a systematic study of the practice and theory of query understanding in search engines. The work it covers falls into three major classes. The first figures out what the searcher wants by extracting semantic meaning from the searcher’s keywords, and includes query classification, query tagging, and query intent understanding. The second analyzes search queries and translates them into enhanced queries that produce better search results, as in query spelling correction and query rewriting. The third helps users refine their queries, or suggests queries to them, in order to reduce search effort and satisfy their information needs, as in query auto-completion and query suggestion.
Query understanding is a fundamental part of search engines. It is responsible for precisely inferring the intent of the query formulated by the search user, correcting spelling errors in the query, reformulating the query to capture its intent more accurately, and guiding the user in formulating a query with precise intent.
The book will be invaluable to researchers and graduate students in computer or information science who specialize in information retrieval or web-based systems, as well as to researchers and programmers working on the development or improvement of search-engine products.
Author(s): Yi Chang, Hongbo Deng
Series: The Information Retrieval Series, 46
Publisher: Springer Singapore
Year: 2021
Language: English
Pages: 224
City: Singapore
Foreword
Contents
Editors and Contributors
About the Editors
Contributors
1 An Introduction to Query Understanding
1.1 Introduction
1.2 Query Classification
1.3 Query Segmentation and Tagging
1.4 Query Intent Understanding
1.5 Query Spelling Correction
1.6 Query Rewriting
1.7 Query Auto-Completion
1.8 Query Suggestion
1.9 Discussion and Future Directions
References
2 Query Classification
2.1 Introduction
2.2 Query Intent Classification
2.3 Query Topic Classification
2.3.1 Topic Taxonomy
2.3.2 Methods on Different Taxonomies
2.3.2.1 Representative Work on KDD Cup Taxonomy
2.3.2.2 Representative Work on AOL Taxonomy
2.3.2.3 Representative Work on Other Taxonomies
2.3.3 Summary
2.4 Query Performance Classification
2.4.1 Representative Methods
2.4.2 Effective Features in Performance Prediction
2.4.3 Summary
2.5 Other Query Classification Tasks
2.5.1 Location-Based Classification
2.5.2 Time-Based Classification
2.6 Summary
References
3 Query Segmentation and Tagging
3.1 Introduction
3.2 Query Segmentation
3.2.1 Problem Formulation
3.2.2 Heuristic-Based Approaches
3.2.2.1 Pointwise Mutual Information
3.2.2.2 Connexity
3.2.2.3 Naive Segmentation
3.2.2.4 Summary
3.2.3 Supervised Learning Approaches
3.2.3.1 Summary
3.2.4 Unsupervised Learning Approaches
3.2.4.1 Dynamic Programming for Top Segmentations
3.2.4.2 Parameter Estimation
3.2.4.3 External Sources
3.2.4.4 Summary
3.2.5 Applications
3.3 Query Syntactic Tagging
3.3.1 Syntactic Structures for Search Queries
3.3.2 Supervised Learning Approaches
3.3.3 Transfer Learning Approaches
3.3.3.1 Simple Transfer Methods
3.3.3.2 Learning Methods
3.3.4 Summary
3.4 Query Semantic Tagging
3.4.1 Named Entity Recognition
3.4.1.1 Template-Based Approach
3.4.1.2 Weakly Supervised Learning Approach
3.4.2 Fine-Grained Tagging
3.5 Conclusions
References
4 Query Intent Understanding
4.1 Introduction to Query Intent Understanding
4.2 Intent Classification Based on User Goals
4.2.1 Taxonomies of User Goals
4.2.1.1 Broder's Intent Taxonomy
4.2.1.2 Rose and Levinson's Taxonomy
4.2.1.3 Taxonomy Proposed by Baeza-Yates et al.
4.2.1.4 Taxonomy Proposed by Jansen et al.
4.2.1.5 Summarization
4.2.2 Methods Used for Predicting User Goals
4.2.3 Features
4.2.3.1 Features Extracted from Query Strings
4.2.3.2 Features Extracted from the Corpus
4.2.3.3 Features Based on Query Log
4.2.3.4 Features Leveraging Multiple Sources
4.2.3.5 Summary of Features Used
4.2.4 Summary
4.3 Vertical Intent Classification
4.3.1 Topical Intent Classification
4.3.2 Vertical Intent Classification
4.3.2.1 Corpus-Based Features
4.3.2.2 Query String-Based Features
4.3.2.3 Query Log-Based Features
4.3.2.4 Search Results-Based Features
4.3.2.5 Vertical Intent Classification Models
4.4 Query Intent Mining
4.4.1 Mining Intent from Query Logs
4.4.1.1 Mining Intent from Query Strings and Sessions
4.4.1.2 Mining Intent Based on Reformulation Behavior
4.4.1.3 Mining Intent from Click Graph
4.4.2 Mining Intent from Search Results
4.4.3 Mining Intent from Anchor Texts
4.4.4 Mining Intent from Query Suggestions
4.4.5 Mining Complex Intents
4.5 Other Kinds of Intent Classification
4.5.1 Temporal Intent Classification
4.5.2 Geographic Intent Classification
References
5 Query Spelling Correction
5.1 Introduction
5.1.1 Problem Setup and Challenges
5.2 Early Works on Spelling Correction
5.2.1 Edit Distance with Dynamic Programming
5.2.2 Spelling Correction Search over a Trie
5.3 Noisy Channel Model
5.4 Query Spelling Correction with Multiple Types of Errors
5.4.1 A Generalized HMM for Query Spelling Correction
5.4.2 Generalization of HMM Scoring Function
5.4.3 Discriminative Training
5.4.4 Query Correction Computation
5.5 Structural Learning Approaches for Query Spelling Correction
5.5.1 The Discriminative Form of Query Spelling Correction
5.5.2 Latent Structural SVM
5.5.3 Query Spelling Correction Inference by LS-SVM
5.5.4 Features
5.6 Other Components for Query Spelling Correction
5.7 Summary
References
6 Query Rewriting
6.1 Introduction
6.2 QRW with Shallow Models
6.2.1 Substitution-Based Methods
6.2.2 Translation-Based Methods
6.3 QRW with Deep Models
6.3.1 Word Embedding for QRW
6.3.2 Seq2Seq for QRW
6.3.3 Learning to Rewrite Methods
6.3.4 Deep Reinforcement Learning for QRW
6.4 Conclusion
References
7 Query Auto-Completion
7.1 Problem Definition
7.2 Evaluation Metrics for QAC
7.2.1 Ranking Metrics
7.2.2 User Assist Metrics
7.3 QAC Logs
7.4 QAC Methods
7.4.1 Time-Sensitive QAC
7.4.2 Context-Sensitive QAC
7.4.3 Personalized QAC
7.4.4 User Interactions in QAC
7.4.5 User Interactions Besides QAC
7.5 Historical Notes
7.6 Summary
References
8 Query Suggestion
8.1 Introduction
8.1.1 An Overview of Query Suggestion Approaches
8.1.2 Examples of Query Suggestion Approaches
8.1.3 Evaluation Metrics for Query Suggestion
8.1.4 Notation Used in This Chapter
8.1.5 Structure of This Chapter
8.2 Query Co-occurrence Methods
8.2.1 Similarity Functions
8.2.2 Extracting Tasks from Sessions
8.2.3 Method Analysis and Comparison
8.2.4 Summary
8.3 Query-URL Bipartite Graph Methods
8.3.1 Forward and Backward Random Walks
8.3.2 Hitting Time Approach
8.3.3 Combining Click and Skip Graphs
8.3.4 Method Analysis and Comparison
8.3.5 Summary and Discussion
8.4 Query Transition Graph Methods
8.4.1 Query Flow Graph (QFG)
8.4.2 Term Transition Graph (TTG)
8.4.3 Analysis of Query Transition Methods
8.4.4 Summary
8.5 Short-Term Search Context Methods
8.5.1 Decay Factor Based Approaches
8.5.2 Sequence Mining Approaches
8.5.2.1 Concept Mining Using Clustering Algorithm
8.5.3 Method Analysis and Comparison
8.5.4 Summary
8.6 Other Query Suggestion Related Work
8.7 Discussions and Future Directions
References
9 Future Directions of Query Understanding
9.1 Personalized Query Understanding
9.2 Natural Language Question Understanding
9.3 Dialog and Conversational Query Understanding
9.4 Medical Query Understanding
9.5 Cross-Language Query Understanding and Translation
9.6 Temporal Dynamics of Queries
9.7 Deep Learning for Query Understanding
9.8 Semantic Understanding and Matching for Search Queries
9.9 Query Understanding with Knowledge Graph
References