Automated Taxonomy Discovery and Exploration

This book provides a principled, data-driven framework that progressively constructs, enriches, and applies taxonomies without relying on massive human-annotated data. Traditionally, people construct domain-specific taxonomies through extensive manual curation, which is time-consuming and costly. In today’s information era, people are inundated with vast amounts of text data. Despite the usefulness of taxonomies, their full power has not yet been exploited because of the heavy curation needed to create and maintain them. To bridge this gap, the authors discuss automated taxonomy discovery and exploration, with an emphasis on label-efficient machine learning methods and their real-world uses. A taxonomy organizes entities and concepts in a hierarchical way. Taxonomies are ubiquitous in daily life, ranging from product taxonomies used by online retailers, to topic taxonomies deployed by news outlets and social media, to scientific taxonomies deployed by digital libraries across various domains. When properly analyzed, these taxonomies can play a vital role in science, engineering, business intelligence, policy design, e-commerce, and more. Intuitive examples are used throughout, enabling readers to grasp concepts more easily.

Author(s): Jiaming Shen, Jiawei Han
Series: Synthesis Lectures on Data Mining and Knowledge Discovery
Publisher: Springer
Year: 2022

Language: English
Pages: 111
City: Cham

Preface
Contents
1 Introduction
1.1 Overview
1.2 Technical Roadmap
1.2.1 Concept Set Expansion
1.2.2 Taxonomy Construction
1.2.3 Taxonomy Enrichment
1.2.4 Taxonomy-Guided Classification
1.3 Organization
2 Concept Set Expansion
2.1 Overview and Motivations
2.2 Related Work
2.3 SetExpan: Weakly-Supervised Concept Set Expansion
2.3.1 Data Model and Context Features
2.3.2 Context-Dependent Concept Similarity
2.3.3 Context Feature Selection
2.3.4 Concept Selection via Rank Ensemble
2.4 Experiments
2.4.1 Datasets
2.4.2 Compared Methods
2.4.3 Evaluation Metrics
2.4.4 Overall Performance
2.4.5 Ablation Studies
2.4.6 Case Studies
2.5 Extensions of SetExpan
2.5.1 Addressing Concept Drifts via Auxiliary Sets Generation and Co-expansion
2.5.2 Probing Knowledge from Pre-trained Language Models
2.6 Summary
3 Taxonomy Construction
3.1 Overview and Motivations
3.2 Related Work
3.3 HiExpan: Task-Guided Concept Taxonomy Construction
3.3.1 Problem Formulation
3.3.2 Framework Overview
3.3.3 Key Term Extraction
3.3.4 Iterative Width and Depth Expansion
3.3.5 Taxonomy Global Optimization
3.4 Experiments
3.4.1 Datasets
3.4.2 Compared Methods
3.4.3 Evaluation Metrics
3.4.4 Quantitative Results
3.4.5 Case Studies
3.5 Summary
4 Taxonomy Enrichment
4.1 Overview and Motivations
4.2 Related Work
4.3 TaxoExpan: Self-supervised Taxonomy Expansion
4.3.1 Problem Formulation
4.3.2 Taxonomy Modeling and Expansion Goal
4.3.3 Query-Anchor Matching Model
4.3.4 Model Learning and Inference
4.4 Experiments
4.4.1 Experiments on MAG Dataset
4.4.2 Experiments on SemEval Dataset
4.5 Extensions of TaxoExpan
4.5.1 Incorporating More Fine-Grained Self-supervision Tasks
4.5.2 Identifying Potential Children Concepts
4.5.3 Modeling Relations Among New Concepts
4.6 Summary
5 Taxonomy-Guided Classification
5.1 Overview and Motivations
5.2 Related Work
5.3 TaxoClass: Weakly-Supervised Hierarchical Multi-label Text Classification
5.3.1 Problem Formulation
5.3.2 Document-Class Similarity Calculation
5.3.3 Document Core Class Mining
5.3.4 Core Class Guided Classifier Training
5.3.5 Multi-label Self-training
5.4 Experiments
5.4.1 Datasets
5.4.2 Compared Methods
5.4.3 Evaluation Metrics
5.4.4 Implementation Details
5.4.5 Overall Performance Comparison
5.4.6 Effectiveness of Core Class Mining
5.4.7 Analysis of Classifier Architecture
5.4.8 Supervision Signals in Class Names
5.5 Summary
6 Conclusions
6.1 Summary
6.2 Future Work
6.2.1 Integrate Heterogeneous Modalities and Sources
6.2.2 Engage with Human Behaviors and Interactions
6.2.3 Preserve Data Privacy and Model Security