This book explains the ideas behind RDF2vec, one of the most well-known knowledge graph embedding methods, which computes vector representations from a graph. The authors describe its usage in practice, from reusing pre-trained knowledge graph embeddings to training tailored vectors for a knowledge graph at hand. They also demonstrate different extensions of RDF2vec and how they affect not only downstream performance but also the expressivity of the resulting vector representations, and they analyze the resulting vector spaces and the semantic properties they encode.
Author(s): Heiko Paulheim, Petar Ristoski, Jan Portisch
Series: Synthesis Lectures on Data, Semantics, and Knowledge
Edition: 1
Publisher: Springer
Year: 2023
Language: English
Commentary: Publisher's PDF
Pages: 164
City: Cham, CH
Tags: Artificial Intelligence; Machine Learning; Natural Language Processing; word2vec; Word Embeddings; Knowledge Graphs
Preface
Contents
1 Introduction
1.1 What is a Knowledge Graph?
1.1.1 A Short Bit of History
1.1.2 Definitions
1.1.3 General-Purpose Knowledge Graphs
1.2 Feature Extraction from Knowledge Graphs
1.3 Node Classification in RDF
1.4 Conclusion
2 From Word Embeddings to Knowledge Graph Embeddings
2.1 Word Embeddings with word2vec
2.2 Representing Graphs as Sequences
2.3 Learning Representations from Graph Walks
2.4 Software Libraries
2.5 Node Classification with RDF2vec
2.6 Conclusion
3 Benchmarking Knowledge Graph Embeddings
3.1 Node Classification with Internal Labels—SW4ML
3.2 Machine Learning with External Labels—GEval
3.3 Benchmarking Expressivity of Embeddings—DLCC
3.3.1 DLCC Gold Standard based on DBpedia
3.3.2 DLCC Gold Standard based on Synthetic Data
3.4 Conclusion
4 Tweaking RDF2vec
4.1 Introducing Edge Weights
4.1.1 Graph Internal Weighting Approaches
4.1.2 Graph External Weighting Approaches
4.2 Order-Aware RDF2vec
4.2.1 Motivation and Definition
4.2.2 Evaluation
4.2.3 Order-Aware RDF2vec in Action
4.3 Alternative Walk Strategies
4.3.1 Entity Walks and Property Walks
4.3.2 Further Walk Extraction Strategies
4.4 RDF2vec with Materialized Knowledge Graphs
4.4.1 Idea
4.4.2 Experiments
4.4.3 RDF2vec on Materialized Graphs in Action
4.5 Conclusion
5 RDF2vec at Scale
5.1 Using Pre-trained Embeddings
5.1.1 The KGvec2Go Service
5.1.2 KGvec2Go in Action
5.2 Training Partial RDF2vec Models with RDF2vec Light
5.2.1 Approach
5.2.2 RDF2vec Light in Action
5.3 Conclusion
6 Link Prediction in Knowledge Graphs (and its Relation to RDF2vec)
6.1 A Brief Survey on the Knowledge Graph Embedding Landscape
6.2 Knowledge Graph Embedding for Data Mining
6.2.1 Data Mining is Based on Similarity
6.2.2 How RDF2vec Projects Similar Instances Close to Each Other
6.2.3 Using RDF2vec for Link Prediction
6.2.4 Link Prediction with RDF2vec in Action
6.3 Knowledge Graph Embedding Methods for Link Prediction
6.3.1 Link Prediction is Based on Vector Operations
6.3.2 Usage for Data Mining
6.3.3 Comparing the Two Notions of Similarity
6.3.4 Link Prediction Embeddings for Data Mining in Action
6.4 Experiments
6.4.1 Experiments on Data Mining Tasks
6.4.2 Experiments on Link Prediction Tasks
6.5 Conclusion
7 Example Applications Beyond Node Classification
7.1 Recommender Systems with RDF2vec
7.1.1 An RDF2vec-Based Movie Recommender in Less than 20 Lines of Code
7.1.2 Combining Knowledge Graph Embeddings with Other Information
7.2 Ontology Matching
7.2.1 Ontology Matching by Embedding Input Ontologies
7.2.2 Ontology Matching by Embedding External Knowledge Graphs
7.3 Further Use Cases
7.3.1 Knowledge Graph Refinement
7.3.2 Natural Language Processing
7.3.3 Information Retrieval
7.3.4 Applications in the Biomedical Domain
7.4 Conclusion
8 Future Directions for RDF2vec
8.1 Incorporating Information in Literals
8.2 Exploiting Complex Patterns
8.3 Exploiting Ontologies
8.4 Dynamic and Temporal Knowledge Graphs
8.5 Extension to other Knowledge Graph Representations
8.6 Standards and Protocols
8.7 Embeddings and Explainability
A Datasets and Code Examples
A.1 The Band Genre Node Classification Dataset
A.2 The 1k Movie Dataset