This monograph addresses advances in representation learning, a cutting-edge research area of machine learning. Representation learning refers to modern data transformation techniques that convert data of different modalities and complexity, including texts, graphs, and relations, into compact tabular representations that effectively capture their semantic properties and relationships. The monograph focuses on (i) propositionalization approaches, established in relational learning and inductive logic programming, and (ii) embedding approaches, which have gained popularity with recent advances in deep learning. The authors establish a unifying perspective on representation learning techniques developed in these various areas of modern data science, enabling the reader to understand the common underlying principles and to gain insight through selected examples and sample Python code. The monograph should be of interest to a wide audience, ranging from data scientists, machine learning researchers, and students to developers, software engineers, and industrial researchers interested in hands-on AI solutions.
Authors: Nada Lavrač; Vid Podpečan; Marko Robnik-Šikonja
Publisher: Springer
Year: 2021
Language: English
Pages: 163
City: Cham
Foreword
Preface
Contents
1 Introduction to Representation Learning
1.1 Motivation
1.2 Representation Learning in Knowledge Discovery
1.2.1 Machine Learning and Knowledge Discovery
1.2.2 Automated Data Transformation
1.3 Data Transformations and Information Representation Levels
1.3.1 Information Representation Levels
1.3.2 Propositionalization: Learning Symbolic Vector Representations
1.3.3 Embeddings: Learning Numeric Vector Representations
1.4 Evaluation of Propositionalization and Embeddings
1.4.1 Performance Evaluation
1.4.2 Interpretability
1.5 Survey of Automated Data Transformation Methods
1.6 Outline of This Monograph
References
2 Machine Learning Background
2.1 Machine Learning
2.1.1 Attributes and Features
2.1.2 Machine Learning Approaches
2.1.3 Decision and Regression Tree Learning
2.1.4 Rule Learning
2.1.5 Kernel Methods
2.1.6 Ensemble Methods
2.1.7 Deep Neural Networks
2.2 Text Mining
2.3 Relational Learning
2.4 Network Analysis
2.4.1 Selected Homogeneous Network Analysis Tasks
2.4.2 Selected Heterogeneous Network Analysis Tasks
2.4.3 Semantic Data Mining
2.4.4 Network Representation Learning
2.5 Evaluation
2.5.1 Classifier Evaluation Measures
2.5.2 Rule Evaluation Measures
2.6 Data Mining and Selected Data Mining Platforms
2.6.1 Data Mining
2.6.2 Selected Data Mining Platforms
2.7 Implementation and Reuse
References
3 Text Embeddings
3.1 Background Technologies
3.1.1 Transfer Learning
3.1.2 Language Models
3.2 Word Cooccurrence-Based Embeddings
3.2.1 Sparse Word Cooccurrence-Based Embeddings
3.2.2 Weighting Schemes
3.2.3 Similarity Measures
3.2.4 Sparse Matrix Representations of Texts
3.2.5 Dense Term-Matrix Based Word Embeddings
3.2.6 Dense Topic-Based Embeddings
3.3 Neural Word Embeddings
3.3.1 Word2vec Embeddings
3.3.2 GloVe Embeddings
3.3.3 Contextual Word Embeddings
3.4 Sentence and Document Embeddings
3.5 Cross-Lingual Embeddings
3.6 Intrinsic Evaluation of Text Embeddings
3.7 Implementation and Reuse
3.7.1 LSA and LDA
3.7.2 word2vec
3.7.3 BERT
References
4 Propositionalization of Relational Data
4.1 Relational Learning
4.2 Relational Data Representation
4.2.1 Illustrative Example
4.2.2 Example Using a Logical Representation
4.2.3 Example Using a Relational Database Representation
4.3 Propositionalization
4.3.1 Relational Features
4.3.2 Automated Construction of Relational Features by RSD
4.3.3 Automated Data Transformation and Learning
4.4 Selected Propositionalization Approaches
4.5 Wordification: Unfolding Relational Data into BoW Vectors
4.5.1 Outline of the Wordification Approach
4.5.2 Wordification Algorithm
4.5.3 Improved Efficiency of Wordification Algorithm
4.6 Deep Relational Machines
4.7 Implementation and Reuse
4.7.1 Wordification
4.7.2 Python-rdm Package
References
5 Graph and Heterogeneous Network Transformations
5.1 Embedding Simple Graphs
5.1.1 DeepWalk Algorithm
5.1.2 Node2vec Algorithm
5.1.3 Other Random Walk-Based Graph Embedding Algorithms
5.2 Embedding Heterogeneous Information Networks
5.2.1 Heterogeneous Information Networks
5.2.2 Examples of Heterogeneous Information Networks
5.2.3 Embedding Feature-Rich Graphs with GCNs
5.2.4 Other Heterogeneous Network Embedding Approaches
5.3 Propositionalizing Heterogeneous Information Networks
5.3.1 TEHmINe Propositionalization of Text-Enriched Networks
5.3.1.1 Heterogeneous Network Decomposition
5.3.1.2 Feature Vector Construction
5.3.1.3 Data Fusion
5.3.2 HINMINE Heterogeneous Networks Decomposition
5.4 Ontology Transformations
5.4.1 Ontologies and Semantic Data Mining
5.4.2 NetSDM Ontology Reduction Methodology
5.4.2.1 Converting Ontology and Examples into Network Format
5.4.2.2 Term Significance Calculation
5.4.2.3 Network Node Removal
5.5 Embedding Knowledge Graphs
5.6 Implementation and Reuse
5.6.1 Node2vec
5.6.2 Metapath2vec
5.6.3 HINMINE
References
6 Unified Representation Learning Approaches
6.1 Entity Embeddings with StarSpace
6.2 Unified Approaches for Relational Data
6.2.1 PropStar: Feature-Based Relational Embeddings
6.2.2 PropDRM: Instance-Based Relational Embeddings
6.2.3 Performance Evaluation of Relational Embeddings
6.3 Implementation and Reuse
6.3.1 StarSpace
6.3.2 PropDRM
References
7 Many Faces of Representation Learning
7.1 Unifying Aspects in Terms of Data Representation
7.2 Unifying Aspects in Terms of Learning
7.3 Unifying Aspects in Terms of Use
7.4 Summary and Conclusions
References
Index