Transfer learning is one of the most important techniques in the era of artificial intelligence and deep learning. It seeks to leverage knowledge acquired in one domain by transferring it to a new, related domain. Over the years, a number of closely related topics have attracted the interest of the research and application communities: transfer learning, pre-training and fine-tuning, domain adaptation, domain generalization, and meta-learning.
This book offers a comprehensive overview of and tutorial on transfer learning, introducing new researchers to both classic and recent algorithms. Most importantly, it takes a student's perspective in presenting the concepts, theories, algorithms, and applications, allowing readers to enter the field quickly and easily. Detailed code implementations accompany the book to illustrate the core ideas of several important algorithms and to serve as worked examples for practice.
Authors: Jindong Wang, Yiqiang Chen
Series: Machine Learning: Foundations, Methodologies, and Applications
Publisher: Springer
Year: 2023
Language: English
Pages: 332
City: Singapore
Preface
Acknowledgments
Contents
Acronyms
Symbols
Part I Foundations
1 Introduction
1.1 Transfer Learning
1.2 Related Research Fields
1.3 Why Transfer Learning?
1.3.1 Big Data vs. Less Annotation
1.3.2 Big Data vs. Poor Computation
1.3.3 Limited Data vs. Generalization Requirements
1.3.4 Pervasive Model vs. Personal Need
1.3.5 For Specific Applications
1.4 Taxonomy of Transfer Learning
1.4.1 Taxonomy by Feature Space
1.4.2 Taxonomy by Target Domain Labels
1.4.3 Taxonomy by Learning Methodology
1.4.4 Taxonomy by Online or Offline Learning
1.5 Transfer Learning in Academia and Industry
1.6 Overview of Transfer Learning Applications
1.6.1 Computer Vision
1.6.2 Natural Language Processing
1.6.3 Speech
1.6.4 Ubiquitous Computing and Human–Computer Interaction
1.6.5 Healthcare
1.6.6 Other Applications
References
2 From Machine Learning to Transfer Learning
2.1 Machine Learning Basics
2.1.1 Machine Learning
2.1.2 Structural Risk Minimization
2.1.3 Probability Distribution
2.2 Definition of Transfer Learning
2.2.1 Domains
2.2.2 Formal Definition
2.3 Fundamental Problems in Transfer Learning
2.3.1 When to Transfer
2.3.2 Where to Transfer
2.3.3 How to Transfer
2.4 Negative Transfer Learning
2.5 A Complete Transfer Learning Process
References
3 Overview of Transfer Learning Algorithms
3.1 Measuring Distribution Divergence
3.2 Unified Representation for Distribution Divergence
3.2.1 Estimation of Balance Factor μ
3.3 A Unified Framework for Transfer Learning
3.3.1 Instance Weighting Methods
3.3.2 Feature Transformation Methods
3.3.3 Model Pre-training
3.3.4 Summary
3.4 Practice
3.4.1 Data Preparation
3.4.2 Baseline Model: K-Nearest Neighbors
References
4 Instance Weighting Methods
4.1 Problem Definition
4.2 Instance Selection Methods
4.2.1 Non-reinforcement Learning-Based Methods
4.2.2 Reinforcement Learning-Based Methods
4.3 Weight Adaptation Methods
4.4 Practice
4.5 Summary
References
5 Statistical Feature Transformation Methods
5.1 Problem Definition
5.2 Maximum Mean Discrepancy-Based Methods
5.2.1 The Basics of MMD
5.2.2 MMD-Based Transfer Learning
5.2.3 Computation and Optimization
5.2.4 Extensions of MMD-Based Transfer Learning
5.3 Metric Learning-Based Methods
5.3.1 Metric Learning
5.3.2 Metric Learning for Transfer Learning
5.4 Practice
5.5 Summary
References
6 Geometrical Feature Transformation Methods
6.1 Subspace Learning Methods
6.1.1 Subspace Alignment
6.1.2 Correlation Alignment
6.2 Manifold Learning Methods
6.2.1 Manifold Learning
6.2.2 Manifold Learning for Transfer Learning
6.3 Optimal Transport Methods
6.3.1 Optimal Transport
6.3.2 Optimal Transport for Transfer Learning
6.4 Practice
6.5 Summary
References
7 Theory, Evaluation, and Model Selection
7.1 Transfer Learning Theory
7.1.1 Theory Based on H-Divergence
7.1.2 Theory Based on HΔH-Distance
7.1.3 Theory Based on Discrepancy Distance
7.1.4 Theory Based on Labeling Function Difference
7.2 Metric and Evaluation
7.3 Model Selection
7.3.1 Importance Weighted Cross Validation
7.3.2 Transfer Cross Validation
7.4 Summary
References
Part II Modern Transfer Learning
8 Pre-Training and Fine-Tuning
8.1 How Transferable Are Deep Networks
8.2 Pre-Training and Fine-Tuning
8.2.1 Benefits of Pre-Training and Fine-Tuning
8.3 Regularization for Fine-Tuning
8.4 Pre-Trained Models for Feature Extraction
8.5 Learning to Pre-Train and Fine-Tune
8.6 Practice
8.7 Summary
References
9 Deep Transfer Learning
9.1 Overview
9.2 Network Architectures for Deep Transfer Learning
9.2.1 Single-Stream Architecture
9.2.2 Two-Stream Architecture
9.3 Distribution Adaptation in Deep Transfer Learning
9.4 Structure Adaptation for Deep Transfer Learning
9.4.1 Batch Normalization
9.4.2 Multi-view Structure
9.4.3 Disentanglement
9.5 Knowledge Distillation
9.6 Practice
9.6.1 Network Structure
9.6.2 Loss
9.6.3 Train and Test
9.7 Summary
References
10 Adversarial Transfer Learning
10.1 Generative Adversarial Networks
10.2 Distribution Adaptation for Adversarial Transfer Learning
10.3 Maximum Classifier Discrepancy for Adversarial Transfer Learning
10.4 Data Generation for Adversarial Transfer Learning
10.5 Practice
10.5.1 Domain Discriminator
10.5.2 Measuring Distribution Divergence
10.5.3 Gradient Reversal Layer
10.6 Summary
References
11 Generalization in Transfer Learning
11.1 Domain Generalization
11.2 Data Manipulation
11.2.1 Data Augmentation and Generation
11.2.2 Mixup-Based Domain Generalization
11.3 Domain-Invariant Representation Learning
11.3.1 Domain-Invariant Component Analysis
11.3.2 Deep Domain Generalization
11.3.3 Disentanglement
11.4 Other Learning Paradigms for Domain Generalization
11.4.1 Ensemble Learning
11.4.2 Meta-Learning for Domain Generalization
11.4.3 Other Learning Paradigms
11.5 Domain Generalization Theory
11.5.1 Average Risk Estimation Error Bound
11.5.2 Generalization Risk Bound
11.6 Practice
11.6.1 Dataloader in Domain Generalization
11.6.2 Training and Testing
11.6.3 Examples: ERM and CORAL
11.7 Summary
References
12 Safe and Robust Transfer Learning
12.1 Safe Transfer Learning
12.1.1 Can Transfer Learning Models Be Attacked?
12.1.2 Reducing Defect Inheritance
12.1.3 ReMoS: Relevant Model Slicing
12.2 Federated Transfer Learning
12.2.1 Federated Learning
12.2.2 Personalized Federated Learning for Non-I.I.D. Data
12.2.2.1 Model Adaptation for Personalized Federated Learning
12.2.2.2 Similarity-Guided Personalized Federated Learning
12.3 Data-Free Transfer Learning
12.3.1 Information Maximization Methods
12.3.2 Feature Matching Methods
12.4 Causal Transfer Learning
12.4.1 What is Causal Relation?
12.4.2 Causal Relation for Transfer Learning
12.5 Summary
References
13 Transfer Learning in Complex Environments
13.1 Imbalanced Transfer Learning
13.2 Multi-Source Transfer Learning
13.3 Open Set Transfer Learning
13.4 Time Series Transfer Learning
13.4.1 AdaRNN for Time Series Forecasting
13.4.2 DIVERSIFY for Time Series Classification
13.5 Online Transfer Learning
13.6 Summary
References
14 Low-Resource Learning
14.1 Compressing Transfer Learning Models
14.2 Semi-supervised Learning
14.2.1 Consistency Regularization Methods
14.2.2 Pseudo Labeling and Thresholding Methods
14.3 Meta-learning
14.3.1 Model-Based Meta-learning
14.3.2 Metric-Based Meta-learning
14.3.3 Optimization-Based Meta-learning
14.4 Self-supervised Learning
14.4.1 Constructing Pretext Tasks
14.4.2 Contrastive Self-supervised Learning
14.5 Summary
References
Part III Applications of Transfer Learning
15 Transfer Learning for Computer Vision
15.1 Object Detection
15.1.1 Task and Dataset
15.1.2 Load Data
15.1.3 Model
15.1.4 Train and Test
15.2 Neural Style Transfer
15.2.1 Load Data
15.2.2 Model
15.2.3 Train
References
16 Transfer Learning for Natural Language Processing
16.1 Emotion Classification
16.2 Model
16.3 Train and Test
16.4 Pre-training and Fine-tuning
References
17 Transfer Learning for Speech Recognition
17.1 Cross-Domain Speech Recognition
17.1.1 MMD and CORAL for ASR
17.1.2 CMatch Algorithm
17.1.3 Experiments and Results
17.2 Cross-Lingual Speech Recognition
17.2.1 Adapter
17.2.2 Cross-Lingual Adaptation with Adapters
17.2.3 Advanced Algorithm: MetaAdapter and SimAdapter
17.2.4 Results and Discussion
References
18 Transfer Learning for Activity Recognition
18.1 Task and Dataset
18.2 Feature Extraction
18.3 Source Selection
18.4 Activity Recognition Using TCA
18.5 Activity Recognition Using Deep Transfer Learning
References
19 Federated Learning for Personalized Healthcare
19.1 Task and Dataset
19.1.1 Dataset
19.1.2 Data Splits
19.1.3 Model Architecture
19.2 FedAvg: Baseline Algorithm
19.2.1 Clients Update
19.2.2 Communication on the Server
19.2.3 Results
19.3 AdaFed: Adaptive Batchnorm for Federated Learning
19.3.1 Similarity Matrix Computation
19.3.2 Communication on the Server
19.3.3 Results
References
20 Concluding Remarks
References
A Useful Distance Metrics
A.1 Euclidean Distance
A.2 Minkowski Distance
A.3 Mahalanobis Distance
A.4 Cosine Similarity
A.5 Mutual Information
A.6 Pearson Correlation
A.7 Jaccard Index
A.8 KL and JS Divergence
A.9 Maximum Mean Discrepancy
A.10 A-distance
A.11 Hilbert–Schmidt Independence Criterion
B Popular Datasets in Transfer Learning
B.1 Digit Recognition Datasets
B.2 Object Recognition and Image Classification Datasets
B.3 Text Classification Datasets
B.4 Activity Recognition Datasets
C Venues Related to Transfer Learning
C.1 Machine Learning and AI
C.2 Computer Vision and Multimedia
C.3 Natural Language Processing and Speech
C.4 Ubiquitous Computing and Human–Computer Interaction
C.5 Data Mining
Reference