Advancements in Knowledge Distillation: Towards New Horizons of Intelligent Systems

The book provides timely coverage of knowledge distillation, an efficient approach to model compression. Knowledge distillation is positioned within the general setting of transfer learning, in which a lightweight student model is learned from a large teacher model. The book covers a variety of training schemes, teacher–student architectures, and distillation algorithms, together with recent developments in vision and language learning, relational architectures, and multi-task learning, and representative applications to image processing, computer vision, edge intelligence, and autonomous systems. It is of relevance to a broad audience, including researchers and practitioners active in machine learning and pursuing fundamental and applied research in advanced learning paradigms.
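
For readers new to the area, the response-based scheme surveyed in the opening chapter reduces to the classic soft-label loss of Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. The following PyTorch sketch is illustrative only; the function name and the values of the temperature T and mixing weight alpha are our own choices, not taken from the book.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        # Soft term: KL divergence between the temperature-softened teacher
        # and student distributions. Scaling by T*T keeps the gradient
        # magnitude comparable to the hard term (Hinton et al., 2015).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard term: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

Feature-based and relation-based variants (Sections 1.2 and 1.3 of the first chapter) replace or augment this output-level term with losses on intermediate feature maps or on pairwise relations between samples.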

Author(s): Witold Pedrycz, Shyi-Ming Chen
Series: Studies in Computational Intelligence, 1100
Publisher: Springer
Year: 2023

Language: English
Pages: 238
City: Cham

Preface
Contents
Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation
1 Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation
1.1 Response-Based Knowledge Distillation
1.2 Feature-Based Knowledge Distillation
1.3 Relation-Based Knowledge Distillation
2 Distillation Schemes
2.1 Offline Knowledge Distillation
2.2 Online Knowledge Distillation
2.3 Self-Knowledge Distillation
2.4 Comprehensive Comparison
3 Distillation Algorithms
3.1 Multi-teacher Distillation
3.2 Cross-Modal Distillation
3.3 Attention-Based Distillation
3.4 Data-Free Distillation
3.5 Adversarial Distillation
4 Conclusion
References
A Geometric Perspective on Feature-Based Distillation
1 Introduction
2 Prior Art on Feature-Based Knowledge Distillation
2.1 Definitions
2.2 Related Work
3 Geometric Considerations on FKD
3.1 Local Manifolds and FKD
3.2 Manifold-Manifold Distance Functions
3.3 Interpretation of Graph Reordering as a Tool for Measuring Similarity
4 Formulating Geometric FKD Loss Functions
4.1 Neighboring Pattern Loss
4.2 Affinity Contrast Loss
5 Experimental Verification
5.1 Materials and Methods
5.2 Knowledge Distillation from Large Teacher to Small Student Models
5.3 Comparison with Vanilla Knowledge Distillation
5.4 Knowledge Distillation Between Large Models
5.5 Effects of Neighborhood
6 Case Study: Geometric FKD in Data-Free Knowledge Transfer Between Architectures. An Application in Offline Signature Verification
6.1 Problem Formulation
6.2 Experimental Setup
6.3 Results
7 Discussion
8 Conclusions
References
Knowledge Distillation Across Vision and Language
1 Introduction
2 Vision Language Learning and Contrastive Distillation
2.1 Vision and Language Representation Learning
2.2 Contrastive Learning and Knowledge Distillation
2.3 Contrastive Distillation for Self-supervised Learning
3 Contrastive Distillation for Vision Language Representation Learning
3.1 DistillVLM
3.2 Attention Distribution Distillation
3.3 Hidden Representation Distillation
3.4 Classification Distillation
4 Experiments
4.1 Datasets
4.2 Implementation Details: Visual Representation
4.3 VL Pre-training and Distillation
4.4 Transferring to Downstream Tasks
4.5 Experimental Results
4.6 Distillation over Different Losses
4.7 Different Distillation Strategies
4.8 Is VL Distillation Data Efficient?
4.9 Results for Captioning
5 VL Distillation on Unified One-Stage Architecture
5.1 One-Stage VL Architecture
5.2 VL Distillation on One-Stage Architecture
6 Conclusion and Future Work
References
Knowledge Distillation in Granular Fuzzy Models by Solving Fuzzy Relation Equations
1 Introduction
2 Related Works
2.1 Knowledge Granularity in Transfer Learning
2.2 Evolutionary Neural Architecture Search
2.3 Deep Neuro-Fuzzy Networks
3 Problem Statement
3.1 Granular Solutions of the SFRE
3.2 Genetic Search for the Optimal Student Model
3.3 Self-Organizing Fuzzy Relational Neural Networks
4 Hierarchical Teacher-Student Architecture Based on Granular Solutions of the SFRE
4.1 Relation-Based Teacher Model
4.2 Student Model Based on Granular Solutions of the SFRE
5 The Problem of Training the Teacher-Student Architecture
6 Knowledge Distillation by Solving the SFRE
6.1 Structure of the Constrained Linguistic Solution
6.2 The Problem of Structural Optimization of the Student Model
6.3 Genetic Algorithm for Parallel Deployment of the Student Model
7 Method for Training the Neuro-Fuzzy Relational System
8 Experimental Results: Time Series Forecasting
8.1 Data Set Organization
8.2 Training Results for the Teacher-Student Architecture
8.3 A Comparison of KD Techniques in Time Series Forecasting
9 Model Compression Estimations
10 Conclusion and Future Work
References
Ensemble Knowledge Distillation for Edge Intelligence in Medical Applications
1 Introduction
2 Background and Related Work
3 Methodology
3.1 Datasets
3.2 Models
3.3 Metrics
3.4 Workflow
4 Experiment
4.1 Standard CIFAR10 and CIFAR100 Datasets
4.2 Specific MedMNIST Medical Datasets
5 Discussion
5.1 Models-in-Family Comparison
5.2 Family-to-Family Comparison
6 Conclusions
References
Self-Distillation with the New Paradigm in Multi-Task Learning
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Problem Formulation
3.2 Loss Functions for Task-Specific Modules
3.3 Loss for Self-Distillation
3.4 Total Loss
4 Dataset and Experimental Protocols
4.1 Datasets Description
4.2 Model Architecture
4.3 Training Protocol and Evaluation Metrics
5 Results and Discussion
5.1 Competitors
5.2 Ablation Studies
5.3 Visualization
6 Summary and Future Directions
References
Knowledge Distillation for Autonomous Intelligent Unmanned System
1 Introduction
2 Sense Distillation Problem
3 Related Work
4 Distillation of Sense of Data from Sensors
4.1 Fuzzy Certainty Factor
4.2 External Meaning of the KG
4.3 Knowledge Base Structure
4.4 Internal Meaning of the KG
5 Distillation of Sense of Events Footprint
5.1 Events Sequence Footprint Model
5.2 Events Footprint Convolution
5.3 Examples and Simulation Results
6 AIUS Decision Making and Control Engines
6.1 Fuzzy Logic System as DM&C Engine of AIUS
6.2 Using AIUS History in Fuzzy Logic System Engine
6.3 Model of Multi-purpose Continuous Planning and Goal-Driving Control
7 Conclusion and Future Work
References
Index