Machine Learning on Commodity Tiny Devices: Theory and Practice

This book addresses the software and hardware synergy of tiny machine learning (TinyML) for edge intelligence applications. It presents on-device learning techniques spanning model-level neural network design, algorithm-level training optimization, and hardware-level instruction acceleration. An analysis of the limitations of conventional in-cloud computing shows that on-device learning is a promising research direction for meeting the requirements of edge intelligence applications. At the cutting edge of TinyML research, implementing a high-efficiency learning framework and enabling system-level acceleration are among the most fundamental issues. This book offers a comprehensive discussion of the latest research progress and provides system-level insights on designing TinyML frameworks, covering neural network design, training algorithm optimization, and domain-specific hardware acceleration. It identifies the main challenges of deploying TinyML tasks in the real world and guides researchers toward building reliable learning systems. This volume will be of interest to students and scholars in the field of edge intelligence, particularly those with professional Edge AI experience, and will also serve as an excellent guide for researchers implementing high-performance TinyML systems.

Author(s): Song Guo, Qihua Zhou
Publisher: CRC Press
Year: 2022

Language: English
Pages: 267
City: Boca Raton

Cover
Half Title
Title Page
Copyright Page
Contents
List of Figures
List of Tables
CHAPTER 1: Introduction
1.1. WHAT IS MACHINE LEARNING ON DEVICES?
1.2. ON-DEVICE LEARNING AND TINYML SYSTEMS
1.2.1. Property of On-Device Learning
1.2.2. Objectives of TinyML Systems
1.3. CHALLENGES FOR REALISTIC IMPLEMENTATION
1.4. PROBLEM STATEMENT OF BUILDING TINYML SYSTEMS
1.5. DEPLOYMENT PROSPECTS AND DOWNSTREAM APPLICATIONS
1.5.1. Evaluation Metrics for Practical Methods
1.5.2. Intelligent Medical Diagnosis
1.5.3. AI-Enhanced Motion Tracking
1.5.4. Domain-Specific Acceleration Chips
1.6. THE SCOPE AND ORGANIZATION OF THIS BOOK
CHAPTER 2: Fundamentals: On-Device Learning Paradigm
2.1. MOTIVATION
2.1.1. Drawbacks of In-Cloud Learning
2.1.2. Rise of On-Device Learning
2.1.3. Bit Precision and Data Quantization
2.1.4. Potential Gains
2.1.5. Why Not Existing Quantization Methods?
2.2. BASIC TRAINING ALGORITHMS
2.2.1. Stochastic Gradient Descent
2.2.2. Mini-Batch Stochastic Gradient Descent
2.2.3. Training of Neural Networks
2.3. PARAMETER SYNCHRONIZATION FOR DISTRIBUTED TRAINING
2.3.1. Parameter Server Paradigm
2.3.2. Parameter Synchronization Pace
2.3.3. Heterogeneity-Aware Distributed Training
2.4. MULTI-CLIENT ON-DEVICE LEARNING
2.4.1. Preliminary Experiments
2.4.2. Observations
2.4.2.1. Training Convergence Efficiency
2.4.2.2. Synchronization Frequency
2.4.2.3. Communication Traffic
2.4.3. Summary
2.5. DEVELOPING KITS AND EVALUATION PLATFORMS
2.5.1. Devices
2.5.2. Benchmarks
2.5.3. Pipeline
2.6. CHAPTER SUMMARY
CHAPTER 3: Preliminary: Theories and Algorithms
3.1. ELEMENTS OF NEURAL NETWORKS
3.1.1. Fully Connected Network
3.1.2. Convolutional Neural Network
3.1.3. Attention-Based Neural Network
3.2. MODEL-ORIENTED OPTIMIZATION ALGORITHMS
3.2.1. Tiny Transformer
3.2.2. Quantization Strategy for Transformer
3.3. PRACTICE ON SIMPLE CONVOLUTIONAL NEURAL NETWORKS
3.3.1. PyTorch Installation
3.3.1.1. On macOS
3.3.1.2. On Windows
3.3.2. CIFAR-10 Dataset
3.3.3. Construction of CNN Model
3.3.3.1. Convolutional Layers
3.3.3.2. Activation Layers
3.3.3.3. Pooling Layers
3.3.3.4. Fully Connected Layers
3.3.3.5. Structure of LeNet-5
3.3.4. Model Training
3.3.5. Model Testing
3.3.6. GPU Acceleration
3.3.6.1. CUDA Installation
3.3.6.2. Programming for GPU
3.3.7. Load Pre-Trained CNNs
CHAPTER 4: Model-Level Design: Computation Acceleration and Communication Saving
4.1. OPTIMIZATION OF NETWORK ARCHITECTURE
4.1.1. Network-Aware Parameter Pruning
4.1.1.1. Pruning Steps
4.1.1.2. Pruning Strategy
4.1.1.3. Pruning Metrics
4.1.1.4. Summary
4.1.2. Knowledge Distillation
4.1.2.1. Combination of Loss Functions
4.1.2.2. Tuning of Hyper-Parameters
4.1.2.3. Usage of Model Training
4.1.2.4. Summary
4.1.3. Model Fine-Tuning
4.1.3.1. Transfer Learning
4.1.3.2. Layer-Wise Freezing and Updating
4.1.3.3. Model-Wise Feature Sharing
4.1.3.4. Summary
4.1.4. Neural Architecture Search
4.1.4.1. Search Space of HW-NAS
4.1.4.2. Targeted Hardware Platforms
4.1.4.3. Trend of Current HW-NAS Methods
4.2. OPTIMIZATION OF TRAINING ALGORITHM
4.2.1. Low Rank Factorization
4.2.2. Data-Adaptive Regularization
4.2.2.1. Core Formulation
4.2.2.2. On-Device Network Sparsification
4.2.2.3. Block-Wise Regularization
4.2.2.4. Summary
4.2.3. Data Representation and Numerical Quantization
4.2.3.1. Elements of Quantization
4.2.3.2. Post-Training Quantization
4.2.3.3. Quantization-Aware Training
4.2.3.4. Summary
4.3. CHAPTER SUMMARY
CHAPTER 5: Hardware-Level Design: Neural Engines and Tensor Accelerators
5.1. ON-CHIP RESOURCE SCHEDULING
5.1.1. Embedded Memory Controlling
5.1.2. Underlying Computational Primitives
5.1.3. Low-Level Arithmetical Instructions
5.1.4. MIMO-Based Communication
5.2. DOMAIN-SPECIFIC HARDWARE ACCELERATION
5.2.1. Multiple Processing Primitives Scheduling
5.2.2. I/O Connection Optimization
5.2.3. Cache Management
5.2.4. Topology Construction
5.3. CROSS-DEVICE ENERGY EFFICIENCY
5.3.1. Multi-Client Collaboration
5.3.2. Efficiency Analysis
5.3.3. Problem Formulation for Energy Saving
5.3.4. Algorithm Design and Pipeline Overview
5.4. DISTRIBUTED ON-DEVICE LEARNING
5.4.1. Community-Aware Synchronous Parallel
5.4.2. Infrastructure Design
5.4.3. Community Manager
5.4.4. Weight Learner
5.4.4.1. Distance Metric Learning
5.4.4.2. Asynchronous Advantage Actor-Critic
5.4.4.3. Agent Learning Methodology
5.4.5. Distributed Training Controller
5.4.5.1. Intra-Community Synchronization
5.4.5.2. Inter-Community Synchronization
5.4.5.3. Communication Traffic Aggregation
5.5. CHAPTER SUMMARY
CHAPTER 6: Infrastructure-Level Design: Serverless and Decentralized Machine Learning
6.1. SERVERLESS COMPUTING
6.1.1. Definition of Serverless Computing
6.1.2. Architecture of Serverless Computing
6.1.2.1. Virtualization Layer
6.1.2.2. Encapsulation Layer
6.1.2.3. System Orchestration Layer
6.1.2.4. System Coordination Layer
6.1.3. Benefits of Serverless Computing
6.1.4. Challenges of Serverless Computing
6.1.4.1. Programming and Modeling
6.1.4.2. Pricing and Cost Prediction
6.1.4.3. Scheduling
6.1.4.4. Intra-Communications of Functions
6.1.4.5. Data Caching
6.1.4.6. Security and Privacy
6.2. SERVERLESS MACHINE LEARNING
6.2.1. Introduction
6.2.2. Machine Learning and Data Management
6.2.3. Training Large Models in Serverless Computing
6.2.3.1. Data Transfer and Parallelism in Serverless Computing
6.2.3.2. Data Parallelism for Model Training in Serverless Computing
6.2.3.3. Optimizing Parallelism Structure in Serverless Training
6.2.4. Cost-Efficiency in Serverless Computing
6.3. CHAPTER SUMMARY
CHAPTER 7: System-Level Design: From Standalone to Clusters
7.1. STALENESS-AWARE PIPELINING
7.1.1. Data Parallelism
7.1.2. Model Parallelism
7.1.2.1. Linear Models
7.1.2.2. Non-Linear Neural Networks
7.1.3. Hybrid Parallelism
7.1.4. Extension of Training Parallelism
7.1.5. Summary
7.2. INTRODUCTION TO FEDERATED LEARNING
7.3. TRAINING WITH NON-IID DATA
7.3.1. The Definition of Non-IID Data
7.3.2. Enabling Technologies for Non-IID Data
7.3.2.1. Data Sharing
7.3.2.2. Robust Aggregation Methods
7.3.2.3. Other Optimized Methods
7.4. LARGE-SCALE COLLABORATIVE LEARNING
7.4.1. Parameter Server
7.4.2. Decentralized P2P Scheme
7.4.3. Collective Communication-Based AllReduce
7.4.4. Data Flow-Based Graph
7.5. PERSONALIZED LEARNING
7.5.1. Data-Based Approaches
7.5.2. Model-Based Approaches
7.5.2.1. Single Model-Based Methods
7.5.2.2. Multiple Model-Based Methods
7.6. PRACTICE ON FL IMPLEMENTATION
7.6.1. Prerequisites
7.6.2. Data Distribution
7.6.3. Local Model Training
7.6.4. Global Model Aggregation
7.6.5. A Simple Example
7.7. CHAPTER SUMMARY
CHAPTER 8: Application: Image-Based Visual Perception
8.1. IMAGE CLASSIFICATION
8.1.1. Traditional Image Classification Methods
8.1.2. Deep Learning-Based Image Classification Methods
8.1.3. Conclusion
8.2. IMAGE RESTORATION AND SUPER-RESOLUTION
8.2.1. Overview
8.2.2. A Unified Framework for Image Restoration and Super-Resolution
8.2.3. A Demo of Single Image Super-Resolution
8.2.3.1. Network Architecture
8.2.3.2. Local Aware Attention
8.2.3.3. Global Aware Attention
8.2.3.4. LARD Block
8.3. SELF-ATTENTION AND VISION TRANSFORMERS
8.4. ENVIRONMENT PERCEPTION: IMAGE SEGMENTATION AND OBJECT DETECTION
8.4.1. Object Detection
8.4.1.1. Traditional Object Detection Model
8.4.1.2. Deep Learning-Based Object Detection Model
8.4.2. Image Segmentation
8.4.2.1. Semantic Segmentation
8.4.2.2. Instance Segmentation
8.4.2.3. Panoramic Segmentation
8.5. CHAPTER SUMMARY
CHAPTER 9: Application: Video-Based Real-Time Processing
9.1. VIDEO RECOGNITION: EVOLVING FROM IMAGES
9.1.1. Challenges
9.1.2. Methodologies
9.1.2.1. Two-Stream Networks
9.1.2.2. 3D CNNs
9.2. MOTION TRACKING: LEARN FROM TIME-SPATIAL SEQUENCES
9.2.1. Deep Learning-Based Tracking
9.2.2. Optical Flow-Based Tracking
9.3. POSE ESTIMATION: KEY POINT EXTRACTION
9.3.1. 2D-Based Extraction
9.3.1.1. Single Person Estimation
9.3.1.2. Multiple Human Estimation
9.3.2. 3D-Based Extraction
9.4. PRACTICE: REAL-TIME MOBILE HUMAN POSE TRACKING
9.4.1. Prerequisites and Data Preparation
9.4.2. Hyper-Parameter Configuration and Model Training
9.4.3. Realistic Inference and Performance Evaluation
9.5. CHAPTER SUMMARY
CHAPTER 10: Application: Privacy, Security, Robustness and Trustworthiness in Edge AI
10.1. PRIVACY PROTECTION METHODS
10.1.1. Homomorphic Encryption-Enabled Methods
10.1.2. Differential Privacy-Enabled Methods
10.1.3. Secure Multi-Party Computation
10.1.4. Lightweight Private Computation Techniques for Edge AI
10.1.4.1. Example 1: Lightweight and Secure Decision Tree Classification
10.1.4.2. Example 2: Lightweight and Secure SVM Classification
10.2. SECURITY AND ROBUSTNESS
10.2.1. Practical Issues
10.2.2. Backdoor Attacks
10.2.3. Backdoor Defences
10.3. TRUSTWORTHINESS
10.3.1. Blockchain and Swarm Learning
10.3.2. Trusted Execution Environment and Federated Learning
10.4. CHAPTER SUMMARY
Bibliography
Index