Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insights into how to improve the efficiency of machine learning systems.
Author(s): George K. Thiruvathukal, Yung-Hsiang Lu, Jaeyoun Kim, Yiran Chen, Bo Chen
Publisher: CRC Press
Year: 2022
Language: English
Pages: 344
City: Boca Raton
Cover
Half Title
Title Page
Copyright Page
Contents
Foreword
Rebooting Computing and Low-Power Computer Vision
Editors
SECTION I: Introduction
CHAPTER 1: Book Introduction
1.1. ABOUT THE BOOK
1.2. CHAPTER SUMMARIES
1.2.1. History of Low-Power Computer Vision Challenge
1.2.2. Survey on Energy-Efficient Deep Neural Networks for Computer Vision
1.2.3. Hardware Design and Software Practices for Efficient Neural Network Inference
1.2.4. Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search
1.2.5. Fast Adjustable Threshold for Uniform Neural Network Quantization
1.2.6. Power-efficient Neural Network Scheduling on Heterogeneous Systems-on-Chip (SoCs)
1.2.7. Efficient Neural Architecture Search
1.2.8. Design Methodology for Low-Power Image Recognition Systems
1.2.9. Guided Design for Efficient On-device Object Detection Model
1.2.10. Quantizing Neural Networks for Low-Power Computer Vision
1.2.11. A Practical Guide to Designing Efficient Mobile Architectures
1.2.12. A Survey of Quantization Methods for Efficient Neural Network Inference
CHAPTER 2: History of Low-Power Computer Vision Challenge
2.1. REBOOTING COMPUTING
2.2. LOW-POWER IMAGE RECOGNITION CHALLENGE (LPIRC): 2015–2019
2.3. LOW-POWER COMPUTER VISION CHALLENGE (LPCVC): 2020
2.4. WINNERS
2.5. ACKNOWLEDGMENTS
CHAPTER 3: Survey on Energy-Efficient Deep Neural Networks for Computer Vision
3.1. INTRODUCTION
3.2. BACKGROUND
3.2.1. Computation Intensity of Deep Neural Networks
3.2.2. Low-Power Deep Neural Networks
3.3. PARAMETER QUANTIZATION
3.4. DEEP NEURAL NETWORK PRUNING
3.5. DEEP NEURAL NETWORK LAYER AND FILTER COMPRESSION
3.6. PARAMETER MATRIX DECOMPOSITION TECHNIQUES
3.7. NEURAL ARCHITECTURE SEARCH
3.8. KNOWLEDGE DISTILLATION
3.9. ENERGY CONSUMPTION—ACCURACY TRADEOFF WITH DEEP NEURAL NETWORKS
3.10. GUIDELINES FOR LOW-POWER COMPUTER VISION
3.10.1. Relationship between Low-Power Computer Vision Techniques
3.10.2. Deep Neural Network and Resolution Scaling
3.11. EVALUATION METRICS
3.11.1. Accuracy Measurements on Popular Datasets
3.11.2. Memory Requirement and Number of Operations
3.11.3. On-device Energy Consumption and Latency
3.12. SUMMARY AND CONCLUSIONS
SECTION II: Competition Winners
CHAPTER 4: Hardware Design and Software Practices for Efficient Neural Network Inference
4.1. HARDWARE AND SOFTWARE DESIGN FRAMEWORK FOR EFFICIENT NEURAL NETWORK INFERENCE
4.1.1. Introduction
4.1.2. From Model to Instructions
4.2. ISA-BASED CNN ACCELERATOR: ANGEL-EYE
4.2.1. Hardware Architecture
4.2.2. Compiler
4.2.3. Runtime Workflow
4.2.4. Extension Support of Upsampling Layers
4.2.5. Evaluation
4.2.6. Practice on DAC-SDC Low-Power Object Detection Challenge
4.3. NEURAL NETWORK MODEL OPTIMIZATION
4.3.1. Pruning and Quantization
4.3.1.1. Network Pruning
4.3.1.2. Network Quantization
4.3.1.3. Evaluation and Practices
4.3.2. Pruning with Hardware Cost Model
4.3.2.1. Iterative Search-based Pruning Methods
4.3.2.2. Local Programming-based Pruning and the Practice in LPCVC’19
4.3.3. Architecture Search Framework
4.3.3.1. Framework Design
4.3.3.2. Case Study Using the aw_nas Framework: Black-box Search Space Tuning for Hardware-aware NAS
4.4. SUMMARY
CHAPTER 5: Progressive Automatic Design of Search Space for One-Shot Neural Architecture Search
5.1. ABSTRACT
5.2. INTRODUCTION
5.3. RELATED WORK
5.4. METHOD
5.4.1. Problem Formulation and Motivation
5.4.2. Progressive Automatic Design of Search Space
5.5. EXPERIMENTS
5.5.1. Dataset and Implementation Details
5.5.2. Comparison with State-of-the-art Methods
5.5.3. Automatically Designed Search Space
5.5.4. Ablation Studies
5.6. CONCLUSION
CHAPTER 6: Fast Adjustable Threshold for Uniform Neural Network Quantization
6.1. INTRODUCTION
6.2. RELATED WORK
6.2.1. Quantization with Knowledge Distillation
6.2.2. Quantization without Fine-tuning
6.2.3. Quantization with Training/Fine-tuning
6.3. METHOD DESCRIPTION
6.3.1. Quantization with Threshold Fine-tuning
6.3.1.1. Differentiable Quantization Threshold
6.3.1.2. Batch Normalization Folding
6.3.1.3. Threshold Scale
6.3.1.4. Training of Asymmetric Thresholds
6.3.1.5. Vector Quantization
6.3.2. Training on the Unlabeled Data
6.3.3. Quantization of Depth-wise Separable Convolution
6.3.3.1. Scaling the Weights for MobileNet-V2 (with ReLU6)
6.4. EXPERIMENTS AND RESULTS
6.4.1. Experiments Description
6.4.1.1. Researched Architectures
6.4.1.2. Training Procedure
6.4.2. Results
6.5. CONCLUSION
CHAPTER 7: Power-efficient Neural Network Scheduling
7.1. INTRODUCTION TO NEURAL NETWORK SCHEDULING ON HETEROGENEOUS SoCs
7.1.1. Heterogeneous SoC
7.1.2. Network Scheduling
7.2. COARSE-GRAINED SCHEDULING FOR NEURAL NETWORK TASKS: A CASE STUDY OF CHAMPION SOLUTION IN LPIRC2016
7.2.1. Introduction to the LPIRC2016 Mission and the Solutions
7.2.2. Static Scheduling for the Image Recognition Task
7.2.3. Manual Load Balancing for Pipelined Fast R-CNN
7.2.4. The Result of Static Scheduling
7.3. FINE-GRAINED NEURAL NETWORK SCHEDULING ON POWER-EFFICIENT PROCESSORS
7.3.1. Network Scheduling on SUs: Compiler-Level Techniques
7.3.2. Memory-Efficient Network Scheduling
7.3.3. Formulating the Layer-Fusion Problem with Computational Graphs
7.3.4. Cost Estimation of Fused Layer-Groups
7.3.5. Hardware-Aware Network Fusion Algorithm (HaNF)
7.3.6. Implementation of the Network Fusion Algorithm
7.3.7. Evaluation of Memory Overhead
7.3.8. Performance on Different Processors
7.4. SCHEDULER-FRIENDLY NETWORK QUANTIZATIONS
7.4.1. The Problem of Layer Pipelining between CPU and Integer SUs
7.4.2. Introduction to Neural Network Quantization for Integer Neural Accelerators
7.4.3. Related Work of Neural Network Quantization
7.4.4. Linear Symmetric Quantization for Low-Precision Integer Hardware
7.4.5. Making Full Use of the Pre-Trained Parameters
7.4.6. Low-Precision Representation and Quantization Algorithm
7.4.7. BN Layer Fusion of Quantized Networks
7.4.8. Bias and Scaling Factor Quantization for Low-Precision Integer Operation
7.4.9. Evaluation Results
7.5. SUMMARY
CHAPTER 8: Efficient Neural Network Architectures
8.1. STANDARD CONVOLUTION LAYER
8.2. EFFICIENT CONVOLUTION LAYERS
8.3. MANUALLY DESIGNED EFFICIENT CNN MODELS
8.4. NEURAL ARCHITECTURE SEARCH
8.5. HARDWARE-AWARE NEURAL ARCHITECTURE SEARCH
8.5.1. Latency Prediction
8.5.2. Specialized Models for Different Hardware
8.5.3. Handling Many Platforms and Constraints
8.6. CONCLUSION
CHAPTER 9: Design Methodology for Low-Power Image Recognition Systems
9.1. DESIGN METHODOLOGY USED IN LPIRC 2017
9.1.1. Object Detection Networks
9.1.2. Throughput Maximization by Pipelining
9.1.3. Software Optimization Techniques
9.1.3.1. Tucker Decomposition
9.1.3.2. CPU Parallelization
9.1.3.3. 16-bit Quantization
9.1.3.4. Post Processing
9.2. IMAGE RECOGNITION NETWORK EXPLORATION
9.2.1. Single Stage Detectors
9.2.2. Software Optimization Techniques
9.2.3. Post Processing
9.2.4. Network Exploration
9.2.5. LPIRC 2018 Solution
9.3. NETWORK PIPELINING FOR HETEROGENEOUS PROCESSOR SYSTEMS
9.3.1. Network Pipelining Problem
9.3.2. Network Pipelining Heuristic
9.3.3. Software Framework for Network Pipelining
9.3.4. Experimental Results
9.4. CONCLUSION AND FUTURE WORK
CHAPTER 10: Guided Design for Efficient On-device Object Detection Model
10.1. INTRODUCTION
10.1.1. LPIRC Track 1 in 2018 and 2019
10.1.2. Three Awards for the Amazon Team
10.2. BACKGROUND
10.3. AWARD-WINNING METHODS
10.3.1. Quantization Friendly Model
10.3.2. Network Architecture Optimization
10.3.3. Training Hyper-parameters
10.3.4. Optimal Model Architecture
10.3.5. Neural Architecture Search
10.3.6. Dataset Filtering
10.3.7. Non-maximum Suppression Threshold
10.3.8. Combination
10.4. CONCLUSION
SECTION III: Invited Articles
CHAPTER 11: Quantizing Neural Networks
11.1. INTRODUCTION
11.2. QUANTIZATION FUNDAMENTALS
11.2.1. Hardware Background
11.2.2. Uniform Affine Quantization
11.2.2.1. Symmetric Uniform Quantization
11.2.2.2. Power-of-two Quantizer
11.2.2.3. Quantization Granularity
11.2.3. Quantization Simulation
11.2.3.1. Batch Normalization Folding
11.2.3.2. Activation Function Fusing
11.2.3.3. Other Layers and Quantization
11.2.4. Practical Considerations
11.2.4.1. Symmetric vs. Asymmetric Quantization
11.2.4.2. Per-tensor and Per-channel Quantization
11.3. POST-TRAINING QUANTIZATION
11.3.1. Quantization Range Setting
11.3.2. Cross-Layer Equalization
11.3.3. Bias Correction
11.3.4. AdaRound
11.3.5. Standard PTQ Pipeline
11.3.6. Experiments
11.4. QUANTIZATION-AWARE TRAINING
11.4.1. Simulating Quantization for Backward Path
11.4.2. Batch Normalization Folding and QAT
11.4.3. Initialization for QAT
11.4.4. Standard QAT Pipeline
11.4.5. Experiments
11.5. SUMMARY AND CONCLUSIONS
CHAPTER 12: Building Efficient Mobile Architectures
12.1. INTRODUCTION
12.2. ARCHITECTURE PARAMETERIZATIONS
12.2.1. Network Width Multiplier
12.2.2. Input Resolution Multiplier
12.2.3. Data and Internal Resolution
12.2.4. Network Depth Multiplier
12.2.5. Adjusting Multipliers for Multi-criteria Optimizations
12.3. OPTIMIZING EARLY LAYERS
12.4. OPTIMIZING THE FINAL LAYERS
12.4.1. Adjusting the Resolution of the Final Spatial Layer
12.4.2. Reducing the Size of the Embedding Layer
12.5. ADJUSTING NON-LINEARITIES: H-SWISH AND H-SIGMOID
12.6. PUTTING IT ALL TOGETHER
CHAPTER 13: A Survey of Quantization Methods for Efficient Neural Network Inference
13.1. INTRODUCTION
13.2. GENERAL HISTORY OF QUANTIZATION
13.3. BASIC CONCEPTS OF QUANTIZATION
13.3.1. Problem Setup and Notations
13.3.2. Uniform Quantization
13.3.3. Symmetric and Asymmetric Quantization
13.3.4. Range Calibration Algorithms: Static vs. Dynamic Quantization
13.3.5. Quantization Granularity
13.3.6. Non-Uniform Quantization
13.3.7. Fine-tuning Methods
13.3.7.1. Quantization-Aware Training
13.3.7.2. Post-Training Quantization
13.3.7.3. Zero-shot Quantization
13.3.8. Stochastic Quantization
13.4. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS
13.4.1. Simulated and Integer-only Quantization
13.4.2. Mixed-Precision Quantization
13.4.3. Hardware Aware Quantization
13.4.4. Distillation-Assisted Quantization
13.4.5. Extreme Quantization
13.4.6. Vector Quantization
13.5. QUANTIZATION AND HARDWARE PROCESSORS
13.6. FUTURE DIRECTIONS FOR RESEARCH IN QUANTIZATION
13.7. SUMMARY AND CONCLUSIONS
Bibliography
Index