Deep Learning has achieved impressive results in image classification, computer vision, and natural language processing. To achieve better performance, deeper and wider networks have been designed, increasing the demand for computational resources. The number of floating-point operations (FLOPs) has grown dramatically with these larger networks, which has become an obstacle to deploying convolutional neural networks (CNNs) on mobile and embedded devices. In this context, Binary Neural Networks: Algorithms, Architectures, and Applications will focus on CNN compression and acceleration, which are important topics for the research community. We will describe numerous methods, including parameter quantization, network pruning, low-rank decomposition, and knowledge distillation.
More recently, to reduce the burden of handcrafted architecture design, neural architecture search (NAS) has been used to build neural networks automatically by searching over a vast architecture space. Our book will also introduce NAS and its state-of-the-art performance in various applications, such as image classification and object detection. We also describe extensive applications of compressed deep models in image classification, speech recognition, object detection, and tracking. These topics can help researchers better understand the usefulness and potential of network compression in practical applications. Interested readers should have a basic knowledge of Machine Learning and Deep Learning to better understand the methods described in this book.
Deep Learning has become increasingly important because of its superior performance. Still, it suffers from a large memory footprint and high computational cost, making it difficult to deploy on front-end devices. For example, in unmanned systems, UAVs serve as computing terminals with limited memory and computing resources, making it difficult to perform real-time data processing based on convolutional neural networks (CNNs). To improve storage and computation efficiency, BNNs have shown promise for practical applications. BNNs are neural networks in which the weights are binarized. 1-bit CNNs are a highly compressed version of BNNs that binarize both the weights and the activations to decrease the model size and computational cost. This high degree of compression makes them suitable for front-end computing. In addition to these two, other network compression techniques, such as pruning and sparse neural networks, are widely used in edge computing.
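To make the binarization idea concrete, here is a minimal NumPy sketch (our illustration, not code from the book) of the scaled weight binarization W ≈ αB popularized by XNOR-Net, where B = sign(W) and α = mean(|W|) is the L2-optimal per-tensor scale; the function name and toy data are assumptions made for this example.

```python
import numpy as np

def binarize(w):
    """Approximate w by alpha * b, with b = sign(w) in {-1, +1}."""
    b = np.where(w >= 0, 1.0, -1.0)  # sign(w), mapping 0 to +1
    alpha = np.abs(w).mean()         # closed-form L2-optimal scale
    return alpha, b

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))  # toy full-precision weight matrix
x = rng.normal(size=4)       # toy input activations

alpha, b = binarize(w)
print("full precision:", w @ x)
print("1-bit weights: ", alpha * (b @ x))  # binarized approximation
```

When the activations are binarized as well (the 1-bit CNN case), the inner products above reduce to XNOR and popcount operations on bit-packed vectors, which is the source of the memory and speed gains on front-end hardware.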
Key Features:
• Reviews recent advances in CNN compression and acceleration
• Elaborates on recent advances in binary neural network (BNN) technologies
• Introduces applications of BNNs in image classification, speech recognition, object detection, and more
Author(s): Baochang Zhang, Sheng Xu, Mingbao Lin
Publisher: CRC Press
Year: 2023
Language: English
Pages: 218
Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Contents
About the Authors
1. Introduction
1.1. Principal Methods
1.1.1. Early Binary Neural Networks
1.1.2. Gradient Approximation
1.1.3. Quantization
1.1.4. Structural Design
1.1.5. Loss Design
1.1.6. Neural Architecture Search
1.1.7. Optimization
1.2. Applications
1.2.1. Image Classification
1.2.2. Speech Recognition
1.2.3. Object Detection and Tracking
1.2.4. Applications
1.3. Our Works on BNNs
2. Quantization of Neural Networks
2.1. Overview of Quantization
2.1.1. Uniform and Non-Uniform Quantization
2.1.2. Symmetric and Asymmetric Quantization
2.2. LSQ: Learned Step Size Quantization
2.2.1. Notations
2.2.2. Step Size Gradient
2.2.3. Step Size Gradient Scale
2.2.4. Training
2.3. Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer
2.3.1. Baseline of Fully Quantized ViT
2.3.2. Performance Degeneration of Fully Quantized ViT Baseline
2.3.3. Information Rectification in Q-Attention
2.3.4. Distribution Guided Distillation Through Attention
2.3.5. Ablation Study
2.4. Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
2.4.1. Quantized DETR Baseline
2.4.2. Challenge Analysis
2.4.3. Information Bottleneck of Q-DETR
2.4.4. Distribution Rectification Distillation
2.4.5. Ablation Study
3. Algorithms for Binary Neural Networks
3.1. Overview
3.2. BNN: Binary Neural Network
3.3. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
3.4. MCN: Modulated Convolutional Network
3.4.1. Forward Propagation with Modulation
3.4.2. Loss Function of MCNs
3.4.3. Back-Propagation Updating
3.4.4. Parameter Evaluation
3.4.5. Model Effect
3.5. PCNN: Projection Convolutional Neural Networks
3.5.1. Projection
3.5.2. Optimization
3.5.3. Theoretical Analysis
3.5.4. Projection Convolutional Neural Networks
3.5.5. Forward Propagation Based on Projection Convolution Layer
3.5.6. Backward Propagation
3.5.7. Progressive Optimization
3.5.8. Ablation Study
3.6. RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning
3.6.1. Loss Function
3.6.2. Learning RBCNs
3.6.3. Network Pruning
3.6.4. Ablation Study
3.7. BONN: Bayesian Optimized Binary Neural Network
3.7.1. Bayesian Formulation for Compact 1-Bit CNNs
3.7.2. Bayesian Learning Losses
3.7.3. Bayesian Pruning
3.7.4. BONNs
3.7.5. Forward Propagation
3.7.6. Asynchronous Backward Propagation
3.7.7. Ablation Study
3.8. RBONN: Recurrent Bilinear Optimization for a Binary Neural Network
3.8.1. Bilinear Model of BNNs
3.8.2. Recurrent Bilinear Optimization
3.8.3. Discussion
3.8.4. Ablation Study
3.9. ReBNN: Resilient Binary Neural Network
3.9.1. Problem Formulation
3.9.2. Method
3.9.3. Ablation Study
4. Binary Neural Architecture Search
4.1. Background
4.2. ABanditNAS: Anti-Bandit for Neural Architecture Search
4.2.1. Anti-Bandit Algorithm
4.2.2. Search Space
4.2.3. Anti-Bandit Strategy for NAS
4.2.4. Adversarial Optimization
4.2.5. Analysis
4.3. CP-NAS: Child-Parent Neural Architecture Search for 1-Bit CNNs
4.3.1. Child-Parent Model for Network Binarization
4.3.2. Search Space
4.3.3. Search Strategy for CP-NAS
4.3.4. Optimization of the 1-Bit CNNs
4.3.5. Ablation Study
4.4. DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs
4.4.1. Preliminary
4.4.2. Redefine Child-Parent Framework for Network Binarization
4.4.3. Search Space
4.4.4. Tangent Propagation for DCP-NAS
4.4.5. Generalized Gauss-Newton Matrix (GGN) for Hessian Matrix
4.4.6. Decoupled Optimization for Training the DCP-NAS
4.4.7. Ablation Study
5. Applications in Natural Language Processing
5.1. Background
5.1.1. Quantization-Aware Training (QAT) for Low-Bit Large Language Models
5.1.2. Post-Training Quantization (PTQ) for Low-Bit Large Language Models
5.1.3. Binary BERT Pre-Trained Models
5.2. Fully Quantized Transformer for Machine Translation
5.2.1. Quantization Scheme
5.2.2. What to Quantize
5.2.3. Tensor Bucketing
5.2.4. Dealing with Zeros
5.3. Q-BERT: Hessian-Based Ultra Low-Precision Quantization of BERT
5.3.1. Hessian-Based Mixed-Precision
5.3.2. Group-Wise Quantization
5.4. I-BERT: Integer-Only BERT Quantization
5.4.1. Integer-Only Computation of GELU and Softmax
5.4.2. Integer-Only Computation of LayerNorm
5.5. Toward Efficient Post-Training Quantization of Pre-Trained Language Models
5.5.1. Module-Wise Reconstruction Error Minimization
5.5.2. Model Parallel Strategy
5.5.3. Annealed Teacher Forcing
5.6. Outlier Suppression: Pushing the Limit of Low-Bit Transformer Language Models
5.6.1. Analysis
5.6.2. Gamma Migration
5.6.3. Token-Wise Clipping
5.7. BinaryBERT: Pushing the Limit of BERT Quantization
5.7.1. Ternary Weight Splitting
5.7.2. Knowledge Distillation
5.8. BEBERT: Efficient and Robust Binary Ensemble BERT
5.9. BiBERT: Accurate Fully Binarized BERT
5.9.1. Bi-Attention
5.9.2. Direction-Matching Distillation
5.10. BiT: Robustly Binarized Multi-Distilled Transformer
5.10.1. Two-Set Binarization Scheme
5.10.2. Elastic Binarization Function
5.10.3. Multi-Distilled Binary BERT
5.11. Post-Training Embedding Binarization for Fast Online Top-K Passage Matching
5.11.1. Semantic Diffusion
5.11.2. Gradient Estimation
6. Applications in Computer Vision
6.1. Introduction
6.1.1. Person Re-Identification
6.1.2. 3D Point Cloud Processing
6.1.3. Object Detection
6.1.4. Speech Recognition
6.2. BiRe-ID: Binary Neural Network for Efficient Person Re-ID
6.2.1. Problem Formulation
6.2.2. Kernel Refining Generative Adversarial Learning (KR-GAL)
6.2.3. Feature Refining Generative Adversarial Learning (FR-GAL)
6.2.4. Optimization
6.2.5. Ablation Study
6.3. POEM: 1-Bit Point-Wise Operations Based on E-M for Point Cloud Processing
6.3.1. Problem Formulation
6.3.2. Binarization Framework of POEM
6.3.3. Supervision for POEM
6.3.4. Optimization for POEM
6.3.5. Ablation Study
6.4. LWS-Det: Layer-Wise Search for 1-Bit Detectors
6.4.1. Preliminaries
6.4.2. Formulation of LWS-Det
6.4.3. Differentiable Binarization Search for the 1-Bit Weight
6.4.4. Learning the Scale Factor
6.4.5. Ablation Study
6.5. IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors
6.5.1. Preliminaries
6.5.2. Select Proposals with Information Discrepancy
6.5.3. Entropy Distillation Loss
6.5.4. Ablation Study
Bibliography
Index