This book explores new methods, architectures, tools, and algorithms for Artificial Intelligence Hardware Accelerators. The authors have structured the material to simplify the reader's journey toward understanding hardware accelerator design, complex AI algorithms and their computational requirements, and their multifaceted applications. Coverage focuses broadly on the hardware aspects of AI accelerators for training, inference, mobile devices, and autonomous vehicles (AVs).
Author(s): Ashutosh Mishra, Jaekwang Cha, Hyunbin Park, Shiho Kim
Publisher: Springer
Year: 2023
Language: English
Pages: 357
City: Cham
Preface
Contents
Artificial Intelligence Accelerators
1 Introduction
1.1 Introduction to Artificial Intelligence (AI)
1.1.1 AI Applications
1.1.2 AI Algorithms
1.2 Hardware Accelerators
2 Requirements of AI Accelerators
2.1 Hardware Accelerator Designs
2.2 Domain-Specific Accelerators
2.3 Performance Metrics in Accelerators
2.3.1 Instructions Per Second (IPS)
2.3.2 Floating Point Operations Per Second (FLOPS, flops, or flop/s)
2.3.3 Trillion/Tera of Operations Per Second (TOPS)
2.3.4 Throughput Per Cost (Throughput/$)
2.4 Key Metrics and Design Objectives
3 Classifications of AI Accelerators
4 Organization of this Book
5 Popular Design Approaches in AI Acceleration
6 Bottleneck of AI Accelerator and In-Memory Processing
7 A Few State-of-the-Art AI Accelerators
8 Conclusions
References
AI Accelerators for Standalone Computer
1 Introduction to Standalone Compute
2 Hardware Accelerators for Standalone Compute
2.1 Inference and Training of DNNs
2.2 Accelerating DNN Computation
2.3 Considerations in Hardware Design
2.4 Deep Learning Frameworks
3 Hardware Accelerators in GPU
3.1 History and Overview
3.2 GPU Architecture
3.3 GPU Acceleration Techniques
3.4 CUDA-Related Libraries
4 Hardware Accelerators in NPU
4.1 History and Overview: Hardware
4.2 Standalone Accelerating System Characteristics
4.3 Architectures of Hardware Accelerator in NPU
4.4 SOTA Architectures
5 Summary
References
AI Accelerators for Cloud and Server Applications
1 Introduction
2 Background
3 Hardware Accelerators in Clouds
4 Hardware Accelerators in Data Centers
4.1 Design of HW Accelerator for Data Centers
4.1.1 Batch Processing Applications
4.1.2 Streaming Processing Applications
4.2 Design Consideration for HW Accelerators in the Data Center
4.2.1 HW Accelerator Architecture
4.2.2 Programmable HW Accelerators
4.2.3 AI Design Ecosystem
4.2.4 Hardware Accelerator IPs
4.2.5 Energy and Power Efficiency
5 Heterogeneous Parallel Architectures in Data Centers and Cloud
5.1 Heterogeneous Computing Architectures in Data Centers and Cloud
6 Hardware Accelerators for Distributed In-Network and Edge Computing
6.1 HW Accelerator Model for In-Network Computing
6.2 HW Accelerator Model for Edge Computing
7 Infrastructure for Deploying FPGAs
8 Infrastructure for Deploying ASIC
8.1 Tensor Processing Unit (TPU) Accelerators
8.2 Cloud TPU
8.3 Edge TPU
9 SOTA Architectures for Cloud and Edge
9.1 Advances in Cloud and Edge Accelerator
9.1.1 Cloud TPU System Architecture
9.1.2 Cloud TPU VM Architecture
9.2 Staggering Cost of Training SOTA AI Models
10 Security and Privacy Issues
11 Summary
References
Overviewing AI-Dedicated Hardware for On-Device AI in Smartphones
1 Introduction
2 Overview of HW Development to Achieve On-Device AI in a Smartphone
2.1 The Development of SoC
2.2 The Development of SIMD Processor: CPU, GPU, DSP, and NPU
3 Overview of NPU and Review of Commercial NPU
3.1 Overview of NPU
3.2 NPU Architectures of Global AP Vendors
4 AI Acceleration of Non-NPU: CPU Machine Learning Coprocessor, DSP, GPU
5 Techniques for On-Device Inference: Re-architect Network, Quantization, Pruning, and Data Compression
6 Discussion
7 Conclusion
References
Software Overview for On-Device AI and ML Benchmark in Smartphones
1 Introduction
2 Google's Android NNAPI
3 Qualcomm's SNPE SDK
4 ML Benchmarks for On-Device Inference in Smartphone
5 Summary and Discussion
References
Hardware Accelerators in Embedded Systems
1 Introduction
1.1 Introduction to Embedded Systems
1.2 What Are the Components of the Embedded System?
1.3 Types of Embedded Systems
2 Hardware Accelerating in Embedded Systems
2.1 Current Issues in Embedded Systems
2.2 Commercial Options for Hardware Accelerators
3 Recent Trends of Hardware Accelerators in Embedded Systems
3.1 General Purpose Graphics Processing Unit
3.2 NPU with CPUs and GPUs
4 Conclusion
References
Application-Specific and Reconfigurable AI Accelerator
1 Introduction
2 FPGA Platform for AI Acceleration
2.1 FPGA Architecture
2.1.1 LUTs
2.1.2 Flip-Flops
2.1.3 DSP Blocks
2.1.4 Embedded Block Memory
2.1.5 Overall Architecture
2.1.6 AI-Optimized FPGA
2.2 FPGA Tools and Design Flow
2.3 AI Accelerators on FPGA Platform
3 ASIC Platform for AI Acceleration
3.1 ASIC Tools and Design Flow
3.2 Comparison Between FPGA and ASIC Design
3.3 AI Accelerators on ASIC Platform
4 Architecture and Datapath of AI Accelerator
4.1 AI Accelerator Architecture and Datapath
4.2 Optimizations for AI Accelerators
4.2.1 Computation Module
4.2.2 Memory Module
4.2.3 Control Module
4.3 Comparison Between FPGA- and ASIC-Based AI Accelerators
5 Software and Hardware Co-design for AI Accelerator
5.1 AI Model Compression
5.2 Reduced-Precision Computation
6 Hardware-Aware Neural Architecture Search
6.1 Neural Architecture Search Overview
6.2 Hardware-Aware Neural Architecture Search
6.2.1 Search Space
6.2.2 Evaluation Metrics
7 Summary
References
Neuromorphic Hardware Accelerators
1 Introduction to Neuromorphic Computing
1.1 Neural-Inspired Computing
1.2 von Neumann Architecture
1.3 Need for Neuromorphic Computing
2 Building Blocks of Neuromorphic Computing
2.1 Spiking Neural Networks
2.2 Memristors
3 Neuromorphic Hardware Accelerators
3.1 Overview
3.2 Darwin Neural Processing Unit
3.2.1 Darwin NPU Architecture
3.2.2 Performance Evaluation
3.3 Neurogrid System
3.3.1 Neurogrid Architecture
3.3.2 Performance Evaluation
3.4 SpiNNaker System
3.4.1 SpiNNaker Architecture
3.4.2 Performance Evaluation
3.5 IBM TrueNorth
3.5.1 TrueNorth Architecture
3.5.2 Performance Evaluation
3.6 BrainScaleS-2 System
3.7 Intel Loihi
3.8 Applications of Neuromorphic Hardware Accelerators
4 Analog Computing in NHAs
4.1 Overview
4.2 Magnetoresistive Random Access Memory
4.3 MRAM-Based In-Memory Computing
5 Conclusions
References
Hardware Accelerators for Autonomous Vehicles
1 Introduction
1.1 Overview
1.2 Sensors
1.3 Technologies
1.4 Electrical/Electronic (E/E) Architecture
2 Prerequisites for AI Accelerators in AVs
2.1 Overview
2.2 Requirements of AI Accelerators in AVs
2.3 Standards for AI Accelerators in AVs
2.3.1 IEC 61508
2.3.2 ISO 26262
2.3.3 ISO 21448
2.3.4 Standards Compliance for the Products Liability
2.3.5 UN R155
3 Recent AI Accelerators for AVs
3.1 Overview
3.2 Industrial Trend
3.3 Commercialized Products
4 Conclusion
References
CNN Hardware Accelerator Architecture Design for Energy-Efficient AI
1 Introduction
2 CNN Architecture Analysis: An Energy-Efficient Point of View
2.1 Overview of CNN Structure
2.2 Convolution Layer Implementation
2.3 Difference Between Training and Inference
2.4 Categorization of CNN HW Accelerator Architectures
3 Design Consideration for Energy-Efficient CNN Hardware Accelerator Implementation
3.1 Metrics for Deep Learning HW Accelerator Assessment
3.2 Dataflow
3.2.1 Reduce Memory Access Time
3.2.2 Reduce Memory Footprint
3.3 MAC
4 Energy-Efficient CNN Implementation
4.1 Approximation
4.1.1 Pruning
4.1.2 Reduced Precision and Quantization
4.1.3 Convolution Filter Decomposition
4.1.4 Alternative Operations
4.2 Optimization
4.2.1 Data Reuse
4.2.2 Computation Reduction
4.2.3 GEneral Matrix Multiplication-Based Convolution
4.2.4 FFT, Strassen, and Winograd
5 Conclusion
References