Ascend AI Processor Architecture and Programming: Principles and Applications of CANN

Ascend AI Processor Architecture and Programming: Principles and Applications of CANN offers an in-depth look at AI applications built on Huawei's Ascend processor, presenting and analyzing its distinctive performance and attributes. The book introduces the fundamental theory of AI, the software and hardware architecture of the Ascend AI processor, the related tools and programming technology, and typical application cases. It explains the processor's internal software and hardware design principles, system tools, and programming techniques, laying out the elements of AI programming technology that researchers need when developing AI applications. Chapters cover the theoretical fundamentals of AI and deep learning; the state of the industry, including the current status of neural network processors, deep learning frameworks, and a deep learning compilation framework; the hardware architecture of the Ascend AI processor; programming methods and practices for the processor; and, finally, detailed case studies on data and algorithms for AI.

Author(s): Xiaoyao Liang
Publisher: Elsevier Science
Year: 2020

Language: English
Tags: ascend, huawei

Front matter
Copyright
About the Author
Preface
Theoretical basis
Brief history of artificial intelligence
Birth of AI
Setting sail
Encountering bottlenecks
Moving on again
A new dawn
Introduction to deep learning
History
Applications
Future challenges
Neural network theory
Neuron model
Perceptron
Multilayer perceptron
Working principle
Computational complexity
Convolutional neural network
Introduction to network architecture
Convolutional layer
Properties of the convolutional layer
Implementation of the convolutional layer
Pooling layer
Fully connected layer
Optimization and acceleration
Synaptic parallelism
Neuron parallelism
Input feature map parallelism
Output feature map parallelism
Batch parallelism
Application example
Initial setup
Training
Test
References
Industry background
Current status of neural network processors
CPU
GPU
TPU
FPGA
Ascend AI processor
Neural network processor acceleration theory
GPU acceleration theory
Principles of neural network computing on GPUs
Modern GPU architecture
TPU acceleration theory
Systolic arrays for neural network computing
Google TPU architecture
Deep learning frameworks
MindSpore
Caffe
TensorFlow
PyTorch
Deep learning compilation framework: TVM
Hardware architecture
Hardware architecture overview
DaVinci architecture
Computing unit
Cube unit
Vector unit
Scalar unit
Memory system
Memory unit
Data flow
Control units
Instruction set design
Scalar instruction set
Vector instruction set
Matrix instruction set
Convolution acceleration principle
Convolution acceleration
Architecture comparison
Software architecture
Ascend AI software stack overview
L3 application enabling layer
L2 execution framework layer
L1 processor enabling layer
L0 computing resource layer
Tool chain
Neural network software flow
Process orchestrator (Matrix)
Functions
Application scenarios
Acceleration card form
Developer board form
Digital vision pre-processing (DVPP)
Functional architecture
Preprocessing mechanisms
Tensor boost engine (TBE)
Functional framework
Application scenarios
Runtime
Task scheduler
Functions
Scheduling process
Framework manager
Functional framework
Offline model generation
Parsing
Quantization
Compilation
Serialization
Loading offline model
Offline model inference
Application of neural network software flow
Development tool chain
Introduction to functions
Functional framework
Tool functions
Programming methods
Basics of deep learning development
Deep learning programming theory
Declarative programming and imperative programming
Caffe
TensorFlow
Metaprogramming
Domain-specific language
TensorFlow and domain-specific language
Turing completeness
Branch and jump in TensorFlow
Collaboration between TensorFlow and Python
Imperative programming and dynamic graph mechanism of PyTorch
Difference between training and inference computing graphs
Model saving and loading
Model conversion
Deep learning inference engine and computing graph
Deep learning inference optimization principle
Inference process in the traditional deep learning framework
Computing graph optimization
Optimize the memory allocation mechanism
Specify appropriate batch size
Reuse memory through fine-grained memory management
Eliminate the control logic in the model as much as possible
Use parallel computing in the model properly
A magic optimization tool: kernel fusion
Kernel function optimization
Loop unrolling and matrix partitioning
Optimization of the order of data storage
Dedicated kernel
Deep learning inference engine
TVM's first-generation computing graph intermediate representation: NNVM
TVM's second-generation computing graph intermediate representation: Relay
Graph optimization in TVM
Low-level intermediate representation of TVM
Basis of TVM's low-level intermediate representation: Halide
Features of TVM's low-level intermediate representation
TVM operator implementation
TVM operator scheduling
Optimization methods for TVM's low-level intermediate representation
TOPI mechanism
Code generation in TVM
Techniques of the Ascend AI software stack
Model generation phase
Model parsing
Intermediate representation in the computational graph
Memory allocation
Memory layout
Kernel fusion
Operator support
Offline model file
Application compilation and deployment phase
Computing engine
Application development based on multiple devices and multiple processors
Operator scheduling
Heterogeneous computing system
Operator implementation: common kernels and customized kernels
Customized operator development
Development procedure
Development motivation
Development process
Customized operator development
Development using the TVM primitives
Domain-specific language development
Customized plug-in development
Load the plug-in for model conversion
AI CPU operator development
Features of the reduction operator
Parameter parsing
Input and output parameters
Create a customized operator project
Operator logic development
operator.cce
op_attr.h
Reduction.h
Reduction.cce
Operator plug-in development
Operator registration
ReductionParseParams: parses operator parameters
InferShapeAndTypeFn: infers output shapes and types
UpdateOpDescFn: updates the operator description
AI Core operator development
Reduction operator features
Create an operator project
Operator logic development
Import the Python module
Definition of the operator function
Operator logic implementation
Operator scheduling and compilation
Operator plug-in development
Operator registration
TEBinBuildFn: compiles operators
Customized application development
Development motivations
Development process
Serial configuration of the computing engine
Computing engine development
Connecting the computing engines in series
Case studies
Evaluation criteria
Accuracy
IoU
Mean average precision
Throughput and latency
Energy efficiency ratio
Image classification
Dataset: ImageNet
Algorithm: ResNet18
Model migration practice
Model preparation
Image processing
Project compilation
Analysis of results
Quantization methodology
Object detection
Dataset: COCO
Algorithm: YOLOv3
Customized operator practice
Development of the plug-in
Implementation of the operator
Matrix operations based on the Cube unit
Definition of convolution operator parameters and data arrangement
Example of the convolution operator
Format conversion of the input feature map
Format conversion of the weight data
Format conversion of the output feature map
Performance analysis
Performance improvement tips
Index