This book introduces the basic principles of deep learning and how to implement them in an accessible way. Rather than relying on existing deep learning libraries, it uses Python's NumPy library to build its own deep learning library from scratch. After covering the necessary background in Python programming, calculus, and probability and statistics, it introduces the core topics of deep learning in the order in which the field developed: regression models, neural networks, convolutional neural networks, recurrent neural networks, and generative networks. Each principle is explained in plain terms and accompanied by a detailed code implementation. The approach is comparable to teaching you how to make weapons and mobile phones yourself rather than how to use them. This book is not a tutorial on using existing deep learning libraries, but an analysis of how to develop one from scratch. Combining first-principles explanation with code implementation helps readers better understand both the basic principles of deep learning and the design ideas behind popular deep learning libraries.
By reading this book, readers can build a deep learning library step by step from scratch, without relying on any deep learning platform. Finally, for comparison, the book introduces the deep learning platform PyTorch, so that readers can easily learn to use it. Having built a library themselves, readers will understand the design ideas of such platforms more deeply and so be better able to master and use them.
This book is suitable not only for beginners with no deep learning knowledge, but also for practitioners who have experience using deep learning libraries and want to understand their underlying implementation. It is especially suitable as a deep learning textbook for colleges and universities.
Author(s): Hongwei Dong
Publisher: Independently published
Year: 2023
Language: English
Pages: 664
Chapter 1 Programming and Math Fundamentals
1.1 Python quick start
1.1.1 Python installation
Python interpreter installation
Jupyter Notebook programming environment
Anaconda installation tool
1.1.2 Object, print() function, type conversion, comment, variable, input() function
1. Objects
2. Print function print()
3. Type conversion
4. Comments
5. Variables
6. input() function
1.1.3 Operation
Subscript operator []
String formatting
1.1.4 Control Statements
1. if statement
2. while statement
3. for statement
1.1.5 Python commonly used container types
1. list
index
slice
Traversing all elements with for
2. tuple
3. set (collection)
4. dict (dictionary)
1.1.6 Functions
The math package
Global and local variables
Anonymous (lambda) functions
Nested functions, closures
yield and generators
1.1.7 Classes and Objects
1.1.8 Getting Started with Matplotlib
subplot()
Axes objects
mplot3d
display image
1.2 The tensor library NumPy
1.2.1 What is a tensor?
1 vector
The norm of the vector
2 Matrix
3 dimensional tensor
1.2.2 Create ndarray object
1. array()
2. Multidimensional array type ndarray
3. asarray()
4. The tolist() method of ndarray
5. astype() and reshape()
6. arange() and linspace()
7. full(), empty(), zeros(), ones(), eye()
8. Common functions for creating tensors of random values
9. Add, repeat and tile, merge and split, edge padding, add axis and swap axes
Repeat: repeat()
Tile: tile()
Merge: concatenate()
Stack: stack()
column_stack(), hstack(), vstack()
Split: split()
Edge padding
Add axis
Swap axes
1.2.3 Indexing and slicing of ndarray arrays
1.2.4 Tensor calculation
1. Element-by-element calculation
Hadamard Product
2. Cumulative calculation
3. Dot Product
4. Broadcasting
1.3 Calculus
1.3.1 Functions
1.3.2 Four arithmetic and compound operations
Arithmetic
Composite
1.3.3 Limits, derivatives
1. The limit of the sequence
2. Limit and continuity of function
3. Derivatives of functions
1.3.4 The Four Arithmetic Operations of Derivatives and the Chain Rule
1.3.5 Calculation graph, forward calculation, backpropagation derivation
1.3.6 Partial derivatives and gradients of multivariable functions
1.3.7 Derivative of vector-valued function and Jacobian matrix
1.3.8 Integral
1.4 Probability Basics
1.4.1 Probability
1.4.2 Conditional probability, joint probability, total probability formula, Bayesian formula
1.4.3 Random variables
1.4.4 Probability distribution sequence of discrete random variables
1.4.5 Probability Density of Continuous Random Variables
1.4.6 Distribution functions of random variables
1.4.7 Expectation, variance, covariance, covariance matrix
1. Mean and Expectation
2. Variance, standard deviation
3. Covariance, covariance matrix
Chapter 2 Gradient descent method
2.1 Necessary conditions for function extremum
2.2 Gradient descent method
2.3 Parameter optimization strategy of gradient descent method
2.3.1 Momentum method
2.3.2 Adagrad method
2.3.3 Adadelta method
2.3.4 RMSprop method
2.3.5 Adam method
2.4 Gradient verification
2.4.1 Comparing numerical and analytical gradients
2.4.2 Generic numerical gradients
2.5 Separating the gradient descent algorithm from the parameter optimization strategy
2.5.1 Parameter optimizer
2.5.2 Gradient descent method that accepts a parameter optimizer
Chapter 3 Linear Regression, Logistic Regression and Softmax Regression
3.1 Linear regression
3.1.1 Dining car profit problem
3.1.2 Machine Learning and Artificial Intelligence
1. Machine Learning
2. The relationship between machine learning and artificial intelligence
3. Classification of machine learning
3.1.3 What is linear regression?
3.1.4 Normal equations to solve linear regression problems
3.1.5 Gradient descent method to solve linear regression problems
3.1.6 Tuning the learning rate
3.1.7 Gradient verification
3.1.8 Prediction
3.1.9 Linear regression with multiple features
1. Multi-feature linear regression
2. Fitting plane
3. Temperature and pressure problems
3.1.10 Normalization of data
3.2 Evaluation of the model
3.2.1 Underfitting and overfitting
3.2.2 Validation set, test set
3.2.3 Learning Curve
3.2.4 Forecasting the output of the dam
3.2.5 Bias and variance
3.3 Regularization
Loss function with the regularization term added
3.5 Logistic regression
3.5.1 Logistic regression
3.5.2 numpy implementation of logistic regression
1. Generate data
2. Code implementation of gradient descent method
3. Calculate the loss function value
4. Decision curve
5. Prediction accuracy
6. Logistic Regression with Scikit-Learn Library
3.5.3 Hands-on practice: NumPy implementation of iris classification
3.6 softmax regression
3.6.1 spiral data set
3.6.2 softmax function
3.6.3 softmax regression
Multi-sample form
3.6.4 Multi-classification cross-entropy loss
3.6.5 Calculate cross entropy loss by weighted sum
3.6.6 Gradient calculation of softmax regression
1. The gradient of the cross-entropy loss on the weighted sum
2. The gradient of the cross-entropy loss with respect to the weight parameter
3.6.7 Implementation of gradient descent method for softmax regression
3.6.8 Softmax regression of spiral data set
3.7 Batch Gradient Descent and Stochastic Gradient Descent
3.7.1 MNIST handwritten digit set
3.7.2 Training logistic regression with partial training samples
3.7.3 Batch Gradient Descent Method and Implementation
Softmax regression of Fashion MNIST training set
3.7.4 Stochastic Gradient Descent
Summary
Chapter 4 Neural Networks
4.1 Neural Network
4.1.1 Perceptrons and neurons
1. Perceptron
2. Neurons
4.1.2 Activation function
1. Step function sign(x)
2. Tanh function
4. ReLU function
4.1.3 Neural Networks and Deep Learning
4.1.4 Forward calculation of multiple samples
4.1.5 Output
4.1.6 Loss function
1. Mean square error loss
2. Binary classification cross entropy loss
3. Multi-classification cross-entropy loss
4.1.7 Neural Network Training Based on Numerical Gradients
4.1.8 Deep Learning
4.2 Reverse derivation
4.2.1 Forward calculation and reverse derivation
4.2.2 Computation graph
4.2.3 The gradient of the loss function with respect to the output
1. The gradient of the binary cross-entropy loss function on the output
2. The gradient of the mean square error loss function on the output
3. The gradient of the multi-class cross entropy loss function on the output
4.2.4 Derivation of back propagation of 2-layer neural network
1. Reverse derivation of single sample
2. Multi-sample vectorized representation of reverse derivation
3. Gradient calculation formula in column vector form
4.2.5 Python implementation of 2-layer neural network
4.2.6 Derivation of backpropagation of any layer neural network
4.3 Implement a simple deep learning framework
4.3.1 Training process of neural network
4.3.2 Code implementation of the network layer
4.3.3 Gradient test of network layer
4.3.4 Neural Network Class
4.3.5 Gradient test of neural network
4.3.6 MNIST data handwritten digit recognition based on deep learning framework
4.3.7 Improved general neural network framework: separate weighted sum and activation function
Gradient Validation
4.3.8 Independent parameter optimizer
4.3.9 fashion-mnist classification training
4.3.10 Read and write model parameters
Chapter 5 Basic Techniques for Improving Neural Network Performance
5.1 Data processing
5.1.1 Data Augmentation
5.1.2 Normalization
5.1.3 Feature Engineering
1. Data dimensionality reduction and principal component analysis
2. Whitening
5.2 Parameter debugging
5.2.1 Weight initialization
5.2.2 Optimization parameters
5.3 Batch Normalization
5.3.1 What is batch normalization?
5.3.2 Reverse derivation of batch normalization
5.3.3 Code Implementation of Batch Normalization
5.4 Regularization
5.4.1 Weight regularization
5.4.2 Dropout
5.4.3 Early stopping
Chapter 6 Convolutional Neural Network (CNN)
6.1 Convolution
6.1.1 What is convolution?
span
6.1.2 Convolution of one-dimensional signal
6.1.3 Two-dimensional convolution
span
6.1.4 Multiple input channels and multiple output channels
6.1.5 Pooling
6.2 Convolutional Neural Network
6.2.1 Fully connected neurons and convolutional neurons
6.2.2 Convolutional Layer and Convolutional Neural Network
6.2.3 Reverse derivation and code implementation of convolutional layer and pooling layer
Reverse derivation of convolutional layer
The reverse derivation of the pooling layer
6.2.4 Implementation of convolutional neural network
6.3 Convolution matrix multiplication
6.3.1 Matrix multiplication of 1D sample convolution
6.3.2 Matrix multiplication of 2D sample convolution
6.3.3 Matrix multiplication for reverse derivation of 1D convolution
6.3.4 Matrix multiplication for reverse derivation of 2D convolution
6.4 Fast convolution based on coordinate index
Gradient Test
Time comparison with non-accelerated convolution
6.5 Typical convolutional neural network structure
6.5.1 LeNet-5
6.5.2 AlexNet
6.5.3 VGG
6.5.4 Gradient Explosion and Vanishing Problems of Deep Neural Networks
6.5.5 Residual Networks (ResNets)
6.5.6 Google Inception Network
6.5.7 Network in Network (NiN)
Chapter 7 Recurrent Neural Network (RNN)
7.1 Sequence problems and models
7.1.1 Stock Price Prediction Problem
7.1.2 Probabilistic sequence model, language model
1. Probabilistic sequence model
2. Language Model
7.1.3 Autoregressive model
7.1.4 Generate autoregressive data
7.1.5 Time window method
7.1.6 Time window sampling
7.1.7 Time window method modeling and training
7.1.8 Long-term forecast and short-term forecast
7.1.9 Stock Price Prediction
7.1.10 k-gram language model
7.2 Recurrent Neural Networks
7.2.1 Acyclic neural network without memory function
7.2.2 Recurrent neural network with memory function
7.3 Backpropagation through time
7.4 Implementation of single-layer recurrent neural network
7.4.1 Initialize model parameters
7.4.2 Forward calculation
7.4.3 Loss function
7.4.4 Reverse derivation
7.4.5 Gradient verification
7.4.6 Gradient descent training
7.4.7 Sampling of sequence data
7.4.8 RNN training and prediction of sequence data
Training on sequence data
Prediction
Training and prediction of stock data
7.5 RNN language model and text generation
7.5.1 Character table
7.5.2 Sampling of character sequence samples
7.5.3 RNN model training and prediction
Prediction
7.6 Exploding and vanishing gradients in RNNs
7.7 Long Short-Term Memory Network (LSTM)
7.7.1 LSTM neuron: cell
7.7.2 Reverse derivation of LSTM
7.7.3 LSTM code implementation
Gradient Test
Text generation
Prediction
7.7.4 Variations of LSTM
7.8 Gated Recurrent Unit (GRU)
7.8.1 Working principle of GRU
7.8.2 GRU code implementation
7.9 Class Representation and Implementation of Recurrent Neural Network
7.9.1 Implementing Recurrent Neural Networks with Classes
7.9.2 Class implementation of recurrent neural network unit
7.10 Multilayer, Bidirectional Recurrent Neural Network
7.10.1 Multilayer Recurrent Neural Network
7.10.2 Training and prediction of multi-layer recurrent neural network
7.10.3 Bidirectional Recurrent Neural Network
7.11 Sequence to sequence (seq2seq) model
machine translation
7.11.1 Implementation of Seq2Seq model
7.11.2 Seq2Seq for character-level machine translation
1. Character vocabulary
2. Read training samples and build character vocabulary
3. Training character-level Seq2Seq model
7.11.3 Seq2Seq machine translation based on Word2Vec
1. Word vectorization: the skip-gram method of Word2Vec
7.11.4 Seq2Seq model based on word embedding layer
1. Word embedding layer
2. Seq2Seq model using word embedding layer
7.11.5 Attention mechanism
Chapter 8 Generative Models
8.1 Generative models
8.2 Autoencoders
8.2.1 Autoencoder
8.2.2 Sparse Encoder
8.2.3 Implementation of Autoencoder
8.3 Variational Autoencoders
8.3.1 What is a variational autoencoder?
8.3.2 Loss function
8.3.3 Parameter resampling
8.3.4 Reverse Derivation
8.3.5 Implementation of Variational Autoencoder
8.4 Generative Adversarial Networks
8.4.1 Principle of GAN
1. Discriminator and Generator
2. Loss function
3. Training process
8.4.2 Code implementation of GAN training process
8.5 GAN modeling example
8.5.1 GAN modeling of a set of real numbers
1. Real data: a set of real numbers
2. Define discriminator and generator functions
3. Real data iterator, noise data iterator
4. Intermediate result drawing function
5. Training GAN
8.5.2 GAN modeling of two-dimensional coordinate points
1. Real data: coordinate points sampled on the elliptic curve
2. Real data iterator, noise iterator
3. Define the generator and discriminator of the GAN model
4. Training GAN model
8.5.3 GAN modeling of MNIST dataset
1. Read training data
2. Define the data iterator
3. Define the generator and discriminator and its optimizer
4. Training model
8.5.4 GAN training techniques
8.6 GAN loss function and its probability explanation
8.6.1 The global optimal solution of the loss function of GAN
8.6.2 Kullback–Leibler divergence and Jensen–Shannon divergence
8.6.3 Maximum Likelihood Interpretation of GAN
8.7 Improved loss function: Wasserstein GAN (WGAN)
8.7.1 Principle of Wasserstein GAN
8.7.2 WGAN code implementation
8.8 Deep convolutional generative adversarial network (DCGAN)
8.8.1 Transposed convolution of 1D vectors
8.8.2 2D transposed convolution
8.8.3 Implementation of the deep convolutional generative adversarial network (DCGAN)