The Science of Deep Learning

The Science of Deep Learning emerged from courses taught by the author that have provided thousands of students with training and experience for their academic studies and prepared them for careers in deep learning, machine learning, and artificial intelligence in top companies in industry and academia. The book begins by covering the foundations of deep learning, followed by key deep learning architectures. Subsequent parts on generative models and reinforcement learning may be used as part of a deep learning course or as part of a course on each topic. The book includes state-of-the-art topics such as Transformers, graph neural networks, variational autoencoders, and deep reinforcement learning, with a broad range of applications. The appendices provide equations for computing gradients in backpropagation and optimization, and best practices in scientific writing and reviewing. The text presents an up-to-date guide to the field built upon clear visualizations using a unified notation and equations, lowering the barrier to entry for the reader. The accompanying website provides complementary code and hundreds of exercises with solutions.

Author(s): Iddo Drori
Publisher: Independently Published
Year: 2023

Language: English
Pages: 362

Cover
Half-title
Title page
Copyright information
Contents
Preface
Acknowledgments
Abbreviations and Notation
Part I Foundations
1 Introduction
1.1 Deep Learning
1.2 Outline
1.2.1 Part I: Foundations: Backpropagation, Optimization, and Regularization
1.2.2 Part II: Architectures: CNNs, RNNs, GNNs, and Transformers
1.2.3 Part III: Generative Models: GANs, VAEs, and Normalizing Flows
1.2.4 Part IV: Reinforcement Learning
1.2.5 Part V: Applications
1.2.6 Appendices
1.3 Code
1.4 Exercises
2 Forward and Backpropagation
2.1 Introduction
2.2 Fully Connected Neural Network
2.3 Forward Propagation
2.3.1 Algorithm
2.3.2 Example
2.3.3 Logistic Regression
2.4 Non-linear Activation Functions
2.4.1 Sigmoid
2.4.2 Hyperbolic Tangent
2.4.3 Rectified Linear Unit
2.4.4 Swish
2.4.5 Softmax
2.5 Loss Functions
2.6 Backpropagation
2.7 Differentiable Programming
2.8 Computation Graph
2.8.1 Example
2.8.2 Logistic Regression
2.8.3 Forward and Backpropagation
2.9 Derivative of Non-linear Activation Functions
2.10 Backpropagation Algorithm
2.10.1 Example
2.11 Chain Rule for Differentiation
2.11.1 Two Functions in One Dimension
2.11.2 Three Functions in One Dimension
2.11.3 Two Functions in Higher Dimensions
2.12 Gradient of Loss Function
2.13 Gradient Descent
2.14 Initialization and Normalization
2.15 Software Libraries and Platforms
2.16 Summary
3 Optimization
3.1 Introduction
3.2 Overview
3.2.1 Optimization Problem Classes
3.2.2 Optimization Solution Methods
3.2.3 Derivatives and Gradients
3.2.4 Gradient Computation
3.3 First-Order Methods
3.3.1 Gradient Descent
3.3.2 Step Size
3.3.3 Mini-Batch Gradient Descent
3.3.4 Stochastic Gradient Descent
3.3.5 Adaptive Gradient Descent
3.3.6 Momentum
3.3.7 Adagrad
3.3.8 Adam: Adaptive Moment Estimation
3.3.9 Hypergradient Descent
3.4 Second-Order Methods
3.4.1 Newton’s Method
3.4.2 Second-Order Taylor Approximation
3.4.3 Quasi-Newton Methods
3.5 Evolution Strategies
3.6 Summary
4 Regularization
4.1 Introduction
4.2 Generalization
4.3 Overfitting
4.4 Cross Validation
4.5 Bias and Variance
4.6 Vector Norms
4.7 Ridge Regression and Lasso
4.8 Regularized Loss Functions
4.9 Dropout Regularization
4.9.1 Random Least Squares with Dropout
4.9.2 Least Squares with Noise Input Distortion
4.10 Data Augmentation
4.11 Batch Normalization
4.12 Summary
Part II Architectures
5 Convolutional Neural Networks
5.1 Introduction
5.1.1 Representations Sharing Weights
5.2 Convolution
5.2.1 One-Dimensional Convolution
5.2.2 Matrix Multiplication
5.2.3 Two-Dimensional Convolution
5.2.4 Separable Filters
5.2.5 Properties
5.2.6 Composition
5.2.7 Three-Dimensional Convolution
5.3 Layers
5.3.1 Convolution
5.3.2 Pooling
5.4 Example
5.5 Architectures
5.6 Applications
5.7 Summary
6 Sequence Models
6.1 Introduction
6.2 Natural Language Models
6.2.1 Bag of Words
6.2.2 Feature Vector
6.2.3 N-grams
6.2.4 Markov Model
6.2.5 State Machine
6.2.6 Recurrent Neural Network
6.3 Recurrent Neural Network
6.3.1 Architectures
6.3.2 Loss Function
6.3.3 Deep RNN
6.3.4 Bidirectional RNN
6.3.5 Backpropagation Through Time
6.4 Gated Recurrent Unit
6.4.1 Update Gate
6.4.2 Candidate Activation
6.4.3 Reset Gate
6.4.4 Function
6.5 Long Short-Term Memory
6.5.1 Forget Gate
6.5.2 Input Gate
6.5.3 Memory Cell
6.5.4 Candidate Memory
6.5.5 Output Gate
6.5.6 Peephole Connections
6.5.7 GRU vs. LSTM
6.6 Sequence to Sequence
6.7 Attention
6.8 Embeddings
6.9 Introduction to Transformers
6.10 Summary
7 Graph Neural Networks
7.1 Introduction
7.2 Definitions
7.3 Embeddings
7.4 Node Similarity
7.4.1 Adjacency-based Similarity
7.4.2 Multi-hop Similarity
7.4.3 Overlap Similarity
7.4.4 Random Walk Embedding
7.4.5 Graph Neural Network Properties
7.5 Neighborhood Aggregation in Graph Neural Networks
7.5.1 Supervised Node Classification Using a GNN
7.6 Graph Neural Network Variants
7.6.1 Graph Convolution Network
7.6.2 GraphSAGE
7.6.3 Gated Graph Neural Networks
7.6.4 Graph Attention Networks
7.6.5 Message-Passing Networks
7.7 Applications
7.8 Software Libraries, Benchmarks, and Visualization
7.9 Summary
8 Transformers
8.1 Introduction
8.2 General-Purpose Transformer-Based Architectures
8.2.1 BERT
8.3 Self-Attention
8.4 Multi-head Attention
8.5 Transformer
8.5.1 Positional Encoding
8.5.2 Encoder
8.5.3 Decoder
8.5.4 Pre-training and Fine-tuning
8.6 Transformer Models
8.6.1 Autoencoding Transformers
8.6.2 Auto-regressive Transformers
8.6.3 Sequence-to-Sequence Transformers
8.6.4 GPT-3
8.7 Vision Transformers
8.8 Multi-modal Transformers
8.9 Text and Code Transformers
8.10 Summary
Part III Generative Models
9 Generative Adversarial Networks
9.1 Introduction
9.1.1 Progress
9.1.2 Game Theory
9.1.3 Co-evolution
9.2 Minimax Optimization
9.3 Divergence between Distributions
9.3.1 Least Squares GAN
9.3.2 f-GAN
9.4 Optimal Objective Value
9.5 Gradient Descent Ascent
9.6 Optimistic Gradient Descent Ascent
9.7 GAN Training
9.7.1 Discriminator Training
9.7.2 Generator Training
9.7.3 Alternating Discriminator–Generator Training
9.8 GAN Losses
9.8.1 Wasserstein GAN
9.8.2 Unrolled GAN
9.9 GAN Architectures
9.9.1 Progressive GAN
9.9.2 Deep Convolutional GAN
9.9.3 Semi-Supervised GAN
9.9.4 Conditional GAN
9.9.5 Image-to-Image Translation
9.9.6 Cycle-Consistent GAN
9.9.7 Registration GAN
9.9.8 Self-Attention GAN and BigGAN
9.9.9 Composition and Control with GANs
9.9.10 Instance Conditioned GAN
9.10 Evaluation
9.10.1 Inception Score
9.10.2 Fréchet Inception Distance
9.11 Applications
9.11.1 Super Resolution and Restoration
9.11.2 Style Synthesis
9.11.3 Image Completion
9.11.4 De-raining
9.11.5 Map Synthesis
9.11.6 Pose Synthesis
9.11.7 Face Editing
9.11.8 Training Data Generation
9.11.9 Text-to-Image Synthesis
9.11.10 Medical Imaging
9.11.11 Video Synthesis
9.11.12 Motion Retargeting
9.11.13 3D Synthesis
9.11.14 Graph Synthesis
9.11.15 Autonomous Vehicles
9.11.16 Text-to-Speech Synthesis
9.11.17 Voice Conversion
9.11.18 Music Synthesis
9.11.19 Protein Design
9.11.20 Natural Language Synthesis
9.11.21 Cryptography
9.12 Software Libraries, Benchmarks, and Visualization
9.13 Summary
10 Variational Autoencoders
10.1 Introduction
10.2 Variational Inference
10.2.1 Reverse KL
10.2.2 Score Gradient
10.2.3 Reparameterization Gradient
10.2.4 Forward KL
10.3 Variational Autoencoder
10.3.1 Autoencoder
10.3.2 Variational Autoencoder
10.4 Generative Flows
10.5 Denoising Diffusion Probabilistic Model
10.5.1 Forward Noising Process
10.5.2 Reverse Generation by Sampling
10.6 Geometric Variational Inference
10.6.1 Moser Flow
10.6.2 Riemannian Score-Based Generative Models
10.7 Software Libraries
10.8 Summary
Part IV Reinforcement Learning
11 Reinforcement Learning
11.1 Introduction
11.2 Multi-Armed Bandit
11.2.1 Greedy Approach
11.2.2 ε-greedy Approach
11.2.3 Upper Confidence Bound
11.3 State Machines
11.4 Markov Processes
11.5 Markov Decision Processes
11.5.1 State of Environment and Agent
11.6 Definitions
11.6.1 Policy
11.6.2 State Action Diagram
11.6.3 State Value Function
11.6.4 Action Value Function
11.6.5 Reward
11.6.6 Model
11.6.7 Agent Types
11.6.8 Problem Types
11.6.9 Agent Representation of State
11.6.10 Bellman Expectation Equation for State Value Function
11.6.11 Bellman Expectation Equation for Action Value Function
11.7 Optimal Policy
11.7.1 Optimal Value Function
11.7.2 Bellman Optimality Equation for V_*
11.7.3 Bellman Optimality Equation for Q_*
11.8 Planning by Dynamic Programming with a Known MDP
11.8.1 Iterative Policy Evaluation
11.8.2 Policy Iteration
11.8.3 Infinite Horizon Value Iteration
11.9 Reinforcement Learning
11.9.1 Model-Based Reinforcement Learning
11.9.2 Policy Search
11.9.3 Monte Carlo Sampling
11.9.4 Temporal Difference Sampling
11.9.5 Q-Learning
11.9.6 Sarsa
11.9.7 On-Policy vs. Off-Policy Methods
11.9.8 Sarsa(λ)
11.10 Maximum Entropy Reinforcement Learning
11.11 Summary
12 Deep Reinforcement Learning
12.1 Introduction
12.2 Function Approximation
12.2.1 State Value Function Approximation
12.2.2 Action Value Function Approximation
12.3 Value-Based Methods
12.3.1 Experience Replay
12.3.2 Neural Fitted Q-Iteration
12.3.3 Deep Q-Network
12.3.4 Target Network
12.3.5 Algorithm
12.3.6 Prioritized Replay
12.3.7 Double DQN
12.3.8 Dueling Networks
12.4 Policy-Based Methods
12.4.1 Policy Gradient
12.4.2 REINFORCE
12.4.3 Subtracting a Baseline
12.5 Actor–Critic Methods
12.5.1 Advantage Actor–Critic
12.5.2 Asynchronous Advantage Actor–Critic
12.5.3 Importance Sampling
12.5.4 Surrogate Loss
12.5.5 Natural Policy Gradient
12.5.6 Trust Region Policy Optimization
12.5.7 Proximal Policy Optimization
12.5.8 Deep Deterministic Policy Gradient
12.6 Model-Based Reinforcement Learning
12.6.1 Monte Carlo Tree Search
12.6.2 Expert Iteration and AlphaZero
12.6.3 World Models
12.7 Imitation Learning
12.8 Exploration
12.8.1 Sparse Rewards
12.9 Summary
Part V Applications
13 Applications
13.1 Introduction
13.2 Autonomous Vehicles
13.3 Climate Change and Climate Monitoring
13.3.1 Predicting Ocean Biogeochemistry
13.3.2 Predicting Atlantic Multidecadal Variability
13.3.3 Predicting Wildfire Growth
13.4 Computer Vision
13.4.1 Kinship Verification
13.4.2 Image-to-3D
13.4.3 Image2LEGO®
13.4.4 Imaging through Scattering Media
13.4.5 Contrastive Language-Image Pre-training
13.5 Speech and Audio Processing
13.5.1 Audio Reverb Impulse Response Synthesis
13.5.2 Voice Swapping
13.5.3 Explainable Musical Phrase Completion
13.6 Natural Language Processing
13.6.1 Quantifying and Alleviating Distribution Shifts in Foundation Models on Review Classification
13.7 Automated Machine Learning
13.8 Education
13.8.1 Learning-to-Learn STEM Courses
13.9 Proteomics
13.9.1 Protein Structure Prediction
13.9.2 Protein Docking
13.10 Combinatorial Optimization
13.10.1 Problems over Graphs
13.10.2 Learning Graph Algorithms as Single-Player Games
13.11 Physics
13.11.1 Pedestrian Wind Estimation in Urban Environments
13.11.2 Fusion Plasma
13.12 Summary
Appendix A: Matrix Calculus
A.1 Gradient Computations for Backpropagation
A.1.1 Scalar by Vector
A.1.2 Scalar by Matrix
A.1.3 Vector by Vector
A.1.4 Matrix by Scalar
A.2 Gradient Computations for Optimization
A.2.1 Dot Product by Vector
A.2.2 Quadratic Form by Vector
Appendix B: Scientific Writing and Reviewing Best Practices
B.1 Writing Best Practices
B.1.1 Introduction
B.1.2 Methods
B.1.3 Figures and Tables
B.1.4 Results
B.1.5 Abbreviations and Notation
B.2 Reviewing Best Practices
B.2.1 Ranking
B.2.2 Rebuttal
References
Index