Geometry of Deep Learning: A Signal Processing Perspective

The focus of this book is on providing students with insights into geometry that can help them understand deep learning from a unified perspective. Rather than describing deep learning merely as an implementation technique, as many existing deep learning books do, it presents deep learning as the ultimate form of signal processing that one can imagine.

To support this claim, the book first reviews classical kernel machine learning approaches and explains their advantages and limitations. After a detailed account of the basic building blocks of deep neural networks from biological and algorithmic points of view, the latest tools such as attention, normalization, the Transformer, BERT, GPT-3, and others are described. Here, too, the emphasis is that behind the intuition of these heuristic approaches lies an important and beautiful geometric structure that enables a systematic understanding. A unified geometric analysis of the working mechanism of deep learning in terms of high-dimensional geometry is then offered. Finally, different forms of generative models such as GAN, VAE, normalizing flows, optimal transport, and so on are described from a unified geometric perspective, showing that they actually arise from statistical distance-minimization problems.
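
As a small illustration of the distance-minimization viewpoint mentioned above (an illustrative sketch, not an excerpt from the book; all names and constants are assumptions made here), the following Python snippet fits a Gaussian model to data by gradient descent on a Monte Carlo surrogate of the KL divergence between the data distribution and the model, which coincides, up to an additive constant, with the negative average log-likelihood.

    # Minimal sketch: generative modelling as statistical distance minimization.
    # We fit a Gaussian model q(mu, sigma) to samples from an unknown data
    # distribution by gradient descent on a Monte Carlo surrogate of
    # KL(p_data || q), i.e. the negative average log-likelihood up to a constant.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=0.5, size=1000)  # samples from p_data

    def kl_surrogate(mu, log_sigma, x):
        """Negative average log-likelihood of x under N(mu, exp(log_sigma)^2),
        dropping the constant 0.5*log(2*pi); minimizing it minimizes the KL."""
        z = (x - mu) / np.exp(log_sigma)
        return np.mean(0.5 * z ** 2 + log_sigma)

    mu, log_sigma, lr = 0.0, 0.0, 0.1
    print("initial distance surrogate:", kl_surrogate(mu, log_sigma, data))
    for _ in range(500):
        z = (data - mu) / np.exp(log_sigma)
        mu -= lr * np.mean(-z / np.exp(log_sigma))  # gradient w.r.t. mu
        log_sigma -= lr * np.mean(1.0 - z ** 2)     # gradient w.r.t. log_sigma
    print("final distance surrogate:  ", kl_surrogate(mu, log_sigma, data))
    print(f"estimated mean {mu:.2f} (true 2.0), std {np.exp(log_sigma):.2f} (true 0.5)")

Roughly speaking, replacing the KL divergence with another f-divergence or with the Wasserstein metric, and estimating that distance with a discriminator network, leads to the f-GAN and W-GAN formulations listed in Chapter 13.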

Because this book contains up-to-date information from both a practical and a theoretical point of view, it can be used as an advanced deep learning textbook in universities or as a reference source for researchers interested in the latest deep learning algorithms and their underlying principles. In addition, the book has been prepared for a course shared by engineering and mathematics students, so much of the content is interdisciplinary and will appeal to students from both disciplines.


Author(s): Jong Chul Ye
Series: Mathematics in Industry, 37
Publisher: Springer
Year: 2022

Language: English
Pages: 346
City: Singapore

Preface
Contents
Part I Basic Tools for Machine Learning
1 Mathematical Preliminaries
1.1 Metric Space
1.2 Vector Space
1.3 Banach and Hilbert Space
1.3.1 Basis and Frames
1.4 Probability Space
1.5 Some Matrix Algebra
1.5.1 Kronecker Product
1.5.2 Matrix and Vector Calculus
1.6 Elements of Convex Optimization
1.6.1 Some Definitions
1.6.2 Convex Sets, Convex Functions
1.6.3 Subdifferentials
1.6.4 Convex Conjugate
1.6.5 Lagrangian Dual Formulation
1.7 Exercises
2 Linear and Kernel Classifiers
2.1 Introduction
2.2 Hard-Margin Linear Classifier
2.2.1 Maximum Margin Classifier for Separable Cases
2.2.2 Dual Formulation
2.2.3 KKT Conditions and Support Vectors
2.3 Soft-Margin Linear Classifiers
2.3.1 Maximum Margin Classifier with Noise
2.4 Nonlinear Classifier Using Kernel SVM
2.4.1 Linear Classifier in the Feature Space
2.4.2 Kernel Trick
2.5 Classical Approaches for Image Classification
2.6 Exercises
3 Linear, Logistic, and Kernel Regression
3.1 Introduction
3.2 Linear Regression
3.2.1 Ordinary Least Squares (OLS)
3.3 Logistic Regression
3.3.1 Logits and Linear Regression
3.3.2 Multiclass Classification Using Logistic Regression
3.4 Ridge Regression
3.5 Kernel Regression
3.6 Bias–Variance Trade-off in Regression
3.6.1 Examples
3.7 Exercises
4 Reproducing Kernel Hilbert Space, Representer Theorem
4.1 Introduction
4.2 Reproducing Kernel Hilbert Space (RKHS)
4.2.1 Feature Map and Kernels
4.2.2 Definition of RKHS
4.3 Representer Theorem
4.4 Application of Representer Theorem
4.4.1 Kernel Ridge Regression
4.4.2 Kernel SVM
4.5 Pros and Cons of Kernel Machines
4.6 Exercises
Part II Building Blocks of Deep Learning
5 Biological Neural Networks
5.1 Introduction
5.2 Neurons
5.2.1 Anatomy of Neurons
5.2.2 Signal Transmission Mechanism
5.2.3 Synaptic Plasticity
5.3 Biological Neural Network
5.3.1 Visual System
5.3.2 Hubel and Wiesel Model
5.3.3 Jennifer Aniston Cell
5.4 Exercises
6 Artificial Neural Networks and Backpropagation
6.1 Introduction
6.2 Artificial Neural Networks
6.2.1 Notation
6.2.2 Modeling a Single Neuron
6.2.3 Feedforward Multilayer ANN
6.3 Artificial Neural Network Training
6.3.1 Problem Formulation
6.3.2 Optimizers
6.3.2.1 Gradient Descent
6.3.2.2 Stochastic Gradient Descent (SGD) Method
6.3.2.3 Momentum Method
6.3.2.4 Other Variations
6.4 The Backpropagation Algorithm
6.4.1 Derivation of the Backpropagation Algorithm
6.4.2 Geometrical Interpretation of BP Algorithm
6.4.3 Variational Interpretation of BP Algorithm
6.4.4 Local Variational Formulation
6.5 Exercises
7 Convolutional Neural Networks
7.1 Introduction
7.2 History of Modern CNNs
7.2.1 AlexNet
7.2.2 GoogLeNet
7.2.3 VGGNet
7.2.4 ResNet
7.2.5 DenseNet
7.2.6 U-Net
7.3 Basic Building Blocks of CNNs
7.3.1 Convolution
7.3.2 Pooling and Unpooling
7.3.3 Skip Connection
7.4 Training CNNs
7.4.1 Loss Functions
7.4.2 Data Split
7.4.3 Regularization
7.4.3.1 Data Augmentation
7.4.3.2 Parameter Regularization
7.4.3.3 Dropout
7.5 Visualizing CNNs
7.6 Applications of CNNs
7.7 Exercises
8 Graph Neural Networks
8.1 Introduction
8.2 Mathematical Preliminaries
8.2.1 Definition
8.2.2 Graph Isomorphism
8.2.3 Graph Coloring
8.3 Related Works
8.3.1 Word Embedding
8.3.1.1 CBOW
8.3.1.2 Skip-Gram
8.3.2 Loss Function
8.4 Graph Embedding
8.4.1 Matrix Factorization Approaches
8.4.2 Random Walks Approaches
8.4.2.1 DeepWalk
8.4.2.2 Node2vec
8.4.3 Neural Network Approaches
8.5 WL Test, Graph Neural Networks
8.5.1 Weisfeiler–Lehman Isomorphism Test
8.5.2 Graph Neural Network as WL Test
8.6 Summary and Outlook
8.7 Exercises
9 Normalization and Attention
9.1 Introduction
9.1.1 Notation
9.2 Normalization
9.2.1 Batch Normalization
9.2.2 Layer and Instance Normalization
9.2.3 Adaptive Instance Normalization (AdaIN)
9.2.4 Whitening and Coloring Transform (WCT)
9.3 Attention
9.3.1 Metabotropic Receptors: Biological Analogy
9.3.2 Mathematical Modeling of Spatial Attention
9.3.3 Channel Attention
9.4 Applications
9.4.1 StyleGAN
9.4.2 Self-Attention GAN
9.4.3 Attentional GAN: Text to Image Generation
9.4.4 Graph Attention Network
9.4.5 Transformer
9.4.6 BERT
9.4.7 Generative Pre-trained Transformer (GPT)
9.4.8 Vision Transformer
9.5 Mathematical Analysis of Normalization and Attention
9.6 Exercises
Part III Advanced Topics in Deep Learning
10 Geometry of Deep Neural Networks
10.1 Introduction
10.1.1 Desiderata of Machine Learning
10.2 Case Studies
10.2.1 Single-Layer Perceptron
10.2.2 Frame Representation
10.3 Convolution Framelets
10.3.1 Convolution and Hankel Matrix
10.3.2 Convolution Framelet Expansion
10.3.3 Link to CNN
10.3.4 Deep Convolutional Framelets
10.4 Geometry of CNN
10.4.1 Role of Nonlinearity
10.4.2 Nonlinearity Is the Key for Inductive Learning
10.4.3 Expressivity
10.4.4 Geometric Meaning of Features
10.4.5 Geometric Understanding of Autoencoder
10.4.6 Geometric Understanding of Classifier
10.5 Open Problems
10.6 Exercises
11 Deep Learning Optimization
11.1 Introduction
11.2 Problem Formulation
11.3 Polyak–Łojasiewicz-Type Convergence Analysis
11.3.1 Loss Landscape and Over-Parameterization
11.4 Lyapunov-Type Convergence Analysis
11.4.1 The Neural Tangent Kernel (NTK)
11.4.2 NTK at Infinite Width Limit
11.4.3 NTK for General Loss Function
11.5 Exercises
12 Generalization Capability of Deep Learning
12.1 Introduction
12.2 Mathematical Preliminaries
12.2.1 Vapnik–Chervonenkis (VC) Bounds
12.2.2 Rademacher Complexity Bounds
12.2.3 PAC–Bayes Bounds
12.3 Reconciling the Generalization Gap via Double Descent Model
12.4 Inductive Bias of Optimization
12.5 Generalization Bounds via Algorithm Robustness
12.6 Exercises
13 Generative Models and Unsupervised Learning
13.1 Introduction
13.2 Mathematical Preliminaries
13.3 Statistical Distances
13.3.1 f-Divergence
13.3.1.1 Kullback–Leibler (KL) Divergence
13.3.1.2 Jensen–Shannon (JS) Divergence
13.3.2 Wasserstein Metric
13.4 Optimal Transport
13.4.1 Monge's Original Formulation
13.4.2 Kantorovich Formulation
13.4.3 Entropy Regularization
13.5 Generative Adversarial Networks
13.5.1 Earliest Form of GAN
13.5.2 f-GAN
13.5.3 Wasserstein GAN (W-GAN)
13.5.4 StyleGAN
13.6 Autoencoder-Type Generative Models
13.6.1 ELBO
13.6.2 Variational Autoencoder (VAE)
13.6.3 β-VAE
13.6.4 Normalizing Flow, Invertible Flow
13.7 Unsupervised Learning via Image Translation
13.7.1 Pix2pix
13.7.2 CycleGAN
13.7.3 StarGAN
13.7.4 Collaborative GAN
13.8 Summary and Outlook
13.9 Exercises
14 Summary and Outlook
Bibliography
Index