Journey through the theory and practice of modern deep learning, and apply innovative techniques to solve everyday data problems.
In Inside Deep Learning, you will learn how to:
• Implement deep learning with PyTorch
• Select the right deep learning components
• Train and evaluate a deep learning model
• Fine-tune deep learning models to maximize performance
• Understand deep learning terminology
• Adapt existing PyTorch code to solve new problems
Inside Deep Learning is an accessible guide to implementing deep learning with the PyTorch framework. It demystifies complex deep learning concepts and teaches you to understand the vocabulary of deep learning so you can keep pace in a rapidly evolving field. No detail is skipped—you’ll dive into math, theory, and practical applications. Everything is clearly explained in plain English.
About the technology
Deep learning doesn’t have to be a black box! Knowing how your models and algorithms actually work gives you greater control over your results. And you don’t have to be a mathematics expert or a senior data scientist to grasp what’s going on inside a deep learning system. This book gives you the practical insight you need to understand and explain your work with confidence.
About the book
Inside Deep Learning illuminates the inner workings of deep learning algorithms in a way that even machine learning novices can understand. You’ll explore deep learning concepts and tools through plain-language explanations, annotated code, and dozens of instantly useful PyTorch examples. Each type of neural network is clearly presented without complex math, and every solution in this book can run on readily available GPU hardware!
What's inside
• Select the right deep learning components
• Train and evaluate a deep learning model
• Fine-tune deep learning models to maximize performance
• Understand deep learning terminology
About the reader
For Python programmers with basic machine learning skills.
About the author
Edward Raff is a Chief Scientist at Booz Allen Hamilton, and the author of the JSAT machine learning library.
Author(s): Edward Raff
Edition: 1
Publisher: Manning Publications
Year: 2022
Language: Russian
Commentary: Vector PDF
Pages: 600
City: Shelter Island, NY
Tags: Machine Learning; Deep Learning; Python; Convolutional Neural Networks; Recurrent Neural Networks; Autoencoders; Generative Adversarial Networks; Transfer Learning; Sequence-to-sequence Models; Object Detection; Attention Mechanisms
brief contents
contents
foreword
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A road map
About the mathematical notations
About the exercises
About Google Colab
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
Part 1 Foundational methods
1 The mechanics of learning
1.1 Getting started with Colab
1.2 The world as tensors
1.2.1 PyTorch GPU acceleration
1.3 Automatic differentiation
1.3.1 Using derivatives to minimize losses
1.3.2 Calculating a derivative with automatic differentiation
1.3.3 Putting it together: Minimizing a function with derivatives
1.4 Optimizing parameters
1.5 Loading dataset objects
1.5.1 Creating a training and testing split
Exercises
Summary
2 Fully connected networks
2.1 Neural networks as optimization
2.1.1 Notation of training a neural network
2.1.2 Building a linear regression model
2.1.3 The training loop
2.1.4 Defining a dataset
2.1.5 Defining the model
2.1.6 Defining the loss function
2.1.7 Putting it together: Training a linear regression model on the data
2.2 Building our first neural network
2.2.1 Notation for a fully connected network
2.2.2 A fully connected network in PyTorch
2.2.3 Adding nonlinearities
2.3 Classification problems
2.3.1 Classification toy problem
2.3.2 Classification loss function
2.3.3 Training a classification network
2.4 Better training code
2.4.1 Custom metrics
2.4.2 Training and testing passes
2.4.3 Saving checkpoints
2.4.4 Putting it all together: A better model training function
2.5 Training in batches
Exercises
Summary
3 Convolutional neural networks
3.1 Spatial structural prior beliefs
3.1.1 Loading MNIST with PyTorch
3.2 What are convolutions?
3.2.1 1D convolutions
3.2.2 2D convolutions
3.2.3 Padding
3.2.4 Weight sharing
3.3 How convolutions benefit image processing
3.4 Putting it into practice: Our first CNN
3.4.1 Making a convolutional layer with multiple filters
3.4.2 Using multiple filters per layer
3.4.3 Mixing convolutional layers with linear layers via flattening
3.4.4 PyTorch code for our first CNN
3.5 Adding pooling to mitigate object movement
3.5.1 CNNs with max pooling
3.6 Data augmentation
Exercises
Summary
4 Recurrent neural networks
4.1 Recurrent neural networks as weight sharing
4.1.1 Weight sharing for a fully connected network
4.1.2 Weight sharing over time
4.2 RNNs in PyTorch
4.2.1 A simple sequence classification problem
4.2.2 Embedding layers
4.2.3 Making predictions using the last time step
4.3 Improving training time with packing
4.3.1 Pad and pack
4.3.2 Packable embedding layer
4.3.3 Training a batched RNN
4.3.4 Simultaneous packed and unpacked inputs
4.4 More complex RNNs
4.4.1 Multiple layers
4.4.2 Bidirectional RNNs
Exercises
Summary
5 Modern training techniques
5.1 Gradient descent in two parts
5.1.1 Adding a learning rate schedule
5.1.2 Adding an optimizer
5.1.3 Implementing optimizers and schedulers
5.2 Learning rate schedules
5.2.1 Exponential decay: Smoothing erratic training
5.2.2 Step drop adjustment: Better smoothing
5.2.3 Cosine annealing: Greater accuracy but less stability
5.2.4 Validation plateau: Data-based adjustments
5.2.5 Comparing the schedules
5.3 Making better use of gradients
5.3.1 SGD with momentum: Adapting to gradient consistency
5.3.2 Adam: Adding variance to momentum
5.3.3 Gradient clipping: Avoiding exploding gradients
5.4 Hyperparameter optimization with Optuna
5.4.1 Optuna
5.4.2 Optuna with PyTorch
5.4.3 Pruning trials with Optuna
Exercises
Summary
6 Common design building blocks
6.1 Better activation functions
6.1.1 Vanishing gradients
6.1.2 Rectified linear units (ReLUs): Avoiding vanishing gradients
6.1.3 Training with LeakyReLU activations
6.2 Normalization layers: Magically better convergence
6.2.1 Where do normalization layers go?
6.2.2 Batch normalization
6.2.3 Training with batch normalization
6.2.4 Layer normalization
6.2.5 Training with layer normalization
6.2.6 Which normalization layer to use?
6.2.7 A peculiarity of layer normalization
6.3 Skip connections: A network design pattern
6.3.1 Implementing fully connected skips
6.3.2 Implementing convolutional skips
6.4 1 × 1 convolutions: Sharing and reshaping information in channels
6.4.1 Training with 1 × 1 convolutions
6.5 Residual connections
6.5.1 Residual blocks
6.5.2 Implementing residual blocks
6.5.3 Residual bottlenecks
6.5.4 Implementing residual bottlenecks
6.6 Long short-term memory RNNs
6.6.1 RNNs: A fast review
6.6.2 LSTMs and the gating mechanism
6.6.3 Training an LSTM
Exercises
Summary
Part 2 Building advanced networks
7 Autoencoding and self-supervision
7.1 How autoencoding works
7.1.1 Principal component analysis is a bottleneck autoencoder
7.1.2 Implementing PCA
7.1.3 Implementing PCA with PyTorch
7.1.4 Visualizing PCA results
7.1.5 A simple nonlinear PCA
7.2 Designing autoencoding neural networks
7.2.1 Implementing an autoencoder
7.2.2 Visualizing autoencoder results
7.3 Bigger autoencoders
7.3.1 Robustness to noise
7.4 Denoising autoencoders
7.4.1 Denoising with Gaussian noise
7.5 Autoregressive models for time series and sequences
7.5.1 Implementing the char-RNN autoregressive text model
7.5.2 Autoregressive models are generative models
7.5.3 Changing samples with temperature
7.5.4 Faster sampling
Exercises
Summary
8 Object detection
8.1 Image segmentation
8.1.1 Nuclei detection: Loading the data
8.1.2 Representing the image segmentation problem in PyTorch
8.1.3 Building our first image segmentation network
8.2 Transposed convolutions for expanding image size
8.2.1 Implementing a network with transposed convolutions
8.3 U-Net: Looking at fine and coarse details
8.3.1 Implementing U-Net
8.4 Object detection with bounding boxes
8.4.1 Faster R-CNN
8.4.2 Using Faster R-CNN in PyTorch
8.4.3 Suppressing overlapping boxes
8.5 Using the pretrained Faster R-CNN
Exercises
Summary
9 Generative adversarial networks
9.1 Understanding generative adversarial networks
9.1.1 The loss computations
9.1.2 The GAN games
9.1.3 Implementing our first GAN
9.2 Mode collapse
9.3 Wasserstein GAN: Mitigating mode collapse
9.3.1 WGAN discriminator loss
9.3.2 WGAN generator loss
9.3.3 Implementing WGAN
9.4 Convolutional GAN
9.4.1 Designing a convolutional generator
9.4.2 Designing a convolutional discriminator
9.5 Conditional GAN
9.5.1 Implementing a conditional GAN
9.5.2 Training a conditional GAN
9.5.3 Controlling the generation with conditional GANs
9.6 Walking the latent space of GANs
9.6.1 Getting models from the Hub
9.6.2 Interpolating GAN output
9.6.3 Labeling latent dimensions
9.7 Ethics in deep learning
Exercises
Summary
10 Attention mechanisms
10.1 Attention mechanisms learn relative input importance
10.1.1 Training our baseline model
10.1.2 Attention mechanism mechanics
10.1.3 Implementing a simple attention mechanism
10.2 Adding some context
10.2.1 Dot score
10.2.2 General score
10.2.3 Additive attention
10.2.4 Computing attention weights
10.3 Putting it all together: A complete attention mechanism with context
Exercises
Summary
11 Sequence-to-sequence
11.1 Sequence-to-sequence as a kind of denoising autoencoder
11.1.1 Adding attention creates Seq2Seq
11.2 Machine translation and the data loader
11.2.1 Loading a small English-French dataset
11.3 Inputs to Seq2Seq
11.3.1 Autoregressive approach
11.3.2 Teacher-forcing approach
11.3.3 Teacher forcing vs. an autoregressive approach
11.4 Seq2Seq with attention
11.4.1 Implementing Seq2Seq
11.4.2 Training and evaluation
Exercises
Summary
12 Network design alternatives to RNNs
12.1 TorchText: Tools for text problems
12.1.1 Installing TorchText
12.1.2 Loading datasets in TorchText
12.1.3 Defining a baseline model
12.2 Averaging embeddings over time
12.2.1 Weighted average over time with attention
12.3 Pooling over time and 1D CNNs
12.4 Positional embeddings add sequence information to any model
12.4.1 Implementing a positional encoding module
12.4.2 Defining positional encoding models
12.5 Transformers: Big models for big data
12.5.1 Multiheaded attention
12.5.2 Transformer blocks
Exercises
Summary
13 Transfer learning
13.1 Transferring model parameters
13.1.1 Preparing an image dataset
13.2 Transfer learning and training with CNNs
13.2.1 Adjusting pretrained networks
13.2.2 Preprocessing for pretrained ResNet
13.2.3 Training with warm starts
13.2.4 Training with frozen weights
13.3 Learning with fewer labels
13.4 Pretraining with text
13.4.1 Transformers with the Hugging Face library
13.4.2 Freezing weights with no-grad
Exercises
Summary
14 Advanced building blocks
14.1 Problems with pooling
14.1.1 Aliasing compromises translation invariance
14.1.2 Anti-aliasing by blurring
14.1.3 Applying anti-aliased pooling
14.2 Improved residual blocks
14.2.1 Effective depth
14.2.2 Implementing ReZero
14.3 MixUp training reduces overfitting
14.3.1 Picking the mix rate
14.3.2 Implementing MixUp
Exercises
Summary
A Setting up Colab
A.1 Creating a Colab session
Adding a GPU
Testing your GPU
index