Regularization in Deep Learning (MEAP v6)

Make your deep learning models more generalized and adaptable! These practical regularization techniques improve training efficiency and help you avoid overfitting. Regularization in Deep Learning teaches you how to improve model performance with a toolbox of regularization techniques, covering both well-established methods and groundbreaking modern approaches. Each technique is introduced with graphics, illustrations, and step-by-step coding walkthroughs that make the underlying math easy to follow.

One of the most important goals in building machine learning, and especially deep learning, models is achieving good generalization performance on the test dataset. The training task is considered complete only when we have obtained a generalizable model, often with the help of proper regularization during training. While the theory of generalization remains something of a mystery, it is an active research area with new insights regularly being proposed. Many regularization techniques have proved empirically effective in specific training contexts, but these resources are often scattered and disconnected. This book bridges the gap by offering a systematic and well-illustrated perspective on regularization techniques applied to the data, the model, the cost function, and the optimization procedure. It goes one step further by combining recent research breakthroughs with practical coding examples of regularization in deep learning models.

This book approaches this complex and ever-growing topic in a unique way. It introduces minimal mathematics and technical concepts in a well-illustrated manner and provides practical examples and code walkthroughs in a step-by-step fashion. The teaching is designed to be intuitive, natural, and progressive, rather than forcing concepts on the reader. You'll learn how to augment your dataset with random noise, improve your model's architecture, and apply regularization in your optimization procedures. You'll soon be building focused deep learning models that avoid sprawling complexity and deliver more accurate results, even with new or messy datasets.

About the reader: for data scientists, machine learning engineers, and researchers with basic model-development experience.
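The techniques mentioned above span the four angles the book is organized around: the data, the model, the objective function, and the optimization procedure. As a rough illustration only (this is not code from the book, and it assumes a PyTorch setup, which the book may or may not use), the sketch below shows one representative technique from each angle: random data augmentation, dropout, label smoothing, and an L2 weight-decay penalty.

    import torch
    import torch.nn as nn
    from torchvision import transforms

    # Data: random augmentations applied to each 28x28 training image.
    train_transform = transforms.Compose([
        transforms.RandomCrop(28, padding=2),   # random shifts
        transforms.RandomRotation(10),          # small random rotations
        transforms.ToTensor(),
    ])

    # Model: a dropout layer inside a small multilayer perceptron.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),  # randomly zero each activation with probability 0.5 during training
        nn.Linear(256, 10),
    )

    # Objective function: label smoothing softens the one-hot targets.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

    # Optimization: weight_decay adds an L2 penalty on the weights at each update.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

Each of these knobs (the augmentation choices, the dropout probability, the smoothing factor, and the weight-decay strength) is a regularization decision of the kind the chapters listed below examine in detail.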

Author(s): Peng Liu
Publisher: Manning Publications
Year: 2023

Language: English
Pages: 288

Regularization in Deep Learning MEAP V06
Copyright
Welcome letter
Brief contents
Chapter 1: Introducing Regularization
1.1 Why do we need regularization?
1.2 Curse of dimensionality
1.3 Understanding underfitting and overfitting
1.4 Understanding bias-variance trade-off
1.5 More on the model training path
1.6 Understanding the model training process
1.7 The many faces of regularization
1.8 Summary
Chapter 2: Generalization: A Classical View
2.1 The data
2.1.1 Sampling from the underlying data distribution
2.1.2 The train-test split
2.2 The model
2.2.1 The prediction function
2.2.2 The bias trick
2.2.3 Implementing the prediction function
2.3 The cost function
2.3.1 Expressing the cost function with linear algebra
2.4 The optimization algorithm
2.4.1 The multiple minima
2.4.2 The closed-form solution of linear regression
2.4.3 The gradient descent algorithm
2.4.4 Different types of gradient descent
2.4.5 The stochastic gradient descent algorithm
2.4.6 The impact of the learning rate
2.5 Improving the predictive performance
2.5.1 Augmented representation via feature engineering
2.5.2 Quadratic basis function
2.6 Empirical risk minimization
2.6.1 More on the model
2.6.2 Bias and variance decomposition
2.6.3 Understanding bias and variance using bootstrap
2.6.4 Reduced generalization with high model complexity
2.7 Summary
Chapter 3: Generalization: A Modern View
3.1 A modern view on generalization
3.1.1 Beyond perfect interpolation
3.1.2 Behind the “double descent” phenomenon
3.1.3 Extending the “double descent” phenomenon
3.2 Double descent in polynomial regression
3.2.1 Smoothing spline
3.2.2 Rewriting the smoothing spline cost function
3.2.3 Deriving the closed-form solution
3.2.4 Implementing the smoothing spline model
3.2.5 Sample non-monotonicity
3.3 Summary
Chapter 4: Fundamentals of Training Deep Neural Networks
4.1 Multilayer perceptron
4.1.1 A two-layer neural network
4.1.2 Shallow versus deep neural network
4.2 Automatic differentiation
4.2.1 Gradient-based optimization
4.2.2 The chain rule with partial derivatives
4.2.3 Different modes of multiplication
4.3 Training a simple CNN using MNIST
4.3.1 Downloading and loading MNIST
4.3.2 Defining the prediction function
4.3.3 Defining the cost function
4.3.4 Defining the optimization procedure
4.3.5 Updating the weights via iterative training
4.4 More on generalization
4.4.1 Multiple global minima
4.4.2 Best versus worst global minimum
4.5 Summary
Chapter 5: Regularization via Data
5.1 Data-based methods
5.1.1 Data augmentation
5.1.2 Label smoothing
5.2 Training deep neural networks using data augmentation
5.2.1 Training without data augmentation
LeNet
5.2.2 Training with data augmentation
5.3 The deep bootstrap framework
5.3.1 Insufficiency of classical generalization framework
5.3.2 Online optimization
5.3.3 Connecting online optimization with offline generalization
5.3.4 Constructing the ideal world with CIFAR-5m
5.3.5 Model training in the ideal world
5.3.6 Model testing
5.3.7 Bootstrap error between real world and ideal world
5.3.8 Implicit bias in convolutional neural networks
5.4 Summary
Chapter 6: Regularization via Model
6.1 Inductive bias in convolutional neural networks
6.1.1 Revisiting the fully-connected network
6.1.2 Translational invariance in convolutional neural networks
6.1.3 Understanding the convolution operator
6.1.4 Weight sharing in the convolution operation
6.2 Regularizing deep neural networks via dropout
6.2.1 Introducing dropout
6.2.2 Inducing a sparse representation
6.2.3 Dropout in action
6.2.4 Applying dropout in CNN
6.3 Implicit regularization in multi-task learning
6.3.1 Two MTL approaches in deep neural networks
6.3.2 Modifying the loss function to achieve soft parameter sharing
6.3.3 MTL in action
6.4 Summary
Chapter 7: Regularization via Objective Function
7.1 Introducing the regularization term
7.1.1 The unregularized linear regression
7.1.2 Norm-based penalty
7.2 L2 regularization in ridge regression
7.2.1 Using the analytic solution
7.2.2 Using the gradient descent algorithm
7.2.3 Handling the bias term
7.2.4 L2 regularization in action
7.3 Sparse estimation via LASSO
7.3.1 Geometric interpretation of ridge regression
7.3.2 Introducing the L0 norm
7.3.3 Introducing the L1 norm
7.3.4 Understanding LASSO
7.3.5 The soft-thresholding rule
7.3.6 LASSO in action
7.4 Summary
Chapter 8: Regularization via Optimization
8.1 Stochastic optimization
8.1.1 Empirical risk minimization via gradient descent
8.1.2 Convergence of SGD
8.1.3 Implicit regularization of SGD
8.1.4 Analyzing the mean iterate
8.1.5 SGD variants: better or worse?
8.2 More on SGD convergence
8.2.1 SGD in univariate linear regression
8.2.2 SGD’s convergence in expectation
8.2.3 SGD: past, present, and future
8.3 Summary