Computer vision is central to many leading-edge innovations, including self-driving cars, drones, augmented reality, facial recognition, and much, much more. Amazing new computer vision applications are developed every day, thanks to rapid advances in AI and deep learning (DL). Deep Learning for Vision Systems teaches you the concepts and tools for building intelligent, scalable computer vision systems that can identify and react to objects in images, videos, and real life. With expert instruction from author Mohamed Elgendy and real-world example projects, you’ll finally grok state-of-the-art deep learning techniques, so you can build, contribute to, and lead in the exciting realm of computer vision!
About the technology
How much has computer vision advanced? One ride in a Tesla is the only answer you’ll need. Deep learning techniques have led to exciting breakthroughs in facial recognition, interactive simulations, and medical imaging, but nothing beats seeing a car respond to real-world stimuli while speeding down the highway.
About the book
How does the computer learn to understand what it sees? Deep Learning for Vision Systems answers that by applying deep learning to computer vision. Using only high school algebra, this book illuminates the concepts behind visual intuition. You’ll understand how to use deep learning architectures to build vision system applications for image generation and facial recognition.
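For a taste of the book's hands-on style, here is a minimal sketch of a small convolutional image classifier in Keras, the library the book uses throughout. The snippet is illustrative only, not taken from the book: it assumes TensorFlow 2.x with its bundled tf.keras API, and the layer sizes, 32x32 RGB input, and 10-class output are arbitrary choices.

# A minimal sketch (not from the book) of a small CNN image classifier.
# Assumes TensorFlow 2.x; input shape and class count are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),            # 32x32 RGB input images
    layers.Conv2D(32, 3, activation="relu"),   # convolutional feature extraction
    layers.MaxPooling2D(),                     # pooling/subsampling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                       # dropout to reduce overfitting
    layers.Dense(10, activation="softmax"),    # 10-class probability output
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])
model.summary()

Trained on a dataset such as CIFAR-10, a model of roughly this shape is the kind of starting point the book's early image-classification projects use before moving on to deeper architectures like VGGNet and ResNet.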
Author: Mohamed Elgendy
Edition: 1
Publisher: Manning Publications
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 475
City: Shelter Island, NY
Tags: Deep Learning;Computer Vision;Image Processing;Recommender Systems;Convolutional Neural Networks;Generative Adversarial Networks;Face Recognition;Classification;Transfer Learning;Feature Engineering;Pipelines;Gradient Descent;Regularization;Hyperparameter Tuning;Optimization;Perception;Perceptron;Image Classification;Overfitting;Inception Networks;Activation Functions;AlexNet;LeNet;GoogLeNet;ResNet;VGGNet;Object Detection;Backpropagation;Datasets;Generative Art;Feedforward Neural Networks
Deep Learning for Vision Systems
contents
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A roadmap
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1—Deep learning foundation
1 Welcome to computer vision
1.1 Computer vision
1.1.1 What is visual perception?
1.1.2 Vision systems
1.1.3 Sensing devices
1.1.4 Interpreting devices
1.2 Applications of computer vision
1.2.1 Image classification
1.2.2 Object detection and localization
1.2.3 Generating art (style transfer)
1.2.4 Creating images
1.2.5 Face recognition
1.2.6 Image recommendation system
1.3 Computer vision pipeline: The big picture
1.4 Image input
1.4.1 Image as functions
1.4.2 How computers see images
1.4.3 Color images
1.5 Image preprocessing
1.5.1 Converting color images to grayscale to reduce computation complexity
1.6 Feature extraction
1.6.1 What is a feature in computer vision?
1.6.2 What makes a good (useful) feature?
1.6.3 Extracting features (handcrafted vs. automatic extracting)
1.7 Classifier learning algorithm
Summary
2 Deep learning and neural networks
2.1 Understanding perceptrons
2.1.1 What is a perceptron?
2.1.2 How does the perceptron learn?
2.1.3 Is one neuron enough to solve complex problems?
2.2 Multilayer perceptrons
2.2.1 Multilayer perceptron architecture
2.2.2 What are hidden layers?
2.2.3 How many layers, and how many nodes in each layer?
2.2.4 Some takeaways from this section
2.3 Activation functions
2.3.1 Linear transfer function
2.3.2 Heaviside step function (binary classifier)
2.3.3 Sigmoid/logistic function
2.3.4 Softmax function
2.3.5 Hyperbolic tangent function (tanh)
2.3.6 Rectified linear unit
2.3.7 Leaky ReLU
2.4 The feedforward process
2.4.1 Feedforward calculations
2.4.2 Feature learning
2.5 Error functions
2.5.1 What is the error function?
2.5.2 Why do we need an error function?
2.5.3 Error is always positive
2.5.4 Mean square error
2.5.5 Cross-entropy
2.5.6 A final note on errors and weights
2.6 Optimization algorithms
2.6.1 What is optimization?
2.6.2 Batch gradient descent
2.6.3 Stochastic gradient descent
2.6.4 Mini-batch gradient descent
2.6.5 Gradient descent takeaways
2.7 Backpropagation
2.7.1 What is backpropagation?
2.7.2 Backpropagation takeaways
Summary
3 Convolutional neural networks
3.1 Image classification using MLP
3.1.1 Input layer
3.1.2 Hidden layers
3.1.3 Output layer
3.1.4 Putting it all together
3.1.5 Drawbacks of MLPs for processing images
3.2 CNN architecture
3.2.1 The big picture
3.2.2 A closer look at feature extraction
3.2.3 A closer look at classification
3.3 Basic components of a CNN
3.3.1 Convolutional layers
3.3.2 Pooling layers or subsampling
3.3.3 Fully connected layers
3.4 Image classification using CNNs
3.4.1 Building the model architecture
3.4.2 Number of parameters (weights)
3.5 Adding dropout layers to avoid overfitting
3.5.1 What is overfitting?
3.5.2 What is a dropout layer?
3.5.3 Why do we need dropout layers?
3.5.4 Where does the dropout layer go in the CNN architecture?
3.6 Convolution over color images (3D images)
3.6.1 How do we perform a convolution on a color image?
3.6.2 What happens to the computational complexity?
3.7 Project: Image classification for color images
Summary
4 Structuring DL projects and hyperparameter tuning
4.1 Defining performance metrics
4.1.1 Is accuracy the best metric for evaluating a model?
4.1.2 Confusion matrix
4.1.3 Precision and recall
4.1.4 F-score
4.2 Designing a baseline model
4.3 Getting your data ready for training
4.3.1 Splitting your data for train/validation/test
4.3.2 Data preprocessing
4.4 Evaluating the model and interpreting its performance
4.4.1 Diagnosing overfitting and underfitting
4.4.2 Plotting the learning curves
4.4.3 Exercise: Building, training, and evaluating a network
4.5 Improving the network and tuning hyperparameters
4.5.1 Collecting more data vs. tuning hyperparameters
4.5.2 Parameters vs. hyperparameters
4.5.3 Neural network hyperparameters
4.5.4 Network architecture
4.6 Learning and optimization
4.6.1 Learning rate and decay schedule
4.6.2 A systematic approach to find the optimal learning rate
4.6.3 Learning rate decay and adaptive learning
4.6.4 Mini-batch size
4.7 Optimization algorithms
4.7.1 Gradient descent with momentum
4.7.2 Adam
4.7.3 Number of epochs and early stopping criteria
4.7.4 Early stopping
4.8 Regularization techniques to avoid overfitting
4.8.1 L2 regularization
4.8.2 Dropout layers
4.8.3 Data augmentation
4.9 Batch normalization
4.9.1 The covariate shift problem
4.9.2 Covariate shift in neural networks
4.9.3 How does batch normalization work?
4.9.4 Batch normalization implementation in Keras
4.9.5 Batch normalization recap
4.10 Project: Achieve high accuracy on image classification
Summary
Part 2—Image classification and detection
5 Advanced CNN architectures
5.1 CNN design patterns
5.2 LeNet-5
5.2.1 LeNet architecture
5.2.2 LeNet-5 implementation in Keras
5.2.3 Setting up the learning hyperparameters
5.2.4 LeNet performance on the MNIST dataset
5.3 AlexNet
5.3.1 AlexNet architecture
5.3.2 Novel features of AlexNet
5.3.3 AlexNet implementation in Keras
5.3.4 Setting up the learning hyperparameters
5.3.5 AlexNet performance
5.4 VGGNet
5.4.1 Novel features of VGGNet
5.4.2 VGGNet configurations
5.4.3 Learning hyperparameters
5.4.4 VGGNet performance
5.5 Inception and GoogLeNet
5.5.1 Novel features of Inception
5.5.2 Inception module: Naive version
5.5.3 Inception module with dimensionality reduction
5.5.4 Inception architecture
5.5.5 GoogLeNet in Keras
5.5.6 Learning hyperparameters
5.5.7 Inception performance on the CIFAR dataset
5.6 ResNet
5.6.1 Novel features of ResNet
5.6.2 Residual blocks
5.6.3 ResNet implementation in Keras
5.6.4 Learning hyperparameters
5.6.5 ResNet performance on the CIFAR dataset
Summary
6 Transfer learning
6.1 What problems does transfer learning solve?
6.2 What is transfer learning?
6.3 How transfer learning works
6.3.1 How do neural networks learn features?
6.3.2 Transferability of features extracted at later layers
6.4 Transfer learning approaches
6.4.1 Using a pretrained network as a classifier
6.4.2 Using a pretrained network as a feature extractor
6.4.3 Fine-tuning
6.5 Choosing the appropriate level of transfer learning
6.5.1 Scenario 1: Target dataset is small and similar to the source dataset
6.5.2 Scenario 2: Target dataset is large and similar to the source dataset
6.5.3 Scenario 3: Target dataset is small and different from the source dataset
6.5.4 Scenario 4: Target dataset is large and different from the source dataset
6.5.5 Recap of the transfer learning scenarios
6.6 Open source datasets
6.6.1 MNIST
6.6.2 Fashion-MNIST
6.6.3 CIFAR
6.6.4 ImageNet
6.6.5 MS COCO
6.6.6 Google Open Images
6.6.7 Kaggle
6.7 Project 1: A pretrained network as a feature extractor
6.8 Project 2: Fine-tuning
Summary
7 Object detection with R-CNN, SSD, and YOLO
7.1 General object detection framework
7.1.1 Region proposals
7.1.2 Network predictions
7.1.3 Non-maximum suppression (NMS)
7.1.4 Object-detector evaluation metrics
7.2 Region-based convolutional neural networks (R-CNNs)
7.2.1 R-CNN
7.2.2 Fast R-CNN
7.2.3 Faster R-CNN
7.2.4 Recap of the R-CNN family
7.3 Single-shot detector (SSD)
7.3.1 High-level SSD architecture
7.3.2 Base network
7.3.3 Multi-scale feature layers
7.3.4 Non-maximum suppression
7.4 You only look once (YOLO)
7.4.1 How YOLOv3 works
7.4.2 YOLOv3 architecture
7.5 Project: Train an SSD network in a self-driving car application
7.5.1 Step 1: Build the model
7.5.2 Step 2: Model configuration
7.5.3 Step 3: Create the model
7.5.4 Step 4: Load the data
7.5.5 Step 5: Train the model
7.5.6 Step 6: Visualize the loss
7.5.7 Step 7: Make predictions
Summary
Part 3—Generative models and visual embeddings
8 Generative adversarial networks (GANs)
8.1 GAN architecture
8.1.1 Deep convolutional GANs (DCGANs)
8.1.2 The discriminator model
8.1.3 The generator model
8.1.4 Training the GAN
8.1.5 GAN minimax function
8.2 Evaluating GAN models
8.2.1 Inception score
8.2.2 Fréchet inception distance (FID)
8.2.3 Which evaluation scheme to use
8.3 Popular GAN applications
8.3.1 Text-to-photo synthesis
8.3.2 Image-to-image translation (Pix2Pix GAN)
8.3.3 Image super-resolution GAN (SRGAN)
8.3.4 Ready to get your hands dirty?
8.4 Project: Building your own GAN
Summary
9 DeepDream and neural style transfer
9.1 How convolutional neural networks see the world
9.1.1 Revisiting how neural networks work
9.1.2 Visualizing CNN features
9.1.3 Implementing a feature visualizer
9.2 DeepDream
9.2.1 How the DeepDream algorithm works
9.2.2 DeepDream implementation in Keras
9.3 Neural style transfer
9.3.1 Content loss
9.3.2 Style loss
9.3.3 Total variance loss
9.3.4 Network training
Summary
10 Visual embeddings
10.1 Applications of visual embeddings
10.1.1 Face recognition
10.1.2 Image recommendation systems
10.1.3 Object re-identification
10.2 Learning embedding
10.3 Loss functions
10.3.1 Problem setup and formalization
10.3.2 Cross-entropy loss
10.3.3 Contrastive loss
10.3.4 Triplet loss
10.3.5 Naive implementation and runtime analysis of losses
10.4 Mining informative data
10.4.1 Dataloader
10.4.2 Informative data mining: Finding useful triplets
10.4.3 Batch all (BA)
10.4.4 Batch hard (BH)
10.4.5 Batch weighted (BW)
10.4.6 Batch sample (BS)
10.5 Project: Train an embedding network
10.5.1 Fashion: Get me items similar to this
10.5.2 Vehicle re-identification
10.5.3 Implementation
10.5.4 Testing a trained model
10.6 Pushing the boundaries of current accuracy
Summary
References
Appendix A—Getting set up
A.1 Downloading the code repository
A.2 Installing Anaconda
A.3 Setting up your DL environment
A.3.1 Setting up your development environment manually
A.3.2 Using the conda environment in the book’s repo
A.3.3 Saving and loading environments
A.4 Setting up your AWS EC2 environment
A.4.1 Creating an AWS account
A.4.2 Connecting remotely to your instance
A.4.3 Running your Jupyter notebook
index