Welcome to the ImageNet Bundle of Deep Learning for Computer Vision with Python, the final volume in the series. This volume contains the most advanced content, covering techniques that will enable you to reproduce the results of state-of-the-art publications, papers, and talks. To keep this work organized, I've structured the ImageNet Bundle in two parts.
In the first part, we'll explore the ImageNet dataset in detail and learn how to train state-of-the-art deep networks, including AlexNet, VGGNet, GoogLeNet, ResNet, and SqueezeNet, from scratch, matching the accuracies reported in their respective original works as closely as possible. To accomplish this goal, we'll need to call on all of our skills from the Starter Bundle and Practitioner Bundle.
The second part of this book focuses on case studies: real-world applications of deep learning and computer vision to solve particular problems. We'll start by training a CNN from scratch to recognize the emotions/facial expressions of people in real-time video streams. From there, we'll use transfer learning via feature extraction to automatically detect and correct image orientation. A second case study on transfer learning (this time via fine-tuning) will enable us to recognize over 164 vehicle makes and models in images. A model such as this one could enable you to create an "intelligent" highway billboard system that displays targeted information or advertising to drivers based on the type of vehicle they are driving. Our final case study will demonstrate how to train a CNN to correctly predict the age and gender of a person in a photo.
Author(s): Adrian Rosebrock
Edition: 1.2.1
Publisher: PyImageSearch
Year: 2017
Language: English
Pages: 323
Tags: Deep Learning;Computer Vision;Python
1 Introduction......Page 15
2 Introduction......Page 17
3.1 How Many GPUs Do I Need?......Page 19
3.2 Performance Gains Using Multiple GPUs......Page 20
3.3 Summary......Page 21
4.1.1 ILSVRC......Page 23
4.2.2 Downloading Images Programmatically......Page 25
4.2.4 ImageNet Development Kit......Page 26
4.2.5 ImageNet Copyright Concerns......Page 27
4.3 Summary......Page 29
5.1 Understanding the ImageNet File Structure......Page 31
5.1.1 ImageNet “test” Directory......Page 32
5.1.2 ImageNet “train” Directory......Page 33
5.1.3 ImageNet “val” Directory......Page 34
5.1.4 ImageNet “ImageSets” Directory......Page 35
5.1.5 ImageNet “DevKit” Directory......Page 36
5.2.1 Your First ImageNet Configuration File......Page 39
5.2.2 Our ImageNet Helper Utility......Page 44
5.2.3 Creating List and Mean Files......Page 48
5.2.4 Building the Compact Record Files......Page 52
5.3 Summary......Page 54
6 Training AlexNet on ImageNet......Page 55
6.1 Implementing AlexNet......Page 56
6.2 Training AlexNet......Page 60
6.2.1 What About Training Plots?......Page 61
6.2.2 Implementing the Training Script......Page 62
6.3 Evaluating AlexNet......Page 67
6.4 AlexNet Experiments......Page 69
6.4.1 AlexNet: Experiment #1......Page 70
6.4.2 AlexNet: Experiment #2......Page 72
6.4.3 AlexNet: Experiment #3......Page 73
6.5 Summary......Page 76
7 Training VGGNet on ImageNet......Page 77
7.1 Implementing VGGNet......Page 78
7.2 Training VGGNet......Page 83
7.3 Evaluating VGGNet......Page 87
7.4 VGGNet Experiments......Page 88
7.5 Summary......Page 90
8.1 Understanding GoogLeNet......Page 91
8.1.2 GoogLeNet Architecture......Page 92
8.1.3 Implementing GoogLeNet......Page 93
8.1.4 Training GoogLeNet......Page 97
8.3 GoogLeNet Experiments......Page 101
8.3.1 GoogLeNet: Experiment #1......Page 102
8.3.2 GoogLeNet: Experiment #2......Page 103
8.3.3 GoogLeNet: Experiment #3......Page 104
8.4 Summary......Page 105
9.1 Understanding ResNet......Page 107
9.2 Implementing ResNet......Page 108
9.3 Training ResNet......Page 114
9.5.2 ResNet: Experiment #2......Page 118
9.5.3 ResNet: Experiment #3......Page 119
9.6 Summary......Page 122
10.1.1 The Fire Module......Page 123
10.1.2 SqueezeNet Architecture......Page 125
10.1.3 Implementing SqueezeNet......Page 126
10.2 Training SqueezeNet......Page 130
10.4.1 SqueezeNet: Experiment #1......Page 134
10.4.2 SqueezeNet: Experiment #2......Page 136
10.4.3 SqueezeNet: Experiment #3......Page 137
10.4.4 SqueezeNet: Experiment #4......Page 138
10.5 Summary......Page 141
11.1.1 The FER13 Dataset......Page 143
11.1.2 Building the FER13 Dataset......Page 144
11.2 Implementing a VGG-like Network......Page 149
11.3 Training Our Facial Expression Recognizer......Page 152
11.3.2 EmotionVGGNet: Experiment #2......Page 155
11.3.3 EmotionVGGNet: Experiment #3......Page 156
11.3.4 EmotionVGGNet: Experiment #4......Page 157
11.4 Evaluating Our Facial Expression Recognizer......Page 159
11.5 Emotion Detection in Real-time......Page 161
11.6 Summary......Page 165
12.1 The Indoor CVPR Dataset......Page 167
12.1.1 Building the Dataset......Page 168
12.2 Extracting Features......Page 172
12.3 Training an Orientation Correction Classifier......Page 175
12.4 Correcting Orientation......Page 177
12.5 Summary......Page 179
13.1 The Stanford Cars Dataset......Page 181
13.1.1 Building the Stanford Cars Dataset......Page 182
13.2 Fine-tuning VGG on the Stanford Cars Dataset......Page 189
13.2.1 VGG Fine-tuning: Experiment #1......Page 194
13.2.2 VGG Fine-tuning: Experiment #2......Page 195
13.2.3 VGG Fine-tuning: Experiment #3......Page 196
13.3 Evaluating Our Vehicle Classifier......Page 197
13.4 Visualizing Vehicle Classification Results......Page 199
13.5 Summary......Page 203
14.1 The Ethics of Gender Identification in Machine Learning......Page 205
14.2 The Adience Dataset......Page 206
14.2.1 Building the Adience Dataset......Page 207
14.3 Implementing Our Network Architecture......Page 221
14.4 Measuring “One-off” Accuracy......Page 223
14.5 Training Our Age and Gender Predictor......Page 226
14.6 Evaluating Age and Gender Prediction......Page 229
14.7.1 Age Results......Page 232
14.7.2 Gender Results......Page 233
14.8 Visualizing Results......Page 235
14.8.1 Visualizing Results from Inside Adience......Page 236
14.8.2 Understanding Face Alignment......Page 240
14.8.3 Applying Age and Gender Prediction to Your Own Images......Page 242
14.9 Summary......Page 246
15.1 Object Detection and Deep Learning......Page 249
15.1.1 Measuring Object Detector Performance......Page 250
15.2.1 A Brief History of R-CNN......Page 252
15.2.2 The Base Network......Page 256
15.2.3 Anchors......Page 257
15.2.4 Region Proposal Network (RPN)......Page 259
15.2.5 Region of Interest (ROI) Pooling......Page 260
15.2.6 Region-based Convolutional Neural Network......Page 261
15.3 Summary......Page 262
16.1 The LISA Traffic Signs Dataset......Page 263
16.2 Installing the TensorFlow Object Detection API......Page 264
16.3.1 Project Directory Structure......Page 265
16.3.2 Configuration......Page 267
16.3.3 A TensorFlow Annotation Class......Page 269
16.3.4 Building the LISA + TensorFlow Dataset......Page 271
16.3.5 A Critical Pre-Training Step......Page 276
16.3.6 Configuring the Faster R-CNN......Page 277
16.3.7 Training the Faster R-CNN......Page 282
16.3.8 Suggestions When Working with the TFOD API......Page 284
16.3.10 Faster R-CNN on Images and Videos......Page 288
16.4 Summary......Page 292
17.1.1 Motivation......Page 295
17.1.2 Architecture......Page 296
17.1.3 MultiBox, Priors, and Fixed Priors......Page 297
17.1.4 Training Methods......Page 298
17.2 Summary......Page 299
18.1 The Vehicle Dataset......Page 301
18.2.1 Directory Structure and Configuration......Page 302
18.2.2 Building the Vehicle Dataset......Page 304
18.2.3 Training the SSD......Page 309
18.2.4 SSD Results......Page 312
18.2.5 Potential Problems and Limitations......Page 313
18.3 Summary......Page 314
19 Conclusions......Page 315
19.1 Where to Now?......Page 316