Author(s): Charu C. Aggarwal
Publisher: Springer
Year: 2018
Language: English
Preface......Page 3
Contents......Page 6
1.1 Introduction......Page 15
1.1.1 Humans Versus Computers: Stretching the Limits of Artificial Intelligence......Page 17
1.2 The Basic Architecture of Neural Networks......Page 18
1.2.1 Single Computational Layer: The Perceptron......Page 19
1.2.1.1 What Objective Function Is the Perceptron Optimizing?......Page 22
1.2.1.2 Relationship with Support Vector Machines......Page 24
1.2.1.3 Choice of Activation and Loss Functions......Page 25
1.2.1.5 Choice of Loss Function......Page 28
1.2.1.6 Some Useful Derivatives of Activation Functions......Page 30
1.2.2 Multilayer Neural Networks......Page 31
1.2.3 The Multilayer Network as a Computational Graph......Page 34
1.3 Training a Neural Network with Backpropagation......Page 35
1.4 Practical Issues in Neural Network Training......Page 38
1.4.1 The Problem of Overfitting......Page 39
1.4.1.1 Regularization......Page 40
1.4.1.4 Trading Off Breadth for Depth......Page 41
1.4.2 The Vanishing and Exploding Gradient Problems......Page 42
1.4.5 Computational Challenges......Page 43
1.5 The Secrets to the Power of Function Composition......Page 44
1.5.1 The Importance of Nonlinear Activation......Page 46
1.5.2 Reducing Parameter Requirements with Depth......Page 48
1.5.3.1 Blurring the Distinctions Between Input, Hidden, and Output Layers......Page 49
1.5.3.2 Unconventional Operations and Sum-Product Networks......Page 50
1.6.2 Radial Basis Function Networks......Page 51
1.6.4 Recurrent Neural Networks......Page 52
1.6.5 Convolutional Neural Networks......Page 54
1.6.6 Hierarchical Feature Engineering and Pretrained Models......Page 56
1.7.1 Reinforcement Learning......Page 58
1.7.3 Generative Adversarial Networks......Page 59
1.8.1 The MNIST Database of Handwritten Digits......Page 60
1.8.2 The ImageNet Database......Page 61
1.10 Bibliographic Notes......Page 62
1.10.2 Software Resources......Page 64
1.11 Exercises......Page 65
2.1 Introduction......Page 67
2.2 Neural Architectures for Binary Classification Models......Page 69
2.2.1 Revisiting the Perceptron......Page 70
2.2.2 Least-Squares Regression......Page 72
2.2.2.1 Widrow-Hoff Learning......Page 73
2.2.3 Logistic Regression......Page 75
2.2.4 Support Vector Machines......Page 77
2.3.1 Multiclass Perceptron......Page 79
2.3.2 Weston-Watkins SVM......Page 81
2.3.3 Multinomial Logistic Regression (Softmax Classifier)......Page 82
2.3.4 Hierarchical Softmax for Many Classes......Page 83
2.5 Matrix Factorization with Autoencoders......Page 84
2.5.1 Autoencoder: Basic Principles......Page 85
2.5.1.1 Autoencoder with a Single Hidden Layer......Page 86
2.5.1.3 Sharing Weights in Encoder and Decoder......Page 88
2.5.2 Nonlinear Activations......Page 90
2.5.3 Deep Autoencoders......Page 92
2.5.4 Application to Outlier Detection......Page 94
2.5.5.1 Sparse Feature Learning......Page 95
2.5.6 Other Applications......Page 96
2.5.7 Recommender Systems: Row Index to Row Value Prediction......Page 97
2.5.8 Discussion......Page 100
2.6.1 Neural Embedding with Continuous Bag of Words......Page 101
2.6.2 Neural Embedding with Skip-Gram Model......Page 104
2.6.3 Word2vec (SGNS) Is Logistic Matrix Factorization......Page 109
2.7 Simple Neural Architectures for Graph Embeddings......Page 112
2.7.3 Connections with DeepWalk and Node2vec......Page 114
2.9 Bibliographic Notes......Page 115
2.9.1 Software Resources......Page 116
2.10 Exercises......Page 117
3.1 Introduction......Page 119
3.2.1 Backpropagation with the Computational Graph Abstraction......Page 121
3.2.2 Dynamic Programming to the Rescue......Page 125
3.2.3 Backpropagation with Post-Activation Variables......Page 127
3.2.4 Backpropagation with Pre-Activation Variables......Page 129
3.2.5.1 The Special Case of Softmax......Page 131
3.2.6 A Decoupled View of Vector-Centric Backpropagation......Page 132
3.2.8 Mini-Batch Stochastic Gradient Descent......Page 135
3.2.9 Backpropagation Tricks for Handling Shared Weights......Page 137
3.2.10 Checking the Correctness of Gradient Computation......Page 138
3.3.1 Tuning Hyperparameters......Page 139
3.3.2 Feature Preprocessing......Page 140
3.3.3 Initialization......Page 142
3.4 The Vanishing and Exploding Gradient Problems......Page 143
3.4.1 Geometric Understanding of the Effect of Gradient Ratios......Page 144
3.4.3.1 Leaky ReLU......Page 147
3.5 Gradient-Descent Strategies......Page 148
3.5.1 Learning Rate Decay......Page 149
3.5.2 Momentum-Based Learning......Page 150
3.5.3 Parameter-Specific Learning Rates......Page 151
3.5.3.2 RMSProp......Page 152
3.5.3.4 AdaDelta......Page 153
3.5.3.5 Adam......Page 154
3.5.4 Cliffs and Higher-Order Instability......Page 155
3.5.5 Gradient Clipping......Page 156
3.5.6 Second-Order Derivatives......Page 157
3.5.6.1 Conjugate Gradients and Hessian-Free Optimization......Page 159
3.5.6.2 Quasi-Newton Methods and BFGS......Page 162
3.5.6.3 Problems with Second-Order Methods: Saddle Points......Page 163
3.5.8 Local and Spurious Minima......Page 165
3.6 Batch Normalization......Page 166
3.7 Practical Tricks for Acceleration and Compression......Page 170
3.7.1 GPU Acceleration......Page 171
3.7.2 Parallel and Distributed Implementations......Page 172
3.7.3 Algorithmic Tricks for Model Compression......Page 174
3.9 Bibliographic Notes......Page 177
3.10 Exercises......Page 179
4.1 Introduction......Page 182
4.2 The Bias-Variance Trade-Off......Page 187
4.2.1 Formal View......Page 188
4.3 Generalization Issues in Model Tuning and Evaluation......Page 191
4.3.1 Evaluating with Hold-Out and Cross-Validation......Page 192
4.3.2 Issues with Training at Scale......Page 193
4.4 Penalty-Based Regularization......Page 194
4.4.1 Connections with Noise Injection......Page 195
4.4.2 L1-Regularization......Page 196
4.4.3 L1- or L2-Regularization?......Page 197
4.4.4 Penalizing Hidden Units: Learning Sparse Representations......Page 198
4.5.1 Bagging and Subsampling......Page 199
4.5.2 Parametric Model Selection and Averaging......Page 200
4.5.4 Dropout......Page 201
4.5.5 Data Perturbation Ensembles......Page 204
4.6.1 Understanding Early Stopping from the Variance Perspective......Page 205
4.7 Unsupervised Pretraining......Page 206
4.7.2 What About Supervised Pretraining?......Page 210
4.8.1 Continuation Learning......Page 212
4.9 Parameter Sharing......Page 213
4.10 Regularization in Unsupervised Applications......Page 214
4.10.2 Noise Injection: De-noising Autoencoders......Page 215
4.10.3 Gradient-Based Penalization: Contractive Autoencoders......Page 217
4.10.4 Hidden Probabilistic Structure: Variational Autoencoders......Page 220
4.10.4.1 Reconstruction and Generative Sampling......Page 223
4.10.4.2 Conditional Variational Autoencoders......Page 225
4.11 Summary......Page 226
4.12 Bibliographic Notes......Page 227
4.13 Exercises......Page 228
5.1 Introduction......Page 230
5.2 Training an RBF Network......Page 233
5.2.1 Training the Hidden Layer......Page 234
5.2.2 Training the Output Layer......Page 235
5.2.3 Orthogonal Least-Squares Algorithm......Page 237
5.2.4 Fully Supervised Learning......Page 238
5.3.1 Classification with Perceptron Criterion......Page 239
5.3.3 Example of Linear Separability Promoted by RBF......Page 240
5.3.4 Application to Interpolation......Page 241
5.4.1 Kernel Regression as a Special Case of RBF Networks......Page 242
5.4.2 Kernel SVM as a Special Case of RBF Networks......Page 243
5.5 Summary......Page 244
5.7 Exercises......Page 245
6.1 Introduction......Page 247
6.1.1 Historical Perspective......Page 248
6.2 Hopfield Networks......Page 249
6.2.1 Optimal State Configurations of a Trained Network......Page 250
6.2.2 Training a Hopfield Network......Page 252
6.2.3 Building a Toy Recommender and Its Limitations......Page 253
6.2.4 Increasing the Expressive Power of the Hopfield Network......Page 254
6.3 The Boltzmann Machine......Page 255
6.3.1 How a Boltzmann Machine Generates Data......Page 256
6.3.2 Learning the Weights of a Boltzmann Machine......Page 257
6.4 Restricted Boltzmann Machines......Page 259
6.4.1 Training the RBM......Page 261
6.4.2 Contrastive Divergence Algorithm......Page 262
6.5 Applications of Restricted Boltzmann Machines......Page 263
6.5.1 Dimensionality Reduction and Data Reconstruction......Page 264
6.5.2 RBMs for Collaborative Filtering......Page 266
6.5.3 Using RBMs for Classification......Page 269
6.5.4 Topic Models with RBMs......Page 272
6.5.5 RBMs for Machine Learning with Multimodal Data......Page 274
6.6 Using RBMs Beyond Binary Data Types......Page 275
6.7 Stacking Restricted Boltzmann Machines......Page 276
6.7.1 Unsupervised Learning......Page 278
6.7.3 Deep Boltzmann Machines and Deep Belief Networks......Page 279
6.9 Bibliographic Notes......Page 280
6.10 Exercises......Page 282
7.1 Introduction......Page 283
7.2 The Architecture of Recurrent Neural Networks......Page 286
7.2.1 Language Modeling Example of RNN......Page 289
7.2.1.1 Generating a Language Sample......Page 290
7.2.2 Backpropagation Through Time......Page 292
7.2.3 Bidirectional Recurrent Networks......Page 295
7.2.4 Multilayer Recurrent Networks......Page 296
7.3 The Challenges of Training Recurrent Networks......Page 298
7.3.1 Layer Normalization......Page 301
7.4 Echo-State Networks......Page 302
7.5 Long Short-Term Memory (LSTM)......Page 304
7.6 Gated Recurrent Units (GRUs)......Page 307
7.7 Applications of Recurrent Neural Networks......Page 309
7.7.1 Application to Automatic Image Captioning......Page 310
7.7.2 Sequence-to-Sequence Learning and Machine Translation......Page 311
7.7.2.1 Question-Answering Systems......Page 313
7.7.3 Application to Sentence-Level Classification......Page 315
7.7.4 Token-Level Classification with Linguistic Features......Page 316
7.7.5 Time-Series Forecasting and Prediction......Page 317
7.7.6 Temporal Recommender Systems......Page 319
7.7.9 Handwriting Recognition......Page 321
7.9 Bibliographic Notes......Page 322
7.9.1 Software Resources......Page 323
7.10 Exercises......Page 324
8.1 Introduction......Page 326
8.1.1 Historical Perspective and Biological Inspiration......Page 327
8.1.2 Broader Observations About Convolutional Neural Networks......Page 328
8.2 The Basic Structure of a Convolutional Network......Page 329
8.2.1 Padding......Page 333
8.2.3 Typical Settings......Page 335
8.2.4 The ReLU Layer......Page 336
8.2.5 Pooling......Page 337
8.2.6 Fully Connected Layers......Page 338
8.2.7 The Interleaving Between Layers......Page 339
8.2.8 Local Response Normalization......Page 341
8.2.9 Hierarchical Feature Engineering......Page 342
8.3 Training a Convolutional Network......Page 343
8.3.1 Backpropagating Through Convolutions......Page 344
8.3.2 Backpropagation as Convolution with Inverted/Transposed Filter......Page 345
8.3.3 Convolution/Backpropagation as Matrix Multiplications......Page 346
8.3.4 Data Augmentation......Page 348
8.4 Case Studies of Convolutional Architectures......Page 349
8.4.1 AlexNet......Page 350
8.4.2 ZFNet......Page 352
8.4.3 VGG......Page 353
8.4.4 GoogLeNet......Page 356
8.4.5 ResNet......Page 358
8.4.6 The Effects of Depth......Page 361
8.4.7 Pretrained Models......Page 362
8.5 Visualization and Unsupervised Learning......Page 363
8.5.1 Visualizing the Features of a Trained Network......Page 364
8.5.2 Convolutional Autoencoders......Page 368
8.6.1 Content-Based Image Retrieval......Page 374
8.6.2 Object Localization......Page 375
8.6.3 Object Detection......Page 376
8.6.4 Natural Language and Sequence Learning......Page 377
8.6.5 Video Classification......Page 378
8.8 Bibliographic Notes......Page 379
8.8.1 Software Resources and Data Sets......Page 381
8.9 Exercises......Page 382
9.1 Introduction......Page 383
9.2 Stateless Algorithms: Multi-Armed Bandits......Page 385
9.2.3 Upper Bounding Methods......Page 386
9.3 The Basic Framework of Reinforcement Learning......Page 387
9.3.1 Challenges of Reinforcement Learning......Page 389
9.3.3 Role of Deep Learning and a Straw-Man Algorithm......Page 390
9.4 Bootstrapping for Value Function Learning......Page 393
9.4.1 Deep Learning Models as Function Approximators......Page 394
9.4.2 Example: Neural Network for Atari Setting......Page 396
9.4.3 On-Policy Versus Off-Policy Methods: SARSA......Page 397
9.4.4 Modeling States Versus State-Action Pairs......Page 399
9.5 Policy Gradient Methods......Page 401
9.5.1 Finite Difference Methods......Page 402
9.5.2 Likelihood Ratio Methods......Page 403
9.5.4 Actor-Critic Methods......Page 405
9.5.6 Advantages and Disadvantages of Policy Gradients......Page 407
9.6 Monte Carlo Tree Search......Page 408
9.7.1 AlphaGo: Championship Level Play at Go......Page 409
9.7.1.1 Alpha Zero: Enhancements to Zero Human Knowledge......Page 412
9.7.2.1 Deep Learning of Locomotion Skills......Page 414
9.7.2.2 Deep Learning of Visuomotor Skills......Page 416
9.7.3 Building Conversational Systems: Deep Learning for Chatbots......Page 417
9.7.4 Self-Driving Cars......Page 420
9.7.5 Inferring Neural Architectures with Reinforcement Learning......Page 422
9.8 Practical Challenges Associated with Safety......Page 423
9.10 Bibliographic Notes......Page 424
9.11 Exercises......Page 426
10.1 Introduction......Page 428
10.2 Attention Mechanisms......Page 430
10.2.1 Recurrent Models of Visual Attention......Page 431
10.2.1.1 Application to Image Captioning......Page 433
10.2.2 Attention Mechanisms for Machine Translation......Page 434
10.3 Neural Networks with External Memory......Page 438
10.3.1 A Fantasy Video Game: Sorting by Example......Page 439
10.3.1.1 Implementing Swaps with Memory Operations......Page 440
10.3.2 Neural Turing Machines......Page 441
10.3.3 Differentiable Neural Computer: A Brief Overview......Page 446
10.4 Generative Adversarial Networks (GANs)......Page 447
10.4.1 Training a Generative Adversarial Network......Page 448
10.4.3 Using GANs for Generating Image Data......Page 451
10.4.4 Conditional Generative Adversarial Networks......Page 453
10.5 Competitive Learning......Page 458
10.5.2 Kohonen Self-Organizing Map......Page 459
10.6.1 An Aspirational Goal: One-Shot Learning......Page 462
10.6.2 An Aspirational Goal: Energy-Efficient Learning......Page 464
10.7 Summary......Page 465
10.8 Bibliographic Notes......Page 466
10.9 Exercises......Page 467
Bibliography......Page 468
Index......Page 502