Author(s): Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Publisher: The MIT Press
Year: 2016
Language: English
Pages: 801
Contents......Page 6
Notation......Page 20
1 Introduction......Page 24
1.1 Who Should Read This Book?......Page 31
1.2 Historical Trends in Deep Learning......Page 35
I Applied Math and Machine Learning Basics......Page 50
2 Linear Algebra......Page 51
2.1 Scalars, Vectors, Matrices and Tensors......Page 52
2.2 Multiplying Matrices and Vectors......Page 55
2.3 Identity and Inverse Matrices......Page 57
2.4 Linear Dependence and Span......Page 58
2.5 Norms......Page 59
2.6 Special Kinds of Matrices and Vectors......Page 61
2.7 Eigendecomposition......Page 62
2.8 Singular Value Decomposition......Page 65
2.9 The Moore-Penrose Pseudoinverse......Page 66
2.10 The Trace Operator......Page 67
2.11 The Determinant......Page 68
2.12 Example: Principal Components Analysis......Page 68
3 Probability and Information Theory......Page 74
3.1 Why Probability?......Page 75
3.2 Random Variables......Page 76
3.3 Probability Distributions......Page 77
3.4 Marginal Probability......Page 79
3.5 Conditional Probability......Page 80
3.6 The Chain Rule of Conditional Probabilities......Page 80
3.7 Independence and Conditional Independence......Page 81
3.8 Expectation, Variance and Covariance......Page 81
3.9 Common Probability Distributions......Page 83
3.10 Useful Properties of Common Functions......Page 88
3.11 Bayes' Rule......Page 90
3.12 Technical Details of Continuous Variables......Page 91
3.13 Information Theory......Page 93
3.14 Structured Probabilistic Models......Page 97
4 Numerical Computation......Page 99
4.1 Overflow and Underflow......Page 100
4.2 Poor Conditioning......Page 101
4.3 Gradient-Based Optimization......Page 102
4.4 Constrained Optimization......Page 112
4.5 Example: Linear Least Squares......Page 115
5 Machine Learning Basics......Page 118
5.1 Learning Algorithms......Page 119
5.2 Capacity, Overfitting and Underfitting......Page 130
5.3 Hyperparameters and Validation Sets......Page 140
5.4 Estimators, Bias and Variance......Page 142
5.5 Maximum Likelihood Estimation......Page 151
5.6 Bayesian Statistics......Page 155
5.7 Supervised Learning Algorithms......Page 159
5.8 Unsupervised Learning Algorithms......Page 165
5.9 Stochastic Gradient Descent......Page 170
5.10 Building a Machine Learning Algorithm......Page 172
5.11 Challenges Motivating Deep Learning......Page 174
II Deep Networks: Modern Practices......Page 184
6 Deep Feedforward Networks......Page 186
6.1 Example: Learning XOR......Page 189
6.2 Gradient-Based Learning......Page 194
6.3 Hidden Units......Page 208
6.4 Architecture Design......Page 214
6.5 Back-Propagation and Other Differentiation Algorithms......Page 220
6.6 Historical Notes......Page 240
7 Regularization for Deep Learning......Page 244
7.1 Parameter Norm Penalties......Page 246
7.2 Norm Penalties as Constrained Optimization......Page 253
7.3 Regularization and Under-Constrained Problems......Page 255
7.4 Dataset Augmentation......Page 256
7.5 Noise Robustness......Page 258
7.6 Semi-Supervised Learning......Page 259
7.7 Multitask Learning......Page 260
7.8 Early Stopping......Page 262
7.9 Parameter Tying and Parameter Sharing......Page 269
7.10 Sparse Representations......Page 270
7.11 Bagging and Other Ensemble Methods......Page 272
7.12 Dropout......Page 274
7.13 Adversarial Training......Page 284
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier......Page 286
8 Optimization for Training Deep Models......Page 290
8.1 How Learning Differs from Pure Optimization......Page 291
8.2 Challenges in Neural Network Optimization......Page 298
8.3 Basic Algorithms......Page 309
8.4 Parameter Initialization Strategies......Page 315
8.5 Algorithms with Adaptive Learning Rates......Page 321
8.6 Approximate Second-Order Methods......Page 325
8.7 Optimization Strategies and Meta-Algorithms......Page 332
9 Convolutional Networks......Page 344
9.1 The Convolution Operation......Page 345
9.2 Motivation......Page 347
9.3 Pooling......Page 353
9.4 Convolution and Pooling as an Infinitely Strong Prior......Page 357
9.5 Variants of the Basic Convolution Function......Page 360
9.6 Structured Outputs......Page 370
9.7 Data Types......Page 371
9.8 Efficient Convolution Algorithms......Page 373
9.9 Random or Unsupervised Features......Page 374
9.10 The Neuroscientific Basis for Convolutional Networks......Page 376
9.11 Convolutional Networks and the History of Deep Learning......Page 382
10 Sequence Modeling: Recurrent and Recursive Nets......Page 386
10.1 Unfolding Computational Graphs......Page 388
10.2 Recurrent Neural Networks......Page 391
10.3 Bidirectional RNNs......Page 406
10.4 Encoder-Decoder Sequence-to-Sequence Architectures......Page 408
10.5 Deep Recurrent Networks......Page 410
10.6 Recursive Neural Networks......Page 411
10.7 The Challenge of Long-Term Dependencies......Page 413
10.8 Echo State Networks......Page 415
10.9 Leaky Units and Other Strategies for Multiple Time Scales......Page 418
10.10 The Long Short-Term Memory and Other Gated RNNs......Page 420
10.11 Optimization for Long-Term Dependencies......Page 424
10.12 Explicit Memory......Page 428
11 Practical Methodology......Page 432
11.1 Performance Metrics......Page 433
11.2 Default Baseline Models......Page 436
11.3 Determining Whether to Gather More Data......Page 437
11.4 Selecting Hyperparameters......Page 438
11.5 Debugging Strategies......Page 447
11.6 Example: Multi-Digit Number Recognition......Page 451
12 Applications......Page 453
12.1 Large-Scale Deep Learning......Page 454
12.2 Computer Vision......Page 463
12.3 Speech Recognition......Page 469
12.4 Natural Language Processing......Page 471
12.5 Other Applications......Page 488
III Deep Learning Research......Page 498
13 Linear Factor Models......Page 502
13.1 Probabilistic PCA and Factor Analysis......Page 503
13.2 Independent Component Analysis (ICA)......Page 504
13.3 Slow Feature Analysis......Page 507
13.4 Sparse Coding......Page 509
13.5 Manifold Interpretation of PCA......Page 512
14 Autoencoders......Page 516
14.1 Undercomplete Autoencoders......Page 517
14.2 Regularized Autoencoders......Page 518
14.3 Representational Power, Layer Size and Depth......Page 522
14.4 Stochastic Encoders and Decoders......Page 523
14.5 Denoising Autoencoders......Page 524
14.6 Learning Manifolds with Autoencoders......Page 529
14.7 Contractive Autoencoders......Page 533
14.8 Predictive Sparse Decomposition......Page 537
14.9 Applications of Autoencoders......Page 538
15 Representation Learning......Page 540
15.1 Greedy Layer-Wise Unsupervised Pretraining......Page 542
15.2 Transfer Learning and Domain Adaptation......Page 549
15.3 Semi-Supervised Disentangling of Causal Factors......Page 555
15.4 Distributed Representation......Page 559
15.5 Exponential Gains from Depth......Page 566
15.6 Providing Clues to Discover Underlying Causes......Page 567
16 Structured Probabilistic Models for Deep Learning......Page 572
16.1 The Challenge of Unstructured Modeling......Page 573
16.2 Using Graphs to Describe Model Structure......Page 577
16.3 Sampling from Graphical Models......Page 593
16.4 Advantages of Structured Modeling......Page 594
16.5 Learning about Dependencies......Page 595
16.6 Inference and Approximate Inference......Page 596
16.7 The Deep Learning Approach to Structured Probabilistic Models......Page 598
17 Monte Carlo Methods......Page 603
17.1 Sampling and Monte Carlo Methods......Page 604
17.2 Importance Sampling......Page 606
17.3 Markov Chain Monte Carlo Methods......Page 609
17.4 Gibbs Sampling......Page 613
17.5 The Challenge of Mixing between Separated Modes......Page 614
18 Confronting the Partition Function......Page 620
18.1 The Log-Likelihood Gradient......Page 621
18.2 Stochastic Maximum Likelihood and Contrastive Divergence......Page 622
18.3 Pseudolikelihood......Page 630
18.4 Score Matching and Ratio Matching......Page 632
18.5 Denoising Score Matching......Page 634
18.6 Noise-Contrastive Estimation......Page 635
18.7 Estimating the Partition Function......Page 637
19 Approximate Inference......Page 646
19.1 Inference as Optimization......Page 647
19.2 Expectation Maximization......Page 649
19.3 MAP Inference and Sparse Coding......Page 650
19.4 Variational Inference and Learning......Page 652
19.5 Learned Approximate Inference......Page 665
20 Deep Generative Models......Page 667
20.1 Boltzmann Machines......Page 668
20.2 Restricted Boltzmann Machines......Page 670
20.3 Deep Belief Networks......Page 674
20.4 Deep Boltzmann Machines......Page 677
20.5 Boltzmann Machines for Real-Valued Data......Page 690
20.6 Convolutional Boltzmann Machines......Page 696
20.7 Boltzmann Machines for Structured or Sequential Outputs......Page 698
20.8 Other Boltzmann Machines......Page 700
20.9 Back-Propagation through Random Operations......Page 701
20.10 Directed Generative Nets......Page 705
20.11 Drawing Samples from Autoencoders......Page 724
20.12 Generative Stochastic Networks......Page 727
20.13 Other Generation Schemes......Page 729
20.14 Evaluating Generative Models......Page 730
20.15 Conclusion......Page 733
Bibliography......Page 734
Index......Page 790