This book provides a straightforward look at the concepts, algorithms, and advantages of Bayesian Deep Learning and Deep Generative Models. Starting from the model-based approach to Machine Learning, the authors motivate Probabilistic Graphical Models and show how Bayesian inference naturally lends itself to this framework. They present detailed explanations of the main modern algorithms for variational approximations to Bayesian inference in neural networks, each of which develops a distinct aspect of the theory. The book builds well-known deep generative models, such as the Variational Autoencoder, from the ground up and follows their subsequent theoretical developments. By also exposing the main issues of these algorithms, together with methods to mitigate them, the book supplies the knowledge on generative models needed to handle a wide range of data types: sequential or not, continuous or not, labelled or not. The book is self-contained, covering all the necessary theory so that the reader does not have to search for additional information elsewhere.
- Offers a concise, self-contained resource covering everything from the basic concepts to the algorithms of Bayesian Deep Learning;
- Presents Statistical Inference concepts, offering illustrative examples, practical aspects, and pseudo-code;
- Every chapter includes hands-on examples and exercises, and a companion website features lecture slides, additional examples, and other support material.
Author(s): Lucas Pinheiro Cinelli, Matheus Araújo Marins, Eduardo Antônio Barros da Silva, Sérgio Lima Netto
Publisher: Springer
Year: 2021
Language: English
Pages: 179
City: Cham
Preface
Contents
Acronyms
1 Introduction
1.1 Historical Context
1.2 On the Notation
References
2 Fundamentals of Statistical Inference
2.1 Models
2.1.1 Parametric Models
2.1.1.1 Location-Scale Families
2.1.2 Nonparametric Models
2.1.3 Latent Variable Models
2.1.4 De Finetti's Representation Theorem
2.1.5 The Likelihood Function
2.2 Exponential Family
2.2.1 Sufficient Statistics
2.2.2 Definition and Properties
2.3 Information Measures
2.3.1 Fisher Information
2.3.2 Entropy
2.3.2.1 Conditional Entropy
2.3.2.2 Differential Entropy
2.3.3 Kullback–Leibler Divergence
2.3.4 Mutual Information
2.4 Bayesian Inference
2.4.1 Bayesian vs. Classical Approach
2.4.2 The Posterior Predictive Distribution
2.4.3 Hierarchical Modeling
2.5 Conjugate Prior Distributions
2.5.1 Definition and Motivation
2.5.2 Conjugate Prior Examples
2.6 Point Estimation
2.6.1 Method of Moments
2.6.2 Maximum Likelihood Estimation
2.6.3 Maximum a Posteriori Estimation
2.6.4 Bayes Estimation
2.6.5 Expectation-Maximization
2.6.5.1 EM Example
2.7 Closing Remarks
References
3 Model-Based Machine Learning and Approximate Inference
3.1 Model-Based Machine Learning
3.1.1 Probabilistic Graphical Models
3.1.1.1 Directed Acyclic Graphs
3.1.1.2 Undirected Graphs
3.1.1.3 The Power of Graphical Models
3.1.2 Probabilistic Programming
3.2 Approximate Inference
3.2.1 Variational Inference
3.2.1.1 The Evidence Lower Bound
3.2.1.2 Information Theoretic View on the ELBO
3.2.1.3 The Mean-Field Approximation
3.2.1.4 Coordinate Ascent Variational Inference
3.2.1.5 Stochastic Variational Inference
3.2.1.6 VI Issues
3.2.1.7 VI Example
3.2.2 Assumed Density Filtering
3.2.2.1 Minimizing the Forward KL Divergence
3.2.2.2 Moment Matching in the Exponential Family
3.2.2.3 ADF Issues
3.2.2.4 ADF Example
3.2.3 Expectation Propagation
3.2.3.1 Recasting ADF as a Product of Approximate Factors
3.2.3.2 Operations in the Exponential Family
3.2.3.3 Power EP
3.2.3.4 EP Issues
3.2.3.5 EP Example
3.2.4 Further Practical Extensions
3.2.4.1 Black Box Variational Inference
3.2.4.2 Black Box α Minimization
3.2.4.3 Automatic Differentiation Variational Inference
3.3 Closing Remarks
References
4 Bayesian Neural Networks
4.1 Why BNNs?
4.2 Assessing Uncertainty Quality
4.2.1 Predictive Log-Likelihood
4.2.2 Calibration
4.2.3 Downstream Applications
4.3 Bayes by Backprop
4.3.1 Practical VI
4.4 Probabilistic Backprop
4.4.1 Incorporating the Hyper-Priors p(λ) and p(γ)
4.4.2 Incorporating the Priors on the Weights p(w | λ)
4.4.2.1 Update Equations for α_λ and β_λ
4.4.2.2 Update Equations for the μ and σ²
4.4.3 Incorporating the Likelihood Factors p(y | W, X, γ)
4.4.3.1 The Normalizing Factor
4.5 MC Dropout
4.5.1 Dropout
4.5.2 A Bayesian View
4.6 Fast Natural Gradient
4.6.1 Vadam
4.7 Comparing the Methods
4.7.1 1-D Toy Example
4.7.2 UCI Data Sets
4.7.2.1 Boston Housing
4.7.2.2 Concrete Compressive Strength
4.7.2.3 Energy Efficiency
4.7.2.4 Kin8nm
4.7.2.5 Condition Based Maintenance of Naval Propulsion Plants
4.7.2.6 Combined Cycle Power Plant
4.7.2.7 Wine Quality
4.7.2.8 Yacht Hydrodynamics
4.7.3 Experimental Setup
4.7.3.1 Hyper-Parameter Search with Bayesian Optimization (BO)
4.7.4 Training Configuration
4.7.5 Analysis
4.8 Further References
4.9 Closing Remarks
References
5 Variational Autoencoder
5.1 Motivations
5.2 Evaluating Generative Networks
5.3 Variational Autoencoders
5.3.1 Conditional VAE
5.3.2 β-VAE
5.4 Importance Weighted Autoencoder
5.5 VAE Issues
5.5.1 Inexpressive Posterior
5.5.1.1 Full Covariance Gaussian
5.5.1.2 Auxiliary Latent Variables
5.5.1.3 Normalizing Flow
5.5.2 The Posterior Collapse
5.5.3 Latent Distributions
5.5.3.1 Continuous Relaxation
5.5.3.2 Vector Quantization
5.6 Experiments
5.6.1 Data Sets
5.6.1.1 MNIST
5.6.1.2 Fashion-MNIST
5.6.2 Experimental Setup
5.6.3 Results
5.7 Application: Generative Models on Semi-supervised Learning
5.8 Closing Remarks
5.9 Final Words
References
A Support Material
A.1 Gradient Estimators
A.2 Update Formula for CAVI
A.3 Generalized Gauss–Newton Approximation
A.4 Natural Gradient and the Fisher Information Matrix
A.5 Gaussian Gradient Identities
A.6 Student's t-Distribution
References
Index