This book seeks to bridge the gap between statistics and computer science. It provides an overview of Monte Carlo methods, including Sequential Monte Carlo, Markov Chain Monte Carlo, Metropolis-Hastings, the Gibbs Sampler, Cluster Sampling, Data-Driven MCMC, Stochastic Gradient Descent, Langevin Monte Carlo, Hamiltonian Monte Carlo, and energy landscape mapping. Owing to its comprehensive coverage, the book is suitable for developing and teaching graduate courses on Monte Carlo methods. To facilitate learning, each chapter includes several representative application examples from various fields. The book pursues two main goals: (1) to introduce researchers to applying Monte Carlo methods to broader problems in areas such as Computer Vision, Computer Graphics, Machine Learning, Robotics, and Artificial Intelligence; and (2) to make it easier for scientists and engineers working in these areas to employ Monte Carlo methods to enhance their research.
Author(s): Adrian Barbu, Song-Chun Zhu
Edition: 1
Publisher: Springer
Year: 2020
Language: English
Pages: 433
Tags: Monte Carlo Methods
Preface
Contents
About the Authors
1 Introduction to Monte Carlo Methods
1.1 Introduction
1.2 Motivation and Objectives
1.3 Tasks in Monte Carlo Computing
1.3.1 Task 1: Sampling and Simulation
1.3.2 Task 2: Estimating Quantities by Monte Carlo Simulation
1.3.3 Task 3: Optimization and Bayesian Inference
1.3.4 Task 4: Learning and Model Estimation
1.3.5 Task 5: Visualizing the Landscape
References
2 Sequential Monte Carlo
2.1 Introduction
2.2 Sampling a 1-Dimensional Density
2.3 Importance Sampling and Weighted Samples
2.4 Sequential Importance Sampling (SIS)
2.4.1 Application: The Number of Self-Avoiding Walks
2.4.2 Application: Particle Filtering for Tracking Objects in a Video
2.4.2.1 Application: Curve Tracking
2.4.3 Summary of the SMC Framework
2.5 Application: Ray Tracing by SMC
2.5.1 Example: Glossy Highlights
2.6 Preserving Sample Diversity in Importance Sampling
2.6.1 Parzen Window Discussion
2.7 Monte Carlo Tree Search
2.7.1 Pure Monte Carlo Tree Search
2.7.2 AlphaGo
2.8 Exercises
References
3 Markov Chain Monte Carlo: The Basics
3.1 Introduction
3.2 Markov Chain Basics
3.3 Topology of Transition Matrix: Communication and Period
3.4 The Perron-Frobenius Theorem
3.5 Convergence Measures
3.6 Markov Chains in Continuous or Heterogeneous State Spaces
3.7 Ergodicity Theorem
3.8 MCMC for Optimization by Simulated Annealing
3.8.1 Page Rank Example
3.9 Exercises
References
4 Metropolis Methods and Variants
4.1 Introduction
4.2 The Metropolis-Hastings Algorithm
4.2.1 The Original Metropolis-Hastings Algorithm
4.2.2 Another Version of the Metropolis-Hastings Algorithm
4.2.3 Other Acceptance Probability Designs
4.2.4 Key Issues in Metropolis Design
4.3 The Independence Metropolis Sampler
4.3.1 The Eigenstructure of the IMS
4.3.2 General First Hitting Time for Finite Spaces
4.3.3 Hitting Time Analysis for the IMS
4.4 Reversible Jumps and Trans-Dimensional MCMC
4.4.1 Reversible Jumps
4.4.2 Toy Example: 1D Range Image Segmentation
4.4.2.1 Jump-Diffusion
4.5 Application: Counting People
4.5.1 Marked Point Process Model
4.5.2 Inference by MCMC
4.5.3 Results
4.6 Application: Furniture Arrangement
4.7 Application: Scene Synthesis
4.8 Exercises
References
5 Gibbs Sampler and Its Variants
5.1 Introduction
5.2 Gibbs Sampler
5.2.1 A Major Problem with the Gibbs Sampler
5.3 Gibbs Sampler Generalizations
5.3.1 Hit-and-Run
5.3.2 Generalized Gibbs Sampler
5.3.3 Generalized Hit-and-Run
5.3.4 Sampling with Auxiliary Variables
5.3.5 Simulated Tempering
5.3.6 Slice Sampling
5.3.7 Data Augmentation
5.3.8 Metropolized Gibbs Sampler
5.4 Data Association and Data Augmentation
5.5 Julesz Ensemble and MCMC Sampling of Texture
5.5.1 The Julesz Ensemble: A Mathematical Definition of Texture
5.5.2 The Gibbs Ensemble and Ensemble Equivalence
5.5.3 Sampling the Julesz Ensemble
5.5.4 Experiment: Sampling the Julesz Ensemble
5.6 Exercises
References
6 Cluster Sampling Methods
6.1 Introduction
6.2 Potts Model and Swendsen-Wang
6.3 Interpretations of the SW Algorithm
6.3.1 Interpretation 1: Metropolis-Hastings Perspective
6.3.2 Interpretation 2: Data Augmentation
6.4 Some Theoretical Results
6.5 Swendsen-Wang Cuts for Arbitrary Probabilities
6.5.1 Step 1: Data-Driven Clustering
6.5.2 Step 2: Color Flipping
6.5.3 Step 3: Accepting the Flip
6.5.4 Complexity Analysis
6.6 Variants of the Cluster Sampling Method
6.6.1 Cluster Gibbs Sampling: The "Hit-and-Run" Perspective
6.6.2 The Multiple Flipping Scheme
6.7 Application: Image Segmentation
6.8 Multigrid and Multi-level SW-cut
6.8.1 SW-Cuts at Multigrid
6.8.2 SW-Cuts at Multi-level
6.9 Subspace Clustering
6.9.1 Subspace Clustering by Swendsen-Wang Cuts
6.9.2 Application: Sparse Motion Segmentation
6.10 C4: Clustering Cooperative and Competitive Constraints
6.10.1 Overview of the C4 Algorithm
6.10.2 Graphs, Coupling, and Clustering
6.10.3 C4 Algorithm on Flat Graphs
6.10.4 Experiments on Flat Graphs
6.10.5 Checkerboard Ising Model
6.10.6 C4 on Hierarchical Graphs
6.10.7 Experiments on Hierarchical C4
6.11 Exercises
References
7 Convergence Analysis of MCMC
7.1 Introduction
7.2 Key Convergence Topics
7.3 Practical Methods for Monitoring
7.4 Coupling Methods for Card Shuffling
7.4.1 Shuffling to the Top
7.4.2 Riffle Shuffling
7.5 Geometric Bounds, Bottleneck and Conductance
7.5.1 Geometric Convergence
7.6 Peskun's Ordering and Ergodicity Theorem
7.7 Path Coupling and Exact Sampling
7.7.1 Coupling From the Past
7.7.2 Application: Sampling the Ising Model
7.8 Exercises
References
8 Data Driven Markov Chain Monte Carlo
8.1 Introduction
8.2 Issues with Segmentation and Introduction to DDMCMC
8.3 Simple Illustration of the DDMCMC
8.3.1 Designing MCMC: The Basic Issues
8.3.2 Computing Proposal Probabilities in the Atomic Spaces: Atomic Particles
8.3.3 Computing Proposal Probabilities in Object Spaces: Object Particles
8.3.4 Computing Multiple, Distinct Solutions: Scene Particles
8.3.5 The Ψ-World Experiment
8.4 Problem Formulation and Image Models
8.4.1 Bayesian Formulation for Segmentation
8.4.2 The Prior Probability
8.4.3 The Likelihood for Grey Level Images
8.4.4 Model Calibration
8.4.5 Image Models for Color
8.5 Anatomy of Solution Space
8.6 Exploring the Solution Space by Ergodic Markov Chains
8.6.1 Five Markov Chain Dynamics
8.6.2 The Bottlenecks
8.7 Data-Driven Methods
8.7.1 Method I: Clustering in Atomic Spaces
8.7.2 Method II: Edge Detection
8.8 Computing Importance Proposal Probabilities
8.9 Computing Multiple Distinct Solutions
8.9.1 Motivation and a Mathematical Principle
8.9.2 A K-Adventurers Algorithm for Multiple Solutions
8.10 Image Segmentation Experiments
8.11 Application: Image Parsing
8.11.1 Bottom-Up and Top-Down Processing
8.11.2 Generative and Discriminative Methods
8.11.3 Markov Chain Kernels and Sub-Kernels
8.11.4 DDMCMC and Proposal Probabilities
8.11.4.1 Generative Models and Bayesian Formulation
8.11.4.2 Shape Models
8.11.4.3 Generative Intensity Models
8.11.4.4 Overview of the Algorithm
8.11.4.5 Discriminative Methods
8.11.4.6 Control Structure of the Algorithm
8.11.5 The Markov Chain Kernels
8.11.5.1 Boundary Evolution
8.11.5.2 Markov Chain Sub-Kernels
8.11.5.3 AdaBoost for Discriminative Probabilities for Face and Text
8.11.6 Image Parsing Experiments
8.12 Exercises
References
9 Hamiltonian and Langevin Monte Carlo
9.1 Introduction
9.2 Hamiltonian Mechanics
9.2.1 Hamilton's Equations
9.2.2 A Simple Model of HMC
9.3 Properties of Hamiltonian Mechanics
9.3.1 Conservation of Energy
9.3.2 Reversibility
9.3.3 Symplectic Structure and Volume Preservation
9.4 The Leapfrog Discretization of Hamilton's Equations
9.4.1 Euler's Method
9.4.2 Modified Euler's Method
9.4.3 The Leapfrog Integrator
9.4.4 Properties of the Leapfrog Integrator
9.5 Hamiltonian Monte Carlo and Langevin Monte Carlo
9.5.1 Formulation of HMC
9.5.2 The HMC Algorithm
9.5.3 The LMC Algorithm
9.5.4 Tuning HMC
9.5.5 Proof of Detailed Balance for HMC
9.6 Riemann Manifold HMC
9.6.1 Linear Transformations in HMC
9.6.2 RMHMC Dynamics
9.6.3 RMHMC Algorithm and Variants
9.6.4 Covariance Functions in RMHMC
9.7 HMC in Practice
9.7.1 Simulated Experiments on Constrained Normal Distributions
9.7.2 Sampling Logistic Regression Coefficients with RMHMC
9.7.3 Sampling Image Densities with LMC: FRAME, GRADE and DeepFRAME
9.8 Exercises
References
10 Learning with Stochastic Gradient
10.1 Introduction
10.2 Stochastic Gradient: Motivation and Properties
10.2.1 Motivating Cases
10.2.2 Robbins-Monro Theorem
10.2.3 Stochastic Gradient Descent and the Langevin Equation
10.3 Parameter Estimation for Markov Random Field (MRF) Models
10.3.1 Learning a FRAME Model with Stochastic Gradient
10.3.2 Alternate Methods of Learning for FRAME
10.3.3 Four Variants of the FRAME Algorithm
10.3.4 Experiments
10.4 Learning Image Models with Neural Networks
10.4.1 Contrastive Divergence and Persistent Contrastive Divergence
10.4.2 Learning a Potential Energy for Images with Deep Networks: DeepFRAME
10.4.3 Generator Networks and Alternating Backward Propagation
10.4.4 Cooperative Energy and Generator Models
10.5 Exercises
References
11 Mapping the Energy Landscape
11.1 Introduction
11.2 Landscape Examples, Structures, and Tasks
11.2.1 Energy-Based Partitions of the State Space
11.2.2 Constructing a Disconnectivity Graph
11.2.3 ELM Example in 2D
11.2.4 Characterizing the Difficulty (or Complexity) of Learning Tasks
11.3 Generalized Wang-Landau Algorithm
11.3.1 Barrier Estimation for GWL Mapping
11.3.2 Volume Estimation with GWL
11.3.3 GWL Convergence Analysis
11.4 GWL Experiments
11.4.1 GWL Mappings of Gaussian Mixture Models
11.4.1.1 GMM Energy and Gradient Computations
11.4.1.2 Experiments on Synthetic GMM Data
11.4.1.3 Experiments on GMMs of Real Data
11.4.2 GWL Mapping of Grammar Models
11.4.2.1 Learning Dependency Grammars
11.4.2.2 Energy Function of Dependency Grammar
11.4.2.3 Discretization of the Dependency Grammar Hypothesis Space
11.4.2.4 GWL Dependency Grammar Experiments and Curriculum Learning
11.5 Mapping the Landscape with Attraction-Diffusion
11.5.1 Metastability and a Macroscopic Partition
11.5.2 Introduction to Attraction-Diffusion
11.5.3 Attraction-Diffusion and the Ising Model
11.5.4 Attraction-Diffusion ELM Algorithm
11.5.5 Tuning ADELM
11.5.6 Barrier Estimation with AD
11.6 Mapping the SK Spin Glass Model with GWL and ADELM
11.7 Mapping Image Spaces with Attraction-Diffusion
11.7.1 Structure of Image Galaxies
11.7.2 Experiments
11.7.2.1 Digits 0–9 ELM in Latent Space
11.7.2.2 Ivy Texton ELM in Latent Space
11.7.2.3 Multiscale Ivy ELM in Latent Space
11.7.2.4 Cat Faces ELM in Latent Space
11.8 Exercises
References
Index