As the first book of a three-part series, this book is offered as a tribute to pioneers in vision, such as Béla Julesz, David Marr, King-Sun Fu, Ulf Grenander, and David Mumford. The authors hope to provide a foundation and, perhaps more importantly, further inspiration for continued research in vision. The book covers David Marr's paradigm and the various statistical models underlying vision. The mathematical framework herein integrates three regimes of models (the low-, mid-, and high-entropy regimes) and provides a foundation for research in visual coding, recognition, and cognition. Concepts are first explained intuitively, then supported by findings in psychology and neuroscience, and finally formalized by statistical models and their associated learning and inference algorithms. Readers will gain a unified, cross-disciplinary view of research in vision and will accrue knowledge spanning psychology, neuroscience, and statistics.
Authors: Song-Chun Zhu; Ying Nian Wu
Publisher: Springer
Year: 2023
Language: English
Pages: 364
Preface
Story of David Marr
Beyond David Marr's Paradigm
Introducing the Book Series
Contents
About the Authors
1 Introduction
1.1 Goal of Vision
1.2 Seeing as Bayesian Inference
1.3 Knowledge Representation
1.4 Pursuit of Probabilistic Models
2 Statistics of Natural Images
2.1 Image Space and Distribution
2.2 Information and Encoding
2.3 Image Statistics and Power Law
2.4 Kurtosis and Sparsity
2.5 Scale Invariance
3 Textures
3.1 Julesz Quest
3.2 Markov Random Fields
Markov Random Field (MRF)
Ising and Potts Models
Gaussian Markov Random Field (GMRF)
Advanced Models: Hierarchical MRF and Mumford–Shah Model
Selecting Filters and Learning Potential Functions
3.3 Filters for Early Vision
Correlation and Convolution
Edge Detection Filters
Gaussian Filters
Derivative of Gaussian and Laplacian of Gaussian Filters
Gabor Filters
3.4 FRAME Model
Intuition and the Big Picture
Deriving the FRAME Model
Learning Potential Functions
Filter Selection
3.5 Texture Ensemble
Ensembles in Statistical Physics
Texture Ensemble
Type Theory and Entropy Rate Functions
A Simple Independent Model
From FRAME Model to Julesz Ensemble on Infinite Lattice
From Julesz Ensemble to FRAME Model on Finite Lattice
Equivalence of FRAME and Julesz Ensemble
From Julesz Ensemble to FRAME Model
From FRAME Model to Julesz Ensemble
3.6 Reaction and Diffusion Equations
Turing Reaction-Diffusion
Heat Diffusion
Anisotropic Diffusion
GRADE: Gibbs Reaction and Diffusion Equations
Properties of GRADE
Property 1: A General Statistical Framework
Property 2: Diffusion
Property 3: Reaction
3.7 Conclusion
4 Textons
4.1 Textons and Textures
Julesz's Discovery
Neural Coding Schemes
4.2 Sparse Coding
Image Representation
Basis and Frame
Olshausen–Field Model
A Three-Level Generative Model
4.3 Active Basis Model
Olshausen–Field Model for Sparse Coding
Active Basis Model for Shared Sparse Coding of Aligned Image Patches
Prototype Algorithm
Statistical Modeling
Shared Matching Pursuit
4.4 Sparse FRAME Model
Dense FRAME
Sparse Representation
Maximum Likelihood Learning
Generative Boosting
Sparse Model
4.5 Compositional Sparse Coding
Sparsity and Composition
Compositional Sparse Coding Model
5 Gestalt Laws and Perceptual Organization
5.1 Gestalt Laws for Perceptual Organization
5.2 Texton Process Embedding Gestalt Laws
Introduction
Background on Descriptive and Generative Learning
A Multi-layered Generative Model for Images
A Descriptive Model of Texton Processes
Background: Physics Foundation for Visual Modeling
Gestalt Ensemble
An Integrated Learning Framework
Integrated Learning
Mathematical Definitions of Visual Patterns
Effective Inference by Simplified Likelihood
Initialization by Likelihood Simplification and Clustering
Experiment I: Texton Clustering
Experiment II: Integrated Learning and Synthesis
Discussion
6 Primal Sketch: Integrating Textures and Textons
6.1 Marr's Conjecture on Primal Sketch
6.2 The Two-Layer Model
Structure Domain
The Dictionary of Image Primitives
Texture Domain
Integrated Model
The Sketch Pursuit Algorithm
6.3 Hybrid Image Templates
Representation
Prototypes, ε-Balls, and Saturation Function
Projecting Image Patches to 1D Responses
Template Pursuit by Information Projection
Example: Vector Fields for Human Hair Analysis and Synthesis
6.4 HoG and SIFT Representations
7 2.1D Sketch and Layered Representation
7.1 Problem Formulation
7.2 Variational Formulation by Nitzberg and Mumford
The Energy Functional
The Euler Elastica for Completing Occluded Curves
7.3 Mixed Markov Random Field Formulation
Definition of W2D and W2.1D
The Mixed MRF and Its Graphical Representation
Bayesian Formulation
7.4 2.1D Sketch with Layered Regions and Curves
Generative Models and Bayesian Formulation
Generative Models of Curves
Generative Models of Regions
Bayesian Formulation for Probabilistic Inference
Experiments
Experiment A: Computing Regions and Free Curves
8 2.5D Sketch and Depth Maps
8.1 Marr's Definition
8.2 Shape from Stereo
The Image Formation Model
Two-Layer Representation
The Inference Algorithm
Example Results
8.3 Shape from Shading
Overview of the Two-Layer Generative Model
Results
9 Learning by Information Projection
9.1 Information Projection
Orthogonality and Duality
Maximum Likelihood Implementation
9.2 Minimax Learning Framework
Model Pursuit Strategies
2D Toy Example
Learning Shape Patterns
Relation to Discriminative Learning
10 Information Scaling
10.1 Image Scaling
Model and Assumptions
Image Formation and Scaling
Empirical Observations on Information Scaling
Change of Compression Rate
Variance Normalization
Basic Information Theoretical Concepts
Change of Entropy Rate
10.2 Perceptual Entropy
A Continuous Spectrum
10.3 Perceptual Scale Space
10.4 Energy Landscape
11 Deep Image Models
11.1 Deep FRAME and Deep Energy-Based Model
ConvNet Filters
FRAME with ConvNet Filters
Learning and Sampling
Learning a New Layer of Filters
Deep Convolutional Energy-Based Model
Hopfield Auto-Encoder
Multi-grid Sampling and Modeling
Adversarial Interpretation
11.2 Generator Network
Factor Analysis
Nonlinear Factor Analysis
Learning by Alternating Back-Propagation
Nonlinear Generalization of AAM Model
Dynamic Generator Model
12 A Tale of Three Families: Discriminative, Descriptive, and Generative Models
12.1 Introduction
Three Families of Probabilistic Models
Supervised, Unsupervised, and Self-supervised Learning
MCMC for Synthesis and Inference
Deep Networks as Function Approximators
Learned Computation
Amortized Computation for Synthesis and Inference Sampling
Distributed Representation and Embedding
Perturbations of Kullback–Leibler Divergence
Kullback–Leibler Divergence in Two Directions
12.2 Descriptive Energy-Based Model
Model and Origin
Gradient-Based Sampling
Maximum Likelihood Estimation (MLE)
Objective Function and Estimating Equation of MLE
Perturbation of KL-divergence
Self-adversarial Interpretation
Short-Run MCMC for Synthesis
Objective Function and Estimating Equation with Short-Run MCMC
Flow-Based Model
Flow-Based Reference and Latent Space Sampling
Diffusion Recovery Likelihood
Diffusion-Based Model
12.3 Equivalence Between Discriminative and Descriptive Models
Discriminative Model
Descriptive Model as Exponential Tilting of a Reference Distribution
Discriminative Model via Bayes Rule
Noise Contrastive Estimation
Flow Contrastive Estimation
12.4 Generative Latent Variable Model
Model and Origin
Generative Model with Multi-layer Latent Variables
MLE Learning and Posterior Inference
Posterior Sampling
Perturbation of KL-divergence
Short-Run MCMC for Approximate Inference
Objective Function and Estimating Equation
12.5 Descriptive Model in Latent Space of Generative Model
Top-Down and Bottom-Up
Descriptive Energy-Based Model in Latent Space
Maximum Likelihood Learning
Short-Run MCMC for Synthesis and Inference
Divergence Perturbation
12.6 Variational and Adversarial Learning
From Short-Run MCMC to Learned Sampling Computations
VAE: Learned Computation for Inference Sampling
GAN: Joint Learning of Generator and Discriminator
Joint Learning of Descriptive and Generative Models
Divergence Triangle: Integrating VAE and ACD
12.7 Cooperative Learning via MCMC Teaching
Joint Training of Descriptive and Generative Models
Conditional Learning via Fast Thinking Initializer and Slow Thinking Solver
Bibliography