Gaussian Processes for Machine Learning

A comprehensive and self-contained introduction to Gaussian processes, which provide a principled, practical, probabilistic approach to learning in kernel machines.

Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
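
To make the topics above concrete, here is a minimal sketch of GP regression with a squared-exponential covariance function: the predictive mean and variance are computed via a Cholesky factorization, and the log marginal likelihood (used for model selection in chapter 5) is returned as well. It is written in Python/NumPy rather than the book's accompanying code, and the hyperparameter values, toy data and function names are illustrative assumptions, not taken from the text.

import numpy as np

def sq_exp_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential covariance: k(x, x') = s^2 * exp(-|x - x'|^2 / (2 * l^2)).
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return signal_var * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_regression(X, y, X_star, noise_var=0.1):
    # GP predictive mean/variance and log marginal likelihood via Cholesky factors.
    n = X.shape[0]
    K = sq_exp_kernel(X, X) + noise_var * np.eye(n)      # K + sigma^2 I
    L = np.linalg.cholesky(K)                            # K + sigma^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = (K + sigma^2 I)^{-1} y
    K_star = sq_exp_kernel(X, X_star)
    mean = K_star.T @ alpha                              # predictive mean
    v = np.linalg.solve(L, K_star)
    var = np.diag(sq_exp_kernel(X_star, X_star)) - np.sum(v**2, axis=0)  # predictive variance
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)
    return mean, var, log_ml

# Toy usage: noisy observations of a sine function (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-3, 3, 100)[:, None]
mean, var, log_ml = gp_regression(X, y, X_star)
print(f"log marginal likelihood: {log_ml:.2f}")

In practice the kernel hyperparameters and noise variance would not be fixed by hand but chosen, for example, by maximizing the log marginal likelihood, the Bayesian model-selection route treated in chapter 5.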

Author(s): Carl Edward Rasmussen; Christopher K.I. Williams
Series: Adaptive Computation and Machine Learning
Publisher: MIT Press
Year: 2006

Language: English
Pages: 272

Series Foreword
Preface
Symbols and Notation

1. Introduction
1.1 A Pictorial Introduction to Bayesian Modelling
1.2 Roadmap

2. Regression
2.1 Weight-space View
2.1.1 The Standard Linear Model
2.1.2 Projections of Inputs into Feature Space
2.2 Function-space View
2.3 Varying the Hyperparameters
2.4 Decision Theory for Regression
2.5 An Example Application
2.6 Smoothing, Weight Functions and Equivalent Kernels
2.7 Incorporating Explicit Basis Functions
2.7.1 Marginal Likelihood
2.8 History and Related Work
2.9 Exercises

3. Classification
3.1 Classification Problems
3.1.1 Decision Theory for Classification
3.2 Linear Models for Classification
3.3 Gaussian Process Classification
3.4 The Laplace Approximation for the Binary GP Classifier
3.4.1 Posterior
3.4.2 Predictions
3.4.3 Implementation
3.4.4 Marginal Likelihood
3.5 Multi-class Laplace Approximation
3.5.1 Implementation
3.6 Expectation Propagation
3.6.1 Predictions
3.6.2 Marginal Likelihood
3.6.3 Implementation
3.7 Experiments
3.7.1 A Toy Problem
3.7.2 One-dimensional Example
3.7.3 Binary Handwritten Digit Classification Example
3.7.4 10-class Handwritten Digit Classification Example
3.8 Discussion
3.9 Appendix: Moment Derivations
3.10 Exercises

4. Covariance Functions
4.1 Preliminaries
4.1.1 Mean Square Continuity and Differentiability
4.2 Examples of Covariance Functions
4.2.1 Stationary Covariance Functions
4.2.2 Dot Product Covariance Functions
4.2.3 Other Non-stationary Covariance Functions
4.2.4 Making New Kernels from Old
4.3 Eigenfunction Analysis of Kernels
4.3.1 An Analytic Example
4.3.2 Numerical Approximation of Eigenfunctions
4.4 Kernels for Non-vectorial Inputs
4.4.1 String Kernels
4.4.2 Fisher Kernels
4.5 Exercises

5. Model Selection and Adaptation of Hyperparameters
5.1 The Model Selection Problem
5.2 Bayesian Model Selection
5.3 Cross-validation
5.4 Model Selection for GP Regression
5.4.1 Marginal Likelihood
5.4.2 Cross-validation
5.4.3 Examples and Discussion
5.5 Model Selection for GP Classification
5.5.1 Derivatives of the Marginal Likelihood for Laplace’s Approximation
5.5.2 Derivatives of the Marginal Likelihood for EP
5.5.3 Cross-validation
5.5.4 Example
5.6 Exercises

6. Relationships between GPs and Other Models
6.1 Reproducing Kernel Hilbert Spaces
6.2 Regularization
6.2.1 Regularization Defined by Differential Operators
6.2.2 Obtaining the Regularized Solution
6.2.3 The Relationship of the Regularization View to Gaussian Process Prediction
6.3 Spline Models
6.3.1 A 1-d Gaussian Process Spline Construction
6.4 Support Vector Machines
6.4.1 Support Vector Classification
6.4.2 Support Vector Regression
6.5 Least-squares Classification
6.5.1 Probabilistic Least-squares Classification
6.6 Relevance Vector Machines
6.7 Exercises

7. Theoretical Perspectives
7.1 The Equivalent Kernel
7.1.1 Some Specific Examples of Equivalent Kernels
7.2 Asymptotic Analysis
7.2.1 Consistency
7.2.2 Equivalence and Orthogonality
7.3 Average-case Learning Curves
7.4 PAC-Bayesian Analysis
7.4.1 The PAC Framework
7.4.2 PAC-Bayesian Analysis
7.4.3 PAC-Bayesian Analysis of GP Classification
7.5 Comparison with Other Supervised Learning Methods
7.6 Appendix: Learning Curve for the Ornstein-Uhlenbeck Process
7.7 Exercises

8. Approximation Methods for Large Datasets
8.1 Reduced-rank Approximations of the Gram Matrix
8.2 Greedy Approximation
8.3 Approximations for GPR with Fixed Hyperparameters
8.3.1 Subset of Regressors
8.3.2 The Nyström Method
8.3.3 Subset of Datapoints
8.3.4 Projected Process Approximation
8.3.5 Bayesian Committee Machine
8.3.6 Iterative Solution of Linear Systems
8.3.7 Comparison of Approximate GPR Methods
8.4 Approximations for GPC with Fixed Hyperparameters
8.5 Approximating the Marginal Likelihood and its Derivatives
8.6 Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel
8.7 Exercises

9. Further Issues and Conclusions
9.1 Multiple Outputs
9.2 Noise Models with Dependencies
9.3 Non-Gaussian Likelihoods
9.4 Derivative Observations
9.5 Prediction with Uncertain Inputs
9.6 Mixtures of Gaussian Processes
9.7 Global Optimization
9.8 Evaluation of Integrals
9.9 Student’s t Process
9.10 Invariances
9.11 Latent Variable Models
9.12 Conclusions and Future Directions

Appendix A. Mathematical Background
A.1 Joint, Marginal and Conditional Probability
A.2 Gaussian Identities
A.3 Matrix Identities
A.3.1 Matrix Derivatives
A.3.2 Matrix Norms
A.4 Cholesky Decomposition
A.5 Entropy and Kullback-Leibler Divergence
A.6 Limits
A.7 Measure and Integration
A.7.1 L^p Spaces
A.8 Fourier Transforms
A.9 Convexity

Appendix B. Gaussian Markov Processes
B.1 Fourier Analysis
B.1.1 Sampling and Periodization
B.2 Continuous-time Gaussian Markov Processes
B.2.1 Continuous-time GMPs on R
B.2.2 The Solution of the Corresponding SDE on the Circle
B.3 Discrete-time Gaussian Markov Processes
B.3.1 Discrete-time GMPs on Z
B.3.2 The Solution of the Corresponding Difference Equation on P_N
B.4 The Relationship Between Discrete-time and Sampled Continuous-time GMPs
B.5 Markov Processes in Higher Dimensions

Appendix C. Datasets and Code

Bibliography
Author Index
Subject Index