Machine Learning: Discriminative and Generative covers the main contemporary themes and tools in machine learning, ranging from Bayesian probabilistic models to discriminative support vector machines. Unlike previous books that discuss these rather different approaches only in isolation, it bridges the two schools of thought within a common framework, elegantly connecting their various theories into one big picture. This bridge also brings forth new hybrid discriminative-generative tools that combine the strengths of both camps. The book serves multiple purposes. As a scientific breakthrough, the framework fuses the areas of generative and discriminative learning and will be of interest to many researchers. As a conceptual breakthrough, it unifies many previously unrelated tools and techniques and makes them understandable to a wider audience. This gives the more practical-minded engineer, student, and industry practitioner an accessible and more sensible road map into the world of machine learning.
Machine Learning: Discriminative and Generative is designed for an audience of researchers and practitioners in industry and academia. The book is also suitable as a secondary text for graduate-level students in computer science and engineering.
Author(s): Tony Jebara
Series: The Kluwer International Series in Engineering and Computer Science
Publisher: Kluwer Academic Publishers / Springer
Year: 2004
Language: English
Pages: 220
Cover......Page 1
Title Page......Page 5
Contents......Page 7
List of Figures......Page 11
List of Tables......Page 13
Preface......Page 15
Acknowledgments......Page 19
1. Introduction......Page 21
1. Machine Learning Roots......Page 22
2. Generative Learning......Page 25
2.2 Generative Models in Perception......Page 28
3. Why a Probability of Everything?......Page 29
4. Discriminative Learning......Page 30
5. Objective......Page 32
6. Scope and Organization......Page 34
7. Online Support......Page 35
2. Generative versus Discriminative Learning......Page 37
1. Two Schools of Thought......Page 38
1.1 Generative Probabilistic Models......Page 39
1.2 Discriminative Classifiers and Regressors......Page 41
2. Generative Learning......Page 42
2.1 Bayesian Inference......Page 43
2.2 Maximum Likelihood......Page 44
2.3 The Exponential Family......Page 45
2.4 Maximum Entropy......Page 48
2.5 Expectation Maximization and Mixtures......Page 52
2.6 Graphical Models......Page 56
3. Conditional Learning......Page 62
3.1 Conditional Bayesian Inference......Page 63
3.2 Maximum Conditional Likelihood......Page 66
3.3 Logistic Regression......Page 67
4.1 Empirical Risk Minimization......Page 68
4.2 Structural Risk Minimization......Page 69
4.3 VC Dimension and Large Margins......Page 70
4.4 Support Vector Machines......Page 72
4.5 Kernel Methods......Page 75
5. Averaged Classifiers......Page 77
6. Joint Generative-Discriminative Learning......Page 78
3. Maximum Entropy Discrimination......Page 81
1. Regularization Theory and Support Vector Machines......Page 82
1.1 Solvability......Page 84
1.2 Support Vector Machines and Kernels......Page 85
2. A Distribution over Solutions......Page 86
3. Augmented Distributions......Page 89
4. Information and Geometry Interpretations......Page 92
5. Computing the Partition Function......Page 94
6. Margin Priors......Page 95
7.2 Non-Informative Bias Priors......Page 98
8. Support Vector Machines......Page 99
8.1 Single Axis SVM Optimization......Page 100
9. Generative Models......Page 101
9.1 Exponential Family Models......Page 102
9.2 Empirical Bayes Priors......Page 104
9.3 Full Covariance Gaussians......Page 106
9.4 Multinomials......Page 111
10.1 VC Dimension......Page 113
10.2 Sparsity......Page 114
10.3 PAC-Bayes Bounds......Page 115
Notes......Page 118
4. Extensions to MED......Page 119
1. Multiclass Classification......Page 120
2. Regression......Page 122
2.1 SVM Regression......Page 123
3. Feature Selection and Structure Learning......Page 125
3.1 Feature Selection in Classification......Page 126
3.2 Feature Selection in Regression......Page 130
3.3 Feature Selection in Generative Models......Page 133
4. Kernel Selection......Page 134
5. Meta-Learning......Page 137
6. Transduction......Page 140
6.1 Transductive Classification......Page 141
6.2 Transductive Regression......Page 145
7. Other Extensions......Page 149
5. Latent Discrimination......Page 151
1. Mixture Models and Latent Variables......Page 153
2. Iterative MED Projection......Page 157
3. Bounding the Latent MED Constraints......Page 158
4. Latent Decision Rules......Page 163
5. Large Margin Mixtures of Gaussians......Page 164
5.1 Parameter Distribution Update......Page 165
5.2 Just a Support Vector Machine......Page 168
5.3 Latent Distributions Update......Page 169
5.4 Extension to Kernels......Page 174
6. Efficiency......Page 175
6.1 Efficient Mixtures of Gaussians......Page 180
7. Structured Latent Models......Page 181
8. Factorization of Lagrange Multipliers......Page 186
9. Mean Field for Intractable Models......Page 188
6. Conclusion......Page 191
1. A Generative and Discriminative Hybrid......Page 192
2. Designing Models versus Designing Kernels......Page 194
3. What's Next?......Page 196
1.1 Constrained Gradient Ascent......Page 199
1.2 Axis-Parallel Optimization......Page 201
1.3 Learning Axis Transitions......Page 203
References......Page 205
Index......Page 219