Author: Vladimir N. Vapnik
Publisher: Springer
Year: 1995
Cover
Title page
Preface
Introduction: Four Periods in the Research of the Learning Problem
0.1 Rosenblatt's Perceptron (The 1960s)
0.2 Construction of the Fundamentals of Learning Theory (The 1960-1970s)
0.3 Neural Networks (The 1980s)
0.4 Returning to the Origin (The 1990s)
Chapter 1 Setting of the Learning Problem
1.1 Function Estimation Model
1.2 The Problem of Risk Minimization
1.3 Three Main Learning Problems
1.3.1 Pattern Recognition
1.3.2 Regression Estimation
1.3.3 Density Estimation (Fisher-Wald Setting)
1.4 The General Setting of the Learning Problem
1.5 The Empirical Risk Minimization (ERM) Inductive Principle
1.6 The Four Parts of Learning Theory
Informal Reasoning and Comments - 1
1.7 The Classical Paradigm of Solving Learning Problems
1.7.1 Density Estimation Problem (Maximum Likelihood Method)
1.7.2 Pattern Recognition (Discriminant Analysis) Problem
1.7.3 Regression Estimation Model
1.7.4 Narrowness of the ML Method
1.8 Nonparametric Methods of Density Estimation
1.8.1 Parzen's Windows
1.8.2 The Problem of Density Estimation is Ill-Posed
1.9 Main Principle for Solving Problems Using a Restricted Amount of Information
1.10 Model Minimization of the Risk Based on Empirical Data
1.10.1 Pattern Recognition
1.10.2 Regression Estimation
1.10.3 Density Estimation
1.11 Stochastic Approximation Inference
Chapter 2 Consistency of Learning Processes
2.1 The Classical Definition of Consistency and the Concept of Nontrivial Consistency
2.2 The Key Theorem of Learning Theory
2.2.1 Remark on the ML Method
2.3 Necessary and Sufficient Conditions for Uniform Two-Sided Convergence
2.3.1 Remark on the Law of Large Numbers and its Generalization
2.3.2 Entropy of the Set of Indicator Functions
2.3.3 Entropy of the Set of Real Functions
2.3.4 Conditions for Uniform Two-Sided Convergence
2.4 Necessary and Sufficient Conditions for Uniform One-Sided Convergence
2.5 Theory of Nonfalsifiability
2.5.1 Kant's Problem of Demarcation and Popper's Theory of Nonfalsifiability
2.6 Theorems about Nonfalsifiability
2.6.1 Case of Complete (Popper's) Nonfalsifiability
2.6.2 Theorem about Partial Nonfalsifiability
2.6.3 Theorem about Potential Nonfalsifiability
2.7 Three Milestones in Learning Theory
Informal Reasoning and Comments - 2
2.8 The Basic Problems of Probability Theory and Statistics
2.8.1 Axioms of Probability Theory
2.9 Two Modes of Estimating a Probability Measure
2.10 Strong Mode Estimation of Probability Measures and the Density Estimation Problem
2.11 The Glivenko-Cantelli Theorem and its Generalization
2.12 Mathematical Theory of Induction
Chapter 3 Bounds on the Rate of Convergence of Learning Processes
3.1 The Basic Inequalities
3.2 Generalization for the Set of Real Functions
3.3 The Main Distribution-Independent Bounds
3.4 Bounds on the Generalization Ability of Learning Machines
3.5 The Structure of the Growth Function
3.6 The VC Dimension of a Set of Functions
3.7 Constructive Distribution-Independent Bounds
3.8 The Problem of Constructing Rigorous (Distribution-Dependent) Bounds
Informal Reasoning and Comments - 3
3.9 Kolmogorov-Smirnov Distributions
3.10 Racing for the Constant
3.11 Bounds on Empirical Processes
Chapter 4 Controlling the Generalization Ability of Learning Processes
4.1 Structural Risk Minimization (SRM) Inductive Principle
4.2 Asymptotic Analysis of the Rate of Convergence
4.3 The Problem of Function Approximation in Learning Theory
4.4 Examples of Structures for Neural Nets
4.5 The Problem of Local Function Estimation
4.6 The Minimum Description Length (MDL) and SRM Principles
4.6.1 The MDL Principle
4.6.2 Bounds for the MDL Principle
4.6.3 The SRM and the MDL Principles
4.6.4 A Weak Point of the MDL Principle
Informal Reasoning and Comments - 4
4.7 Methods for Solving Ill-Posed Problems
4.8 Stochastic Ill-Posed Problems and the Problem of Density Estimation
4.9 The Problem of Polynomial Approximation of the Regression
4.10 The Problem of Capacity Control
4.10.1 Choosing the Degree of the Polynomial
4.10.2 Choosing the Best Sparse Algebraic Polynomial
4.10.3 Structures on the Set of Trigonometric Polynomials
4.10.4 The Problem of Feature Selection
4.11 The Problem of Capacity Control and Bayesian Inference
4.11.1 The Bayesian Approach in Learning Theory
4.11.2 Discussion of the Bayesian Approach and Capacity Control Methods
Chapter 5 Constructing Learning Algorithms
5.1 Why Can Learning Machines Generalize?
5.2 Sigmoid Approximation of Indicator Functions
5.3 Neural Networks
5.3.1 The Back-Propagation Method
5.3.2 The Back-Propagation Algorithm
5.3.3 Neural Networks for the Regression Estimation Problem
5.3.4 Remarks on the Back-Propagation Method
5.4 The Optimal Separating Hyperplane
5.4.1 The Optimal Hyperplane
5.4.2 The Structure of Canonical Hyperplanes
5.5 Constructing the Optimal Hyperplane
5.5.1 Generalization for the Nonseparable Case
5.6 Support Vector (SV) Machines
5.6.1 Generalization in High-Dimensional Space
5.6.2 Convolution of the Inner Product
5.6.3 Constructing SV Machines
5.6.4 Examples of SV Machines
5.7 Experiments with SV Machines
5.7.1 Example in the Plane
5.7.2 Handwritten Digit Recognition
5.7.3 Some Important Details
5.8 Remarks on SV Machines
5.9 SV Machines for the Regression Estimation Problem
5.9.1 ε-Insensitive Loss Function
5.9.2 Minimizing the Risk Using a Convex Optimization Procedure
5.9.3 SV Machine with Convolved Inner Product
Informal Reasoning and Comments - 5
5.10 The Art of Engineering Versus Formal Inference
5.11 Wisdom of Statistical Models
5.12 What Can One Learn from Digit Recognition Experiments?
5.12.1 Influence of the Type of Structures and Accuracy of Capacity Control
5.12.2 SRM Principle and the Problem of Feature Construction
5.12.3 Is the Set of Support Vectors a Robust Characteristic of the Data?
Conclusion: What is Important in Learning Theory?
6.1 What is Important in the Setting of the Problem?
6.2 What is Important in the Theory of Consistency of Learning Processes?
6.3 What is Important in the Theory of Bounds?
6.4 What is Important in the Theory for Controlling the Generalization Ability of Learning Machines?
6.5 What is Important in the Theory for Constructing Learning Algorithms?
6.6 What is the Most Important?
Remarks on References
References
Index