Linear Algebra And Optimization With Applications To Machine Learning - Volume II: Fundamentals of Optimization Theory with Applications to Machine Learning

Volume 2 applies the linear algebra concepts presented in Volume 1 to optimization problems that frequently occur throughout machine learning. This book blends theory with practice by not only carefully discussing the mathematical underpinnings of each optimization technique, but also by applying these techniques to linear programming, support vector machines (SVM), principal component analysis (PCA), and ridge regression.

Volume 2 begins by discussing preliminary concepts of optimization theory such as metric spaces, derivatives, and the Lagrange multiplier technique for finding extrema of real-valued functions. The focus then shifts to the special case of optimizing a linear function over a region determined by affine constraints, namely linear programming. Highlights include careful derivations and applications of the simplex algorithm, the dual simplex algorithm, and the primal-dual algorithm. The theoretical heart of this book is the mathematically rigorous presentation of various nonlinear optimization methods, including gradient descent, the Karush–Kuhn–Tucker (KKT) conditions, Lagrangian duality, the alternating direction method of multipliers (ADMM), and the kernel method. These methods are carefully applied to hard margin SVM, soft margin SVM, kernel PCA, ridge regression, lasso regression, and elastic net regression. Matlab programs implementing these methods are included.

Many books on machine learning struggle with this difficulty: how can one understand what the dual variables of a ridge regression problem are without knowing the Lagrangian duality framework? Similarly, how is it possible to discuss the dual formulation of SVM without a firm understanding of the Lagrangian framework? The easy way out is to sweep these difficulties under the rug. For someone who is just a consumer of the techniques mentioned above, the cookbook-recipe approach is probably adequate. But it does not work for someone who wants to do serious research and make significant contributions. To do so, we believe that one must have a solid background in linear algebra and optimization theory.
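
As a concrete illustration of the primal-dual connection the description alludes to, here is a minimal Python sketch (not taken from the book, whose programs are in Matlab) comparing the primal solution of ridge regression with the solution recovered from its dual variables on synthetic data. The names X, y, lam and the data-generating setup are assumptions made purely for illustration.

# Minimal sketch (not from the book): primal vs. dual ridge regression.
# Problem: minimize ||X w - y||^2 + lam * ||w||^2.
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5                      # m data points, n features (arbitrary choices)
X = rng.standard_normal((m, n))
y = X @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
lam = 0.5                         # ridge regularization parameter (assumed value)

# Primal solution: w = (X^T X + lam I)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual solution: the dual variables alpha = (X X^T + lam I)^{-1} y recover w = X^T alpha
alpha = np.linalg.solve(X @ X.T + lam * np.eye(m), y)
w_dual = X.T @ alpha

print(np.linalg.norm(w_primal - w_dual))   # ~0: both formulations agree

The point of the dual form is that it only involves the Gram matrix X X^T of inner products between data points, which is the observation exploited by kernel ridge regression (Chapter 19) and, more generally, by the kernel methods of Chapter 17.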

Author(s): Jean Gallier, Jocelyn Quaintance
Publisher: World Scientific Publishing
Year: 2021

Language: English
Pages: 896

Contents
Preface
1. Introduction
Preliminaries for Optimization Theory
2. Topology
2.1 Metric Spaces and Normed Vector Spaces
2.2 Topological Spaces
2.3 Subspace and Product Topologies
2.4 Continuous Functions
2.5 Limits and Continuity; Uniform Continuity
2.6 Continuous Linear and Multilinear Maps
2.7 Complete Metric Spaces and Banach Spaces
2.8 Completion of a Metric Space
2.9 Completion of a Normed Vector Space
2.10 The Contraction Mapping Theorem
2.11 Further Readings
2.12 Summary
2.13 Problems
3. Differential Calculus
3.1 Directional Derivatives, Total Derivatives
3.2 Properties of Derivatives
3.3 Jacobian Matrices
3.4 The Implicit and the Inverse Function Theorems
3.5 Second-Order and Higher-Order Derivatives
3.6 Taylor's Formula, Faà di Bruno's Formula
3.7 Further Readings
3.8 Summary
3.9 Problems
4. Extrema of Real-Valued Functions
4.1 Local Extrema, Constrained Local Extrema, and Lagrange Multipliers
4.2 Using Second Derivatives to Find Extrema
4.3 Using Convexity to Find Extrema
4.4 Summary
4.5 Problems
5. Newton's Method and Its Generalizations
5.1 Newton's Method for Real Functions of a Real Argument
5.2 Generalizations of Newton's Method
5.3 Summary
5.4 Problems
6. Quadratic Optimization Problems
6.1 Quadratic Optimization: The Positive Definite Case
6.2 Quadratic Optimization: The General Case
6.3 Maximizing a Quadratic Function on the Unit Sphere
6.4 Summary
6.5 Problems
7. Schur Complements and Applications
7.1 Schur Complements
7.2 Symmetric Positive Definite Matrices and Schur Complements
7.3 Symmetric Positive Semidefinite Matrices and Schur Complements
7.4 Summary
7.5 Problems
Linear Optimization
8. Convex Sets, Cones, H-Polyhedra
8.1 What is Linear Programming?
8.2 Affine Subsets, Convex Sets, Affine Hyperplanes, Half-Spaces
8.3 Cones, Polyhedral Cones, and H-Polyhedra
8.4 Summary
8.5 Problems
9. Linear Programs
9.1 Linear Programs, Feasible Solutions, Optimal Solutions
9.2 Basic Feasible Solutions and Vertices
9.3 Summary
9.4 Problems
10. The Simplex Algorithm
10.1 The Idea Behind the Simplex Algorithm
10.2 The Simplex Algorithm in General
10.3 How to Perform a Pivoting Step Efficiently
10.4 The Simplex Algorithm Using Tableaux
10.5 Computational Efficiency of the Simplex Method
10.6 Summary
10.7 Problems
11. Linear Programming and Duality
11.1 Variants of the Farkas Lemma
11.2 The Duality Theorem in Linear Programming
11.3 Complementary Slackness Conditions
11.4 Duality for Linear Programs in Standard Form
11.5 The Dual Simplex Algorithm
11.6 The Primal-Dual Algorithm
11.7 Summary
11.8 Problems
Nonlinear Optimization
12. Basics of Hilbert Spaces
12.1 The Projection Lemma
12.2 Duality and the Riesz Representation Theorem
12.3 Farkas–Minkowski Lemma in Hilbert Spaces
12.4 Summary
12.5 Problems
13. General Results of Optimization Theory
13.1 Optimization Problems; Basic Terminology
13.2 Existence of Solutions of an Optimization Problem
13.3 Minima of Quadratic Functionals
13.4 Elliptic Functionals
13.5 Iterative Methods for Unconstrained Problems
13.6 Gradient Descent Methods for Unconstrained Problems
13.7 Convergence of Gradient Descent with Variable Stepsize
13.8 Steepest Descent for an Arbitrary Norm
13.9 Newton's Method for Finding a Minimum
13.10 Conjugate Gradient Methods for Unconstrained Problems
13.11 Gradient Projection Methods for Constrained Optimization
13.12 Penalty Methods for Constrained Optimization
13.13 Summary
13.14 Problems
14. Introduction to Nonlinear Optimization
14.1 The Cone of Feasible Directions
14.2 Active Constraints and Qualified Constraints
14.3 The Karush–Kuhn–Tucker Conditions
14.4 Equality Constrained Minimization
14.5 Hard Margin Support Vector Machine; Version I
14.6 Hard Margin Support Vector Machine; Version II
14.7 Lagrangian Duality and Saddle Points
14.8 Weak and Strong Duality
14.9 Handling Equality Constraints Explicitly
14.11 Conjugate Function and Legendre Dual Function
14.12 Some Techniques to Obtain a More Useful Dual Program
14.13 Uzawa's Method
14.14 Summary
14.15 Problems
15. Subgradients and Subdifferentials of Convex Functions
15.1 Extended Real-Valued Convex Functions
15.2 Subgradients and Subdifferentials
15.3 Basic Properties of Subgradients and Subdifferentials
15.4 Additional Properties of Subdifferentials
15.5 The Minimum of a Proper Convex Function
15.6 Generalization of the Lagrangian Framework
15.7 Summary
15.8 Problems
16. Dual Ascent Methods; ADMM
16.1 Dual Ascent
16.2 Augmented Lagrangians and the Method of Multipliers
16.3 ADMM: Alternating Direction Method of Multipliers
16.4 Convergence of ADMM
16.5 Stopping Criteria
16.6 Some Applications of ADMM
16.7 Solving Hard Margin (SVMh2) Using ADMM
16.8 Applications of ADMM to ℓ1-Norm Problems
16.9 Summary
16.10 Problems
Applications to Machine Learning
17. Positive Definite Kernels
17.1 Feature Maps and Kernel Functions
17.2 Basic Properties of Positive Definite Kernels
17.3 Hilbert Space Representation of a Positive Definite Kernel
17.4 Kernel PCA
17.5 Summary
17.6 Problems
18. Soft Margin Support Vector Machines
18.1 Soft Margin Support Vector Machines; (SVMs1)
18.2 Solving SVM (SVMs1) Using ADMM
18.3 Soft Margin Support Vector Machines; (SVMs2)
18.4 Solving SVM (SVMs2) Using ADMM
18.5 Soft Margin Support Vector Machines; (SVMs2′)
18.6 Classification of the Data Points in Terms of ν (SVMs2′)
18.7 Existence of Support Vectors for (SVMs2′)
18.8 Solving SVM (SVMs2′) Using ADMM
18.9 Soft Margin Support Vector Machines; (SVMs3)
18.10 Classification of the Data Points in Terms of ν (SVMs3)
18.11 Existence of Support Vectors for (SVMs3)
18.12 Solving SVM (SVMs3) Using ADMM
18.13 Soft Margin SVM; (SVMs4)
18.14 Solving SVM (SVMs4) Using ADMM
18.15 Soft Margin SVM; (SVMs5)
18.16 Solving SVM (SVMs5) Using ADMM
18.17 Summary and Comparison of the SVM Methods
18.18 Problems
19. Ridge Regression, Lasso, Elastic Net
19.1 Ridge Regression
19.3 Kernel Ridge Regression
19.4 Lasso Regression (ℓ1-Regularized Regression)
19.5 Lasso Regression; Learning an Affine Function
19.6 Elastic Net Regression
19.7 Summary
19.8 Problems
20. ν-SV Regression
20.1 ν-SV Regression; Derivation of the Dual
20.2 Existence of Support Vectors
20.4 Kernel ν-SV Regression
20.5 ν-Regression Version 2; Penalizing b
20.6 Summary
20.7 Problems
Appendix A Total Orthogonal Families in Hilbert Spaces
A.1 Total Orthogonal Families (Hilbert Bases), Fourier Coefficients
A.2 The Hilbert Space ℓ2(K) and the Riesz–Fischer Theorem
A.3 Summary
A.4 Problems
Appendix B Matlab Programs
B.1 Hard Margin (SVMh2)
B.2 Soft Margin SVM (SVMs2′)
B.3 Soft Margin SVM (SVMs3)
B.4 ν-SV Regression
Bibliography
Index