The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, and probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models, and support vector machines. For students and others with a mathematical background, these derivations provide a starting point for engaging with machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's website.
Author(s): Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
Edition: 1
Publisher: Cambridge University Press
Year: 2020
Language: English
Commentary: Publisher's PDF
Pages: 398
City: Cambridge, UK
Tags: Machine Learning; Classification; Principal Component Analysis; Support Vector Machines; Optimization; Linear Regression; Linear Algebra; Mathematics; Probability Theory; Dimensionality Reduction; Analytic Geometry; Vector Calculus
Foreword
Part I Mathematical Foundations
1 Introduction and Motivation
1.1 Finding Words for Intuitions
1.2 Two Ways to Read This Book
1.3 Exercises and Feedback
2 Linear Algebra
2.1 Systems of Linear Equations
2.2 Matrices
2.3 Solving Systems of Linear Equations
2.4 Vector Spaces
2.5 Linear Independence
2.6 Basis and Rank
2.7 Linear Mappings
2.8 Affine Spaces
2.9 Further Reading
Exercises
3 Analytic Geometry
3.1 Norms
3.2 Inner Products
3.3 Lengths and Distances
3.4 Angles and Orthogonality
3.5 Orthonormal Basis
3.6 Orthogonal Complement
3.7 Inner Product of Functions
3.8 Orthogonal Projections
3.9 Rotations
3.10 Further Reading
Exercises
4 Matrix Decompositions
4.1 Determinant and Trace
4.2 Eigenvalues and Eigenvectors
4.3 Cholesky Decomposition
4.4 Eigendecomposition and Diagonalization
4.5 Singular Value Decomposition
4.6 Matrix Approximation
4.7 Matrix Phylogeny
4.8 Further Reading
Exercises
5 Vector Calculus
5.1 Differentiation of Univariate Functions
5.2 Partial Differentiation and Gradients
5.3 Gradients of Vector-Valued Functions
5.4 Gradients of Matrices
5.5 Useful Identities for Computing Gradients
5.6 Backpropagation and Automatic Differentiation
5.7 Higher-Order Derivatives
5.8 Linearization and Multivariate Taylor Series
5.9 Further Reading
Exercises
6 Probability and Distributions
6.1 Construction of a Probability Space
6.2 Discrete and Continuous Probabilities
6.3 Sum Rule, Product Rule, and Bayes' Theorem
6.4 Summary Statistics and Independence
6.5 Gaussian Distribution
6.6 Conjugacy and the Exponential Family
6.7 Change of Variables/Inverse Transform
6.8 Further Reading
Exercises
7 Continuous Optimization
7.1 Optimization Using Gradient Descent
7.2 Constrained Optimization and Lagrange Multipliers
7.3 Convex Optimization
7.4 Further Reading
Exercises
Part II Central Machine Learning Problems
8 When Models Meet Data
8.1 Data, Models, and Learning
8.2 Empirical Risk Minimization
8.3 Parameter Estimation
8.4 Probabilistic Modeling and Inference
8.5 Directed Graphical Models
8.6 Model Selection
9 Linear Regression
9.1 Problem Formulation
9.2 Parameter Estimation
9.3 Bayesian Linear Regression
9.4 Maximum Likelihood as Orthogonal Projection
9.5 Further Reading
10 Dimensionality Reduction with Principal Component Analysis
10.1 Problem Setting
10.2 Maximum Variance Perspective
10.3 Projection Perspective
10.4 Eigenvector Computation and Low-Rank Approximations
10.5 PCA in High Dimensions
10.6 Key Steps of PCA in Practice
10.7 Latent Variable Perspective
10.8 Further Reading
11 Density Estimation with Gaussian Mixture Models
11.1 Gaussian Mixture Model
11.2 Parameter Learning via Maximum Likelihood
11.3 EM Algorithm
11.4 Latent-Variable Perspective
11.5 Further Reading
12 Classification with Support Vector Machines
12.1 Separating Hyperplanes
12.2 Primal Support Vector Machine
12.3 Dual Support Vector Machine
12.4 Kernels
12.5 Numerical Solution
12.6 Further Reading
References
Index