This book describes in detail the fundamental mathematics and algorithms of machine learning (a branch of artificial intelligence) and signal processing, two of the most important and exciting technologies in the modern information economy. Taking a gradual approach, it builds up concepts in a solid, step-by-step fashion so that the ideas and algorithms can be implemented in practical software applications.
Digital signal processing (DSP) is one of the 'foundational' engineering topics of the modern world, without which technologies such as the mobile phone, television, CD and MP3 players, WiFi and radar would not be possible. A relative newcomer by comparison, statistical machine learning is the theoretical backbone of exciting technologies such as automatic car registration plate recognition, speech recognition, stock market prediction, defect detection on assembly lines, robot guidance, and autonomous car navigation. Statistical machine learning exploits the analogy between intelligent information processing in biological brains and sophisticated statistical modelling and inference.
DSP and statistical machine learning are of such wide importance to the knowledge economy that both have undergone rapid changes and seen radical improvements in scope and applicability. Both make use of key topics in applied mathematics such as probability and statistics, algebra, calculus, graphs and networks. Intimate formal links exist between the two subjects, and the resulting overlaps can be exploited to produce new DSP tools of surprising utility, highly suited to the contemporary world of pervasive digital sensors and high-powered, yet cheap, computing hardware. This book gives a solid mathematical foundation to, and details the key concepts and algorithms in, this important topic.
Author(s): Max A. Little
Publisher: Oxford University Press, USA
Year: 2019
Language: English
Pages: xviii+359
Cover
Machine Learning for Signal Processing: Data Science, Algorithms, and Computational Statistics
Copyright
Preface
Contents
List of Algorithms
List of Figures
Chapter 1: Mathematical foundations
1.1 Abstract algebras
Groups
Rings
1.2 Metrics
1.3 Vector spaces
Linear operators
Matrix algebra
Square and invertible matrices
Eigenvalues and eigenvectors
Special matrices
1.4 Probability and stochastic processes
Sample spaces, events, measures and distributions
Joint random variables: independence, conditionals, and marginals
Bayes' rule
Expectation, generating functions and characteristic functions
Empirical distribution function and sample expectations
Transforming random variables
Multivariate Gaussian and other limiting distributions
Stochastic processes
Markov chains
1.5 Data compression and information theory
The importance of the information map
Mutual information and Kullback-Leibler (K-L) divergence
1.6 Graphs
Special graphs
1.7 Convexity
1.8 Computational complexity
Complexity order classes and big-O notation
Tractable versus intractable problems: NP-completeness
Chapter 2: Optimization
2.1 Preliminaries
Continuous differentiable problems and critical points
Continuous optimization under equality constraints: Lagrange multipliers
Inequality constraints: duality and the Karush-Kuhn-Tucker conditions
Convergence and convergence rates for iterative methods
Non-differentiable continuous problems
Discrete (combinatorial) optimization problems
2.2 Analytical methods for continuous convex problems
L2-norm objective functions
Mixed L2-L1 norm objective functions
2.3 Numerical methods for continuous convex problems
Iteratively reweighted least squares (IRLS)
Gradient descent
Adapting the step sizes: line search
Newton's method
Other gradient descent methods
2.4 Non-differentiable continuous convex problems
Linear programming
Quadratic programming
Subgradient methods
Primal-dual interior-point methods
Path-following methods
2.5 Continuous non-convex problems
2.6 Heuristics for discrete (combinatorial) optimization
Greedy search
(Simple) tabu search
Simulated annealing
Random restarting
Chapter 3: Random sampling
3.1 Generating (uniform) random numbers
3.2 Sampling from continuous distributions
Quantile function (inverse CDF) and inverse transform sampling
Random variable transformation methods
Rejection sampling
Adaptive rejection sampling (ARS) for log-concave densities
Special methods for particular distributions
3.3 Sampling from discrete distributions
Inverse transform sampling by sequential search
Rejection sampling for discrete variables
Binary search inversion for (large) finite sample spaces
3.4 Sampling from general multivariate distributions
Ancestral sampling
Gibbs sampling
Metropolis-Hastings
Other MCMC methods
Chapter 4: Statistical modelling and inference
4.1 Statistical models
Parametric versus nonparametric models
Bayesian and non-Bayesian models
4.2 Optimal probability inferences
Maximum likelihood and minimum K-L divergence
Loss functions and empirical risk estimation
Maximum a-posteriori and regularization
Regularization, model complexity and data compression
Cross-validation and regularization
The bootstrap
4.3 Bayesian inference
4.4 Distributions associated with metrics and norms
Least squares
Least Lq-norms
Covariance, weighted norms and Mahalanobis distance
4.5 The exponential family (EF)
Maximum entropy distributions
Sufficient statistics and canonical EFs
Conjugate priors
Prior and posterior predictive EFs
Conjugate EF prior mixtures
4.6 Distributions defined through quantiles
4.7 Densities associated with piecewise linear loss functions
4.8 Nonparametric density estimation
4.9 Inference by sampling
MCMC inference
Assessing convergence in MCMC methods
Chapter 5: Probabilistic graphical models
5.1 Statistical modelling with PGMs
5.2 Exploring conditional independence in PGMs
Hidden versus observed variables
Directed connection and separation
The Markov blanket of a node
5.3 Inference on PGMs
Exact inference
Approximate inference
Chapter 6: Statistical machine learning
6.1 Feature and kernel functions
6.2 Mixture modelling
Gibbs sampling for the mixture model
E-M for mixture models
6.3 Classification
Quadratic and linear discriminant analysis (QDA and LDA)
Logistic regression
Support vector machines (SVM)
Classification loss functions and misclassification count
Which classifier to choose?
6.4 Regression
Linear regression
Bayesian and regularized linear regression
Linear-in-parameters regression
Generalized linear models (GLMs)
Nonparametric, nonlinear regression
Variable selection
6.5 Clustering
K-means and variants
Soft K-means, mean shift and variants
Semi-supervised clustering and classification
Choosing the number of clusters
Other clustering methods
6.6 Dimensionality reduction
Principal components analysis (PCA)
Probabilistic PCA (PPCA)
Nonlinear dimensionality reduction
Chapter 7: Linear-Gaussian systems and signal processing
7.1 Preliminaries
Delta signals and related functions
Complex numbers, the unit root and complex exponentials
Marginals and conditionals of linear-Gaussian models
7.2 Linear, time-invariant (LTI) systems
Convolution and impulse response
The discrete-time Fourier transform (DTFT)
Finite-length, periodic signals: the discrete Fourier transform (DFT)
Continuous-time LTI systems
Heisenberg uncertainty
Gibbs phenomenon
Transfer function analysis of discrete-time LTI systems
Fast Fourier transforms (FFT)
7.3 LTI signal processing
Rational filter design: FIR, IIR filtering
Digital filter recipes
Fourier filtering of very long signals
Kernel regression as discrete convolution
7.4 Exploiting statistical stability for linear-Gaussian DSP
Discrete-time Gaussian processes (GPs) and DSP
Nonparametric power spectral density (PSD) estimation
Parametric PSD estimation
Subspace analysis: using PCA in DSP
7.5 The Kalman filter (KF)
Junction tree algorithm (JT) for KF computations
Forward filtering
Backward smoothing
Incomplete data likelihood
Viterbi decoding
Baum-Welch parameter estimation
Kalman filtering as signal subspace analysis
7.6 Time-varying linear systems
Short-time Fourier transform (STFT) and perfect reconstruction
Continuous-time wavelet transforms (CWT)
Discretization and the discrete wavelet transform (DWT)
Wavelet design
Applications of the DWT
Chapter 8: Discrete signals: sampling, quantization and coding
8.1 Discrete-time sampling
Bandlimited sampling
Uniform bandlimited sampling: Shannon-Whittaker interpolation
Generalized uniform sampling
8.2 Quantization
Rate-distortion theory
Lloyd-Max and entropy-constrained quantizer design
Statistical quantization and dithering
Vector quantization
8.3 Lossy signal compression
Audio companding
Linear predictive coding (LPC)
Transform coding
8.4 Compressive sensing (CS)
Sparsity and incoherence
Exact reconstruction by convex optimization
Compressive sensing in practice
Chapter 9: Nonlinear and non-Gaussian signal processing
9.1 Running window filters
Maximum likelihood filters
Change point detection
9.2 Recursive filtering
9.3 Global nonlinear filtering
9.4 Hidden Markov models (HMMs)
Junction tree (JT) for efficient HMM computations
Viterbi decoding
Baum-Welch parameter estimation
Model evaluation and structured data classification
Viterbi parameter estimation
Avoiding numerical underflow in message passing
9.5 Homomorphic signal processing
Chapter 10: Nonparametric Bayesian machine learning and signal processing
10.1 Preliminaries
Exchangeability and de Finetti's theorem
Representations of stochastic processes
Partitions and equivalence classes
10.2 Gaussian processes (GP)
From basis regression to kernel regression
Distributions over function spaces: GPs
Bayesian GP kernel regression
GP regression and Wiener filtering
Other GP-related topics
10.3 Dirichlet processes (DP)
The Dirichlet distribution: canonical prior for the categorical distribution
Defining the Dirichlet and related processes
Infinite mixture models (DPMMs)
Can DP-based models actually infer the number of components?
Bibliography
Index