Machine Learning for Engineers

This self-contained introduction to machine learning, designed from the start with engineers in mind, will equip students with everything they need to start applying machine learning principles and algorithms to real-world engineering problems. With a consistent emphasis on the connections between estimation, detection, information theory, and optimization, it includes: an accessible overview of the relationships between machine learning and signal processing, providing a solid foundation for further study; clear explanations of the differences between state-of-the-art techniques and more classical methods, equipping students with the understanding they need to make informed technique choices; demonstrations of the links between information-theoretic concepts and their practical engineering relevance; and reproducible examples using MATLAB, enabling hands-on student experimentation. Assuming only a basic understanding of probability and linear algebra, and accompanied by lecture slides and solutions for instructors, this is the ideal introduction to machine learning for engineering students of all disciplines.

Advances in machine learning and artificial intelligence (AI) have made available new tools that are revolutionizing science, engineering, and society at large. Modern machine learning techniques build on conceptual and mathematical ideas from stochastic optimization, linear algebra, signal processing, Bayesian inference, information theory, and statistical learning theory. Students and researchers working in different fields of engineering are now expected to have a general grasp of machine learning principles and algorithms, and to be able to assess the relative merits of available design solutions spanning the space between model- and data-based methodologies. This book is written with this audience in mind.

In approaching the field of machine learning, students of signal processing and information theory may at first be ill at ease reconciling the similarity of the techniques used in machine learning – least squares, gradient descent, maximum likelihood – with the differences in terminology and emphasis (and hype?). Seasoned signal processing and information-theory researchers may in turn find the resurgence of machine learning somewhat puzzling (“didn’t we write off that technique three decades ago?”), while still being awed by the scale of current applications and by the efficiency of state-of-the-art methods. They may also pride themselves on seeing many of the ideas originating in their communities underpin machine learning solutions with wide societal and economic repercussions.

Existing books on machine learning come in different flavors: some are compilations of algorithms mostly intended for computer scientists; others focus on specific aspects, such as optimization, Bayesian reasoning, or theoretical principles. Books that have served as references for many years, while still relevant, appear partly outdated and superseded by more recent research papers. In this context, what seems to be missing is a textbook aimed at engineering students and researchers that can be used for self-study, as well as for undergraduate and graduate courses alongside modules on statistical signal processing, information theory, and optimization. An ideal text should provide a principled introduction to machine learning that highlights connections with estimation, detection, information theory, and optimization, while offering concise yet extensive coverage of state-of-the-art topics and simple, reproducible examples. Filling this gap on the bookshelves of engineering libraries is the ambition of this book.

Intended audience: This book is intended for a general audience of students, engineers, and researchers with a background in probability and signal processing. To offer a self-contained introduction to these readers, the text presents supervised and unsupervised learning in a systematic fashion – including the necessary background on linear algebra, probability, and optimization – taking the reader from basic tools to state-of-the-art methods within a unified, coherent presentation.

Author(s): Osvaldo Simeone
Publisher: Cambridge University Press
Year: 2023

Language: English
Pages: 602

Cover
Half-title
Title page
Copyright information
Contents
Preface
Acknowledgements
Notation
Acronyms
Part I Introduction and Background
1 When and How to Use Machine Learning
1.1 Overview
1.2 What is Machine Learning For?
1.3 Why Study Machine Learning Now?
1.4 What is Machine Learning?
1.5 Taxonomy of Machine Learning Methods
1.6 When to Use Machine Learning?
1.7 Summary
1.8 Recommended Resources
Bibliography
2 Background
2.1 Overview
2.2 Random Variables
2.3 Expectation
2.4 Variance
2.5 Vectors
2.6 Matrices
2.7 Random Vectors
2.8 Marginal Distributions
2.9 Conditional Distributions
2.10 Independence of Random Variables and Chain Rule of Probability
2.11 Bayes’ Theorem
2.12 Law of Iterated Expectations
2.13 Summary
2.14 Recommended Resources
Problems
Bibliography
Part II Fundamental Concepts and Algorithms
3 Inference, or Model-Driven Prediction
3.1 Overview
3.2 Defining Inference
3.3 Optimal Hard Prediction
3.4 Optimal Prediction for Jointly Gaussian Random Vectors
3.5 KL Divergence and Cross Entropy
3.6 Optimal Soft Prediction
3.7 Mutual Information
3.8 Log-Loss As a “Universal” Loss Function
3.9 Free Energy
3.10 Summary
3.11 Recommended Resources
Problems
Appendices
Bibliography
4 Supervised Learning: Getting Started
4.1 Overview
4.2 Defining Supervised Learning
4.3 Training Hard Predictors
4.4 Inductive Bias Selection and Validation
4.5 Bias and Estimation Error
4.6 Beyond the Bias vs. Estimation Error Trade-Off
4.7 Regularization
4.8 Testing
4.9 Training Soft Predictors
4.10 MAP Learning and Regularization
4.11 Kernel Methods
4.12 Summary
4.13 Recommended Resources
Problems
Appendices
Bibliography
5 Optimization for Machine Learning
5.1 Overview
5.2 Optimization for Training
5.3 Solving an Optimization Problem
5.4 First-Order Necessary Optimality Condition for Single-Variable Functions
5.5 Second-Order Optimality Conditions for Single-Variable Functions
5.6 Optimality Conditions for Convex Single-Variable Functions
5.7 Optimality Conditions for Multi-variable Functions
5.8 Gradient Descent
5.9 Convergence of Gradient Descent
5.10 Second-Order Methods
5.11 Stochastic Gradient Descent
5.12 Convergence of Stochastic Gradient Descent
5.13 Minimizing Population Loss vs. Minimizing Training Loss
5.14 Symbolic and Numerical Differentiation
5.15 Automatic Differentiation
5.16 Summary
5.17 Recommended Resources
Problems
Appendices
Bibliography
6 Supervised Learning: Beyond Least Squares
6.1 Overview
6.2 Discriminative Linear Models for Binary Classification: Hard Predictors
6.3 Discriminative Linear Models for Binary Classification: Soft Predictors and Logistic Regression
6.4 Discriminative Non-linear Models for Binary Classification: Multi-layer Neural Networks
6.5 Generative Models for Binary Classification
6.6 Discriminative Linear Models for Multi-class Classification: Softmax Regression
6.7 Discriminative Non-linear Models for Multi-class Classification: Multi-layer Neural Networks
6.8 Generative Models for Multi-class Classification
6.9 Mixture Models
6.10 Beyond Feedforward Multi-layer Neural Networks
6.11 K-Nearest Neighbors Classification
6.12 Applications to Regression
6.13 Summary
6.14 Recommended Resources
Problems
Appendices
Bibliography
7 Unsupervised Learning
7.1 Overview
7.2 Unsupervised Learning Tasks
7.3 Density Estimation
7.4 Latent-Variable Models
7.5 Autoencoders: x → z → x
7.6 Discriminative Models: x → z
7.7 Undirected Generative Models: x ↔ z
7.8 Directed Generative Models: z → x
7.9 Training Directed Generative Models: K-Means Clustering
7.10 Training Directed Generative Models: Expectation Maximization
7.11 Summary
7.12 Recommended Resources
Problems
Appendices
Bibliography
Part III Advanced Tools and Algorithms
8 Statistical Learning Theory
8.1 Overview
8.2 Benchmarks and Decomposition of the Optimality Error
8.3 Probably Approximately Correct Learning
8.4 PAC Learning for Finite Model Classes
8.5 PAC Learning for General Model Classes: VC Dimension
8.6 PAC Bayes and Information-Theoretic Bounds
8.7 Summary
8.8 Recommended Resources
Problems
Bibliography
9 Exponential Family of Distributions
9.1 Overview
9.2 Definitions and Examples
9.3 Exponential-Family Distributions As Maximum-Entropy Models
9.4 Gradient of the Log-Loss
9.5 ML Learning
9.6 Information-Theoretic Metrics
9.7 Fisher Information Matrix
9.8 Generalized Linear Models
9.9 Summary
9.10 Recommended Resources
Problems
Appendices
Bibliography
10 Variational Inference and Variational Expectation Maximization
10.1 Overview
10.2 Variational Inference, Amortized Variational Inference, and Variational EM
10.3 Exact Bayesian Inference
10.4 Laplace Approximation
10.5 Introducing Variational Inference
10.6 Mean-Field Variational Inference
10.7 Introducing Parametric Variational Inference
10.8 Black-Box Variational Inference
10.9 Reparametrization-Based Variational Inference
10.10 Combining Factorization and Parametrization for Variational Inference
10.11 Particle-Based Variational Inference and Stein Variational Gradient Descent
10.12 Amortized Variational Inference
10.13 Variational Expectation Maximization
10.14 Summary
10.15 Recommended Resources
Problems
Appendices
Bibliography
11 Information-Theoretic Inference and Learning
11.1 Overview
11.2 I-Projection and M-Projection
11.3 Generalized Variational Inference and Generalized Variational Expectation Maximization
11.4 Maximum-Entropy Learning
11.5 InfoMax
11.6 Information Bottleneck
11.7 Rate-Distortion Encoding
11.8 Two-Sample Estimation of Information-Theoretic Metrics
11.9 Beyond the KL Divergence: f-Divergences and Two-Sample Estimators
11.10 Generative Adversarial Networks
11.11 Distributionally Robust Learning
11.12 Summary
11.13 Recommended Resources
Problems
Appendices
Bibliography
12 Bayesian Learning
12.1 Overview
12.2 Frequentist Learning and Calibration
12.3 Basics of Bayesian Learning
12.4 Why Bayesian Learning?
12.5 Exact Bayesian Learning
12.6 Laplace Approximation
12.7 MC Sampling-Based Bayesian Learning
12.8 Variational Bayesian Learning
12.9 Model Selection without Validation: Empirical Bayes
12.10 Bayesian Non-parametric Learning
12.11 Bayesian Learning with Local Latent Variables
12.12 Generalized Bayesian Learning
12.13 Summary
12.14 Recommended Resources
Problems
Appendices
Bibliography
Part IV Beyond Centralized Single-Task Learning
13 Transfer Learning, Multi-task Learning, Continual Learning, and Meta-learning
13.1 Overview
13.2 Transfer Learning
13.3 Multi-task Learning
13.4 Continual Learning
13.5 Meta-learning
13.6 Bayesian Perspective on Transfer, Multi-task, Continual, and Meta-learning
13.7 Summary
13.8 Recommended Resources
Appendices
Bibliography
14 Federated Learning
14.1 Overview
14.2 Frequentist Federated Learning
14.3 Private Federated Learning
14.4 Bayesian Federated Learning
14.5 Summary
14.6 Recommended Resources
Bibliography
Part V Epilogue
15 Beyond This Book
15.1 Overview
15.2 Probabilistic Graphical Models
15.3 Adversarial Attacks
15.4 Causality
15.5 Quantum Machine Learning
15.6 Machine Unlearning
15.7 General AI?
15.8 Recommended Resources
Bibliography
Index