Essential Math for AI: Next-Level Mathematics for Efficient and Successful AI Systems

Many industries are eager to integrate AI and data-driven technologies into their systems and operations. But to build truly successful AI systems, you need a firm grasp of the underlying mathematics. This comprehensive guide bridges the gap between the potential and applications of AI and its relevant mathematical foundations.

In an immersive and conversational style, the book surveys the mathematics necessary to thrive in the AI field, focusing on real-world applications and state-of-the-art models rather than on dense academic theory. You'll explore topics such as regression, neural networks, convolution, optimization, probability, graphs, random walks, Markov processes, differential equations, and more, all within an AI context geared toward computer vision, natural language processing, generative models, reinforcement learning, operations research, and automated systems. Written with a broad audience in mind, including engineers, data scientists, mathematicians, scientists, and people early in their careers, the book helps build a solid foundation for success in both the AI and math fields.

You'll be able to:

• Comfortably speak the languages of AI, machine learning, data science, and mathematics
• Unify machine learning models and natural language models under one mathematical structure
• Handle graph and network data with ease
• Explore real data, visualize space transformations, reduce dimensions, and process images
• Decide which models to use for different data-driven projects
• Explore the various implications and limitations of AI

Author(s): Hala Nelson
Edition: 1
Publisher: O'Reilly Media
Year: 2023

Language: English
Commentary: Publisher's PDF. Revision history for the first edition: 2023-02-03, second release
Pages: 602
City: Sebastopol, CA
Tags: Artificial Intelligence;Machine Learning;Probabilistic Models;Neural Networks;Natural Language Processing;Bayesian Networks;Regression;Decision Trees;Popular Science;Computer Vision;Image Processing;Ethics;Convolutional Neural Networks;Recurrent Neural Networks;Boltzmann Machines;Generative Adversarial Networks;Support Vector Machines;Sentiment Analysis;Graph Data Model;Statistics;Optimization;Partial Differential Equations;Spam Detection;Social Media;Graph Theory;Graph Algorithms;Mathematics

Cover
Copyright
Table of Contents
Preface
Why I Wrote This Book
Who Is This Book For?
Who Is This Book Not For?
How Will the Math Be Presented in This Book?
Infographic
What Math Background Is Expected from You to Be Able to Read This Book?
Overview of the Chapters
My Favorite Books on AI
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Why Learn the Mathematics of AI?
What Is AI?
Why Is AI So Popular Now?
What Is AI Able to Do?
An AI Agent’s Specific Tasks
What Are AI’s Limitations?
What Happens When AI Systems Fail?
Where Is AI Headed?
Who Are the Current Main Contributors to the AI Field?
What Math Is Typically Involved in AI?
Summary and Looking Ahead
Chapter 2. Data, Data, Data
Data for AI
Real Data Versus Simulated Data
Mathematical Models: Linear Versus Nonlinear
An Example of Real Data
An Example of Simulated Data
Mathematical Models: Simulations and AI
Where Do We Get Our Data From?
The Vocabulary of Data Distributions, Probability, and Statistics
Random Variables
Probability Distributions
Marginal Probabilities
The Uniform and the Normal Distributions
Conditional Probabilities and Bayes’ Theorem
Conditional Probabilities and Joint Distributions
Prior Distribution, Posterior Distribution, and Likelihood Function
Mixtures of Distributions
Sums and Products of Random Variables
Using Graphs to Represent Joint Probability Distributions
Expectation, Mean, Variance, and Uncertainty
Covariance and Correlation
Markov Process
Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set
Common Examples
Continuous Distributions Versus Discrete Distributions (Density Versus Mass)
The Power of the Joint Probability Density Function
Distribution of Data: The Uniform Distribution
Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution
Distribution of Data: Other Important and Commonly Used Distributions
The Various Uses of the Word “Distribution”
A/B Testing
Summary and Looking Ahead
Chapter 3. Fitting Functions to Data
Traditional and Very Useful Machine Learning Models
Numerical Solutions Versus Analytical Solutions
Regression: Predict a Numerical Value
Training Function
Loss Function
Optimization
Logistic Regression: Classify into Two Classes
Training Function
Loss Function
Optimization
Softmax Regression: Classify into Multiple Classes
Training Function
Loss Function
Optimization
Incorporating These Models into the Last Layer of a Neural Network
Other Popular Machine Learning Techniques and Ensembles of Techniques
Support Vector Machines
Decision Trees
Random Forests
k-means Clustering
Performance Measures for Classification Models
Summary and Looking Ahead
Chapter 4. Optimization for Neural Networks
The Brain Cortex and Artificial Neural Networks
Training Function: Fully Connected, or Dense, Feed Forward Neural Networks
A Neural Network Is a Computational Graph Representation of the Training Function
Linearly Combine, Add Bias, Then Activate
Common Activation Functions
Universal Function Approximation
Approximation Theory for Deep Learning
Loss Functions
Optimization
Mathematics and the Mysterious Success of Neural Networks
Gradient Descent $\vec{\omega}_{i+1} = \vec{\omega}_i - \eta \nabla L(\vec{\omega}_i)$
Explaining the Role of the Learning Rate Hyperparameter η
Convex Versus Nonconvex Landscapes
Stochastic Gradient Descent
Initializing the Weights $\vec{\omega}_0$ for the Optimization Process
Regularization Techniques
Dropout
Early Stopping
Batch Normalization of Each Layer
Control the Size of the Weights by Penalizing Their Norm
Penalizing the $\ell_2$ Norm Versus Penalizing the $\ell_1$ Norm
Explaining the Role of the Regularization Hyperparameter α
Hyperparameter Examples That Appear in Machine Learning
Chain Rule and Backpropagation: Calculating $\nabla L(\vec{\omega}_i)$
Backpropagation Is Not Too Different from How Our Brain Learns
Why Is It Better to Backpropagate?
Backpropagation in Detail
Assessing the Significance of the Input Data Features
Summary and Looking Ahead
Chapter 5. Convolutional Neural Networks and Computer Vision
Convolution and Cross-Correlation
Translation Invariance and Translation Equivariance
Convolution in Usual Space Is a Product in Frequency Space
Convolution from a Systems Design Perspective
Convolution and Impulse Response for Linear and Translation Invariant Systems
Convolution and One-Dimensional Discrete Signals
Convolution and Two-Dimensional Discrete Signals
Filtering Images
Feature Maps
Linear Algebra Notation
The One-Dimensional Case: Multiplication by a Toeplitz Matrix
The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix
Pooling
A Convolutional Neural Network for Image Classification
Summary and Looking Ahead
Chapter 6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media
Matrix Factorization
Diagonal Matrices
Matrices as Linear Transformations Acting on Space
Action of A on the Right Singular Vectors
Action of A on the Standard Unit Vectors and the Unit Square Determined by Them
Action of A on the Unit Circle
Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition
Rotation and Reflection Matrices
Action of A on a General Vector $\vec{x}$
Three Ways to Multiply Matrices
The Big Picture
The Condition Number and Computational Stability
The Ingredients of the Singular Value Decomposition
Singular Value Decomposition Versus the Eigenvalue Decomposition
Computation of the Singular Value Decomposition
Computing an Eigenvector Numerically
The Pseudoinverse
Applying the Singular Value Decomposition to Images
Principal Component Analysis and Dimension Reduction
Principal Component Analysis and Clustering
A Social Media Application
Latent Semantic Analysis
Randomized Singular Value Decomposition
Summary and Looking Ahead
Chapter 7. Natural Language and Finance AI: Vectorization and Time Series
Natural Language AI
Preparing Natural Language Data for Machine Processing
Statistical Models and the log Function
Zipf’s Law for Term Counts
Various Vector Representations for Natural Language Documents
Term Frequency Vector Representation of a Document or Bag of Words
Term Frequency-Inverse Document Frequency Vector Representation of a Document
Topic Vector Representation of a Document Determined by Latent Semantic Analysis
Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation
Topic Vector Representation of a Document Determined by Latent Discriminant Analysis
Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings
Cosine Similarity
Natural Language Processing Applications
Sentiment Analysis
Spam Filter
Search and Information Retrieval
Machine Translation
Image Captioning
Chatbots
Other Applications
Transformers and Attention Models
The Transformer Architecture
The Attention Mechanism
Transformers Are Far from Perfect
Convolutional Neural Networks for Time Series Data
Recurrent Neural Networks for Time Series Data
How Do Recurrent Neural Networks Work?
Gated Recurrent Units and Long Short-Term Memory Units
An Example of Natural Language Data
Finance AI
Summary and Looking Ahead
Chapter 8. Probabilistic Generative Models
What Are Generative Models Useful For?
The Typical Mathematics of Generative Models
Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking
Maximum Likelihood Estimation
Explicit and Implicit Density Models
Explicit Density-Tractable: Fully Visible Belief Networks
Example: Generating Images via PixelCNN and Machine Audio via WaveNet
Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis
Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods
Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain
Implicit Density-Markov Chain: Generative Stochastic Network
Implicit Density-Direct: Generative Adversarial Networks
How Do Generative Adversarial Networks Work?
Example: Machine Learning and Generative Networks for High Energy Physics
Other Generative Models
Naive Bayes Classification Model
Gaussian Mixture Model
The Evolution of Generative Models
Hopfield Nets
Boltzmann Machine
Restricted Boltzmann Machine (Explicit Density and Intractable)
The Original Autoencoder
Probabilistic Language Modeling
Summary and Looking Ahead
Chapter 9. Graph Models
Graphs: Nodes, Edges, and Features for Each
Example: PageRank Algorithm
Inverting Matrices Using Graphs
Cayley Graphs of Groups: Pure Algebra and Parallel Computing
Message Passing Within a Graph
The Limitless Applications of Graphs
Brain Networks
Spread of Disease
Spread of Information
Detecting and Tracking Fake News Propagation
Web-Scale Recommendation Systems
Fighting Cancer
Biochemical Graphs
Molecular Graph Generation for Drug and Protein Structure Discovery
Citation Networks
Social Media Networks and Social Influence Prediction
Sociological Structures
Bayesian Networks
Traffic Forecasting
Logistics and Operations Research
Language Models
Graph Structure of the Web
Automatically Analyzing Computer Programs
Data Structures in Computer Science
Load Balancing in Distributed Networks
Artificial Neural Networks
Random Walks on Graphs
Node Representation Learning
Tasks for Graph Neural Networks
Node Classification
Graph Classification
Clustering and Community Detection
Graph Generation
Influence Maximization
Link Prediction
Dynamic Graph Models
Bayesian Networks
A Bayesian Network Represents a Compactified Conditional Probability Table
Making Predictions Using a Bayesian Network
Bayesian Networks Are Belief Networks, Not Causal Networks
Keep This in Mind About Bayesian Networks
Chains, Forks, and Colliders
Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?
Graph Diagrams for Probabilistic Causal Modeling
A Brief History of Graph Theory
Main Considerations in Graph Theory
Spanning Trees and Shortest Spanning Trees
Cut Sets and Cut Vertices
Planarity
Graphs as Vector Spaces
Realizability
Coloring and Matching
Enumeration
Algorithms and Computational Aspects of Graphs
Summary and Looking Ahead
Chapter 10. Operations Research
No Free Lunch
Complexity Analysis and O() Notation
Optimization: The Heart of Operations Research
Thinking About Optimization
Optimization: Finite Dimensions, Unconstrained
Optimization: Finite Dimensions, Constrained Lagrange Multipliers
Optimization: Infinite Dimensions, Calculus of Variations
Optimization on Networks
Traveling Salesman Problem
Minimum Spanning Tree
Shortest Path
Max-Flow Min-Cut
Max-Flow Min-Cost
The Critical Path Method for Project Design
The n-Queens Problem
Linear Optimization
The General Form and the Standard Form
Visualizing a Linear Optimization Problem in Two Dimensions
Convex to Linear
The Geometry of Linear Optimization
The Simplex Method
Transportation and Assignment Problems
Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That
Sensitivity
Game Theory and Multiagents
Queuing
Inventory
Machine Learning for Operations Research
Hamilton-Jacobi-Bellman Equation
Operations Research for AI
Summary and Looking Ahead
Chapter 11. Probability
Where Did Probability Appear in This Book?
What More Do We Need to Know That Is Essential for AI?
Causal Modeling and the Do Calculus
An Alternative: The Do Calculus
Paradoxes and Diagram Interpretations
Monty Hall Problem
Berkson’s Paradox
Simpson’s Paradox
Large Random Matrices
Examples of Random Vectors and Random Matrices
Main Considerations in Random Matrix Theory
Random Matrix Ensembles
Eigenvalue Density of the Sum of Two Large Random Matrices
Essential Math for Large Random Matrices
Stochastic Processes
Bernoulli Process
Poisson Process
Random Walk
Wiener Process or Brownian Motion
Martingale
Lévy Process
Branching Process
Markov Chain
Itô’s Lemma
Markov Decision Processes and Reinforcement Learning
Examples of Reinforcement Learning
Reinforcement Learning as a Markov Decision Process
Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics
Python Library for Reinforcement Learning
Theoretical and Rigorous Grounds
Which Events Have a Probability?
Can We Talk About a Wider Range of Random Variables?
A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)
Where Is the Difficulty?
Random Variable, Expectation, and Integration
Distribution of a Random Variable and the Change of Variable Theorem
Next Steps in Rigorous Probability Theory
The Universality Theorem for Neural Networks
Summary and Looking Ahead
Chapter 12. Mathematical Logic
Various Logic Frameworks
Propositional Logic
From Few Axioms to a Whole Theory
Codifying Logic Within an Agent
How Do Deterministic and Probabilistic Machine Learning Fit In?
First-Order Logic
Relationships Between For All and There Exist
Probabilistic Logic
Fuzzy Logic
Temporal Logic
Comparison with Human Natural Language
Machines and Complex Mathematical Reasoning
Summary and Looking Ahead
Chapter 13. Artificial Intelligence and Partial Differential Equations
What Is a Partial Differential Equation?
Modeling with Differential Equations
Models at Different Scales
The Parameters of a PDE
Changing One Thing in a PDE Can Be a Big Deal
Can AI Step In?
Numerical Solutions Are Very Valuable
Continuous Functions Versus Discrete Functions
PDE Themes from My Ph.D. Thesis
Discretization and the Curse of Dimensionality
Finite Differences
Finite Elements
Variational or Energy Methods
Monte Carlo Methods
Some Statistical Mechanics: The Wonderful Master Equation
Solutions as Expectations of Underlying Random Processes
Transforming the PDE
Fourier Transform
Laplace Transform
Solution Operators
Example Using the Heat Equation
Example Using the Poisson Equation
Fixed Point Iteration
AI for PDEs
Deep Learning to Learn Physical Parameter Values
Deep Learning to Learn Meshes
Deep Learning to Approximate Solution Operators of PDEs
Numerical Solutions of High-Dimensional Differential Equations
Simulating Natural Phenomena Directly from Data
Hamilton-Jacobi-Bellman PDE for Dynamic Programming
PDEs for AI?
Other Considerations in Partial Differential Equations
Summary and Looking Ahead
Chapter 14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy
Good AI
Policy Matters
What Could Go Wrong?
From Math to Weapons
Chemical Warfare Agents
AI and Politics
Unintended Outcomes of Generative Models
How to Fix It?
Addressing Underrepresentation in Training Data
Addressing Bias in Word Vectors
Addressing Privacy
Addressing Fairness
Injecting Morality into AI
Democratization and Accessibility of AI to Nonexperts
Prioritizing High Quality Data
Distinguishing Bias from Discrimination
The Hype
Final Thoughts
Index
About the Author
Colophon