Generative AI is the hottest topic in tech. This practical book teaches machine learning engineers and data scientists how to use TensorFlow and Keras to create impressive generative deep learning models from scratch, including variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, normalizing flows, energy-based models, and denoising diffusion models.
The book starts with the basics of deep learning and progresses to cutting-edge architectures. Along the way, you'll pick up tips and tricks for making your models learn more efficiently and become more creative.
Discover how VAEs can change facial expressions in photos
Train GANs to generate images based on your own dataset
Build diffusion models to produce new varieties of flowers
Train your own GPT for text generation
Learn how large language models like ChatGPT are trained
Explore state-of-the-art architectures such as StyleGAN2 and ViT VQ-GAN
Compose polyphonic music using Transformers and MuseGAN
Understand how generative world models can solve reinforcement learning tasks
Dive into multimodal models such as DALL·E 2, Imagen, and Stable Diffusion
Author: David Foster
Edition: 2
Publisher: O'Reilly Media, Inc.
Year: 2023
Language: English
Pages: 453
Foreword
Preface
Objective and Approach
Prerequisites
Roadmap
Changes in the Second Edition
Other Resources
Conventions Used in This Book
Codebase
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
I. Introduction to Generative Deep Learning
1. Generative Modeling
What Is Generative Modeling?
Generative Versus Discriminative Modeling
The Rise of Generative Modeling
Generative Modeling and AI
Our First Generative Model
Hello World!
The Generative Modeling Framework
Representation Learning
Core Probability Theory
Generative Model Taxonomy
The Generative Deep Learning Codebase
Cloning the Repository
Using Docker
Running on a GPU
Summary
2. Deep Learning
Data for Deep Learning
Deep Neural Networks
What Is a Neural Network?
Learning High-Level Features
TensorFlow and Keras
Multilayer Perceptron (MLP)
Preparing the Data
Building the Model
Compiling the Model
Training the Model
Evaluating the Model
Convolutional Neural Network (CNN)
Convolutional Layers
Batch Normalization
Dropout
Building the CNN
Training and Evaluating the CNN
Summary
II. Methods
3. Variational Autoencoders
Introduction
Autoencoders
The Fashion-MNIST Dataset
The Autoencoder Architecture
The Encoder
The Decoder
Joining the Encoder to the Decoder
Reconstructing Images
Visualizing the Latent Space
Generating New Images
Variational Autoencoders
The Encoder
The Loss Function
Training the Variational Autoencoder
Analysis of the Variational Autoencoder
Exploring the Latent Space
The CelebA Dataset
Training the Variational Autoencoder
Analysis of the Variational Autoencoder
Generating New Faces
Latent Space Arithmetic
Morphing Between Faces
Summary
4. Generative Adversarial Networks
Introduction
Deep Convolutional GAN (DCGAN)
The Bricks Dataset
The Discriminator
The Generator
Training the DCGAN
Analysis of the DCGAN
GAN Training: Tips and Tricks
Wasserstein GAN with Gradient Penalty (WGAN-GP)
Wasserstein Loss
The Lipschitz Constraint
Enforcing the Lipschitz Constraint
The Gradient Penalty Loss
Training the WGAN-GP
Analysis of the WGAN-GP
Conditional GAN (CGAN)
CGAN Architecture
Training the CGAN
Analysis of the CGAN
Summary
5. Autoregressive Models
Introduction
Long Short-Term Memory Network (LSTM)
The Recipes Dataset
Working with Text Data
Tokenization
Creating the Training Set
The LSTM Architecture
The Embedding Layer
The LSTM Layer
The LSTM Cell
Training the LSTM
Analysis of the LSTM
Recurrent Neural Network (RNN) Extensions
Stacked Recurrent Networks
Gated Recurrent Units
Bidirectional Cells
PixelCNN
Masked Convolutional Layers
Residual Blocks
Training the PixelCNN
Analysis of the PixelCNN
Mixture Distributions
Summary
6. Normalizing Flow Models
Introduction
Normalizing Flows
Change of Variables
The Jacobian Determinant
The Change of Variables Equation
RealNVP
The Two Moons Dataset
Coupling Layers
Training the RealNVP Model
Analysis of the RealNVP Model
Other Normalizing Flow Models
GLOW
FFJORD
Summary
7. Energy-Based Models
Introduction
Energy-Based Models
The MNIST Dataset
The Energy Function
Sampling Using Langevin Dynamics
Training with Contrastive Divergence
Analysis of the Energy-Based Model
Other Energy-Based Models
Summary
8. Diffusion Models
Introduction
Denoising Diffusion Models (DDM)
The Flowers Dataset
The Forward Diffusion Process
The Reparameterization Trick
Diffusion Schedules
The Reverse Diffusion Process
The U-Net Denoising Model
Training the Diffusion Model
Sampling from the Denoising Diffusion Model
Analysis of the Diffusion Model
Summary
III. Applications
9. Transformers
Introduction
GPT
The Wine Reviews Dataset
Attention
Queries, Keys, and Values
Multihead Attention
Causal Masking
The Transformer Block
Positional Encoding
Training GPT
Analysis of GPT
Other Transformers
T5
GPT-3 and GPT-4
ChatGPT
Summary
10. Advanced GANs
Introduction
ProGAN
Progressive Training
Outputs
StyleGAN
The Mapping Network
The Synthesis Network
Outputs from StyleGAN
StyleGAN2
Weight Modulation and Demodulation
Path Length Regularization
No Progressive Growing
Outputs from StyleGAN2
Other Important GANs
Self-Attention GAN (SAGAN)
BigGAN
VQ-GAN
ViT VQ-GAN
Summary
11. Music Generation
Introduction
Transformers for Music Generation
The Bach Cello Suite Dataset
Parsing MIDI Files
Tokenization
Creating the Training Set
Sine Position Encoding
Multiple Inputs and Outputs
Analysis of the Music-Generating Transformer
Tokenization of Polyphonic Music
MuseGAN
The Bach Chorale Dataset
The MuseGAN Generator
The MuseGAN Critic
Analysis of the MuseGAN
Summary
12. World Models
Introduction
Reinforcement Learning
The CarRacing Environment
World Model Overview
Architecture
Training
Collecting Random Rollout Data
Training the VAE
The VAE Architecture
Exploring the VAE
Collecting Data to Train the MDN-RNN
Training the MDN-RNN
The MDN-RNN Architecture
Sampling from the MDN-RNN
Training the Controller
The Controller Architecture
CMA-ES
Parallelizing CMA-ES
In-Dream Training
Summary
13. Multimodal Models
Introduction
DALL·E 2
Architecture
The Text Encoder
CLIP
The Prior
The Decoder
Examples from DALL·E 2
Imagen
Architecture
DrawBench
Examples from Imagen
Stable Diffusion
Architecture
Examples from Stable Diffusion
Flamingo
Architecture
The Vision Encoder
The Perceiver Resampler
The Language Model
Examples from Flamingo
Summary
14. Conclusion
Timeline of Generative AI
2014–2017: The VAE and GAN Era
2018–2019: The Transformer Era
2020–2022: The Big Model Era
The Current State of Generative AI
Large Language Models
Text-to-Code Models
Text-to-Image Models
Other Applications
The Future of Generative AI
Generative AI in Everyday Life
Generative AI in the Workplace
Generative AI in Education
Generative AI Ethics and Challenges
Final Thoughts
Index
About the Author