Reinforcement Learning for Finance begins by describing methods for training neural networks. Next, it discusses CNNs and RNNs, two kinds of neural networks used as deep learning networks in reinforcement learning. The book then dives into reinforcement learning theory, explaining the Markov decision process, value functions, policies, and policy gradients, with their mathematical formulations and learning algorithms. It covers recent reinforcement learning algorithms, from double deep Q-networks to twin-delayed deep deterministic policy gradients and generative adversarial networks, with examples that use the TensorFlow Python library. It also serves as a quick hands-on guide to TensorFlow programming, covering concepts ranging from variables and graphs to automatic differentiation, layers, models, and loss functions.
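For a taste of those TensorFlow concepts, here is a minimal sketch (illustrative only, not code from the book) of a variable and automatic differentiation with tf.GradientTape:

import tensorflow as tf

# A tf.Variable holds mutable, trainable state; tf.GradientTape records
# the operations applied to it for automatic differentiation.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x  # a simple scalar function of x

# dy/dx = 2x + 2, so the gradient is 8.0 at x = 3.0.
print(tape.gradient(y, x).numpy())  # 8.0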
Neural network libraries like TensorFlow, PyTorch, and Caffe have made tremendous contributions to the rapid development, testing, and deployment of deep neural networks, but I found most applications restricted to computer science, computer vision, and robotics. Applying reinforcement learning algorithms in finance served as another reminder of the paucity of texts in this field. Furthermore, I found myself referring to scholarly articles and papers for mathematical proofs of new reinforcement learning algorithms. This led me to write this book as a one-stop resource for Python programmers to learn the theory behind reinforcement learning, augmented with practical examples drawn from the field of finance.
In practical applications, reinforcement learning draws upon deep neural networks. To facilitate the exposition of reinforcement learning topics and for continuity, this book also provides an introduction to TensorFlow and covers neural network topics such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
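By way of illustration (a minimal sketch, not the book's own examples, though Chapters 3 and 4 cover image classification and stock price prediction), a CNN and an RNN can each be declared in a few lines of Keras:

import tensorflow as tf

# A small convolutional classifier for 28x28 grayscale images,
# in the spirit of the Fashion MNIST example in Chapter 3.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A small recurrent model for sequences, e.g., 30 daily returns of one
# asset, in the spirit of the stock price prediction example in Chapter 4.
rnn = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(30, 1)),
    tf.keras.layers.Dense(1),  # next-step prediction
])
rnn.compile(optimizer="adam", loss="mse")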
Finally, this book introduces readers to writing modular, reusable, and extensible reinforcement learning code. Having developed trading strategies using reinforcement learning and published papers on them, I felt that existing reinforcement learning libraries such as TF-Agents are tightly coupled to their underlying implementation frameworks and do not express the central concepts of reinforcement learning in a way modular enough for someone conversant with those concepts to pick up the TF-Agents library or extend its algorithms for specific applications. The code samples in this book show how to write modular reinforcement learning code.
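As a hypothetical illustration of that decoupling (a sketch of the style, not code from the book or from TF-Agents), the core training loop can be written against an abstract agent interface, assuming an environment whose step() returns a (next_state, reward, done) triple:

from abc import ABC, abstractmethod

class Agent(ABC):
    """Expresses the RL concepts of acting and learning, independent of
    the neural network framework used underneath."""

    @abstractmethod
    def act(self, state):
        """Return an action for the given state."""

    @abstractmethod
    def observe(self, state, action, reward, next_state, done):
        """Record a transition for learning."""

def run_episode(env, agent):
    # Generic episode loop: any Agent implementation and any environment
    # exposing reset() and step() can be plugged in unchanged.
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.observe(state, action, reward, next_state, done)
        state, total_reward = next_state, total_reward + reward
    return total_reward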
After completing this book, you will understand reinforcement learning with deep Q-networks and generative adversarial networks using the TensorFlow library.
Author(s): Samit Ahlawat
Publisher: Apress
Year: 2023
Language: English
Pages: 435
Table of Contents
About the Author
Acknowledgments
Preface
Introduction
Chapter 1: Overview
1.1 Methods for Training Neural Networks
1.2 Machine Learning in Finance
1.3 Structure of the Book
Chapter 2: Introduction to TensorFlow
2.1 Tensors and Variables
2.2 Graphs, Operations, and Functions
2.3 Modules
2.4 Layers
2.5 Models
2.6 Activation Functions
2.7 Loss Functions
2.8 Metrics
2.9 Optimizers
2.10 Regularizers
2.11 TensorBoard
2.12 Dataset Manipulation
2.13 Gradient Tape
Chapter 3: Convolutional Neural Networks
3.1 A Simple CNN
3.2 Neural Network Layers Used in CNNs
3.3 Output Shapes and Trainable Parameters of CNNs
3.4 Classifying Fashion MNIST Images
3.5 Identifying Technical Patterns in Security Prices
3.6 Using CNNs for Recognizing Handwritten Digits
Chapter 4: Recurrent Neural Networks
4.1 Simple RNN Layer
4.2 LSTM Layer
4.3 GRU Layer
4.4 Customized RNN Layers
4.5 Stock Price Prediction
4.6 Correlation in Asset Returns
Chapter 5: Reinforcement Learning Theory
5.1 Basics
5.2 Methods for Estimating the Markov Decision Problem
5.3 Value Estimation Methods
5.3.1 Dynamic Programming
Finding the Optimal Path in a Maze
European Call Option Valuation
Valuation of a European Barrier Option
5.3.2 Generalized Policy Iteration
Policy Improvement Theorem
Policy Evaluation
Policy Improvement
5.3.3 Monte Carlo Method
Pricing an American Put Option
5.3.4 Temporal Difference (TD) Learning
SARSA
Valuation of an American Barrier Option
Least Squares Temporal Difference (LSTD)
Least Squares Policy Evaluation (LSPE)
Least Squares Policy Iteration (LSPI)
Q-Learning
Double Q-Learning
Eligibility Trace
5.3.5 Cartpole Balancing
5.4 Policy Learning
5.4.1 Policy Gradient Theorem
5.4.2 REINFORCE Algorithm
5.4.3 Policy Gradient with State-Action Value Function Approximation
5.4.4 Policy Learning Using Cross Entropy
5.5 Actor-Critic Algorithms
5.5.1 Stochastic Gradient–Based Actor-Critic Algorithms
5.5.2 Building a Trading Strategy
5.5.3 Natural Actor-Critic Algorithms
5.5.4 Cross Entropy–Based Actor-Critic Algorithms
Chapter 6: Recent RL Algorithms
6.1 Double Deep Q-Network: DDQN
6.2 Balancing a Cartpole Using DDQN
6.3 Dueling Double Deep Q-Network
6.4 Noisy Networks
6.5 Deterministic Policy Gradient
6.5.1 Off-Policy Actor-Critic Algorithm
6.5.2 Deterministic Policy Gradient Theorem
6.6 Trust Region Policy Optimization: TRPO
6.7 Natural Actor-Critic Algorithm: NAC
6.8 Proximal Policy Optimization: PPO
6.9 Deep Deterministic Policy Gradient: DDPG
6.10 D4PG
6.11 TD3PG
6.12 Soft Actor-Critic: SAC
6.13 Variational Autoencoder
6.14 VAE for Dimensionality Reduction
6.15 Generative Adversarial Networks
Bibliography
Index