The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective.
Distributional reinforcement learning is a new mathematical formalism for thinking about decisions. Going beyond the common approach to reinforcement learning and expected values, it focuses on the total reward or return obtained as a consequence of an agent's choices—specifically, how this return behaves from a probabilistic perspective. In this first comprehensive guide to distributional reinforcement learning, Marc G. Bellemare, Will Dabney, and Mark Rowland, who spearheaded development of the field, present its key concepts and review some of its many applications. They demonstrate its power to account for many complex, interesting phenomena that arise from interactions with one's environment.
The authors present core ideas from classical reinforcement learning to contextualize distributional topics and include mathematical proofs pertaining to major results discussed in the text. They guide the reader through a series of algorithmic and mathematical developments that, in turn, characterize, compute, estimate, and make decisions on the basis of the random return. Practitioners in disciplines as diverse as finance (risk management), computational neuroscience, computational psychiatry, psychology, macroeconomics, and robotics are already using distributional reinforcement learning, paving the way for its expanding applications in mathematical finance, engineering, and the life sciences. More than a mathematical approach, distributional reinforcement learning represents a new perspective on how intelligent agents make predictions and decisions.
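As a brief illustration of the shift in perspective described above (and developed in Chapter 2 of the book), the sketch below contrasts the classical Bellman equation for the expected return with the random-variable Bellman equation for the return itself; the notation is standard reinforcement-learning notation and may differ in detail from the book's own conventions.

% Classical Bellman equation: the value function V^pi summarizes the return by its expectation.
\[
  V^{\pi}(x) \;=\; \mathbb{E}_{\pi}\!\left[\, R_{t+1} + \gamma\, V^{\pi}(X_{t+1}) \;\middle|\; X_t = x \,\right]
\]
% Random-variable (distributional) Bellman equation: the return itself, not just its mean,
% satisfies a recursive relationship that holds in distribution (the D over the equals sign).
\[
  G^{\pi}(x) \;\overset{\mathcal{D}}{=}\; R_{t+1} + \gamma\, G^{\pi}(X_{t+1}), \qquad X_t = x
\]
Here $G^{\pi}(x)$ denotes the random return obtained by following policy $\pi$ from state $x$; the distributional approach studies the entire probability distribution of $G^{\pi}(x)$ rather than only its expectation $V^{\pi}(x)$.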
Author(s): Marc G. Bellemare, Will Dabney, Mark Rowland
Publisher: MIT Press
Year: 2023
Language: English
Pages: 384
1 Introduction
1.1 Why Distributional Reinforcement Learning?
1.2 An Example: Kuhn Poker
1.3 How Is Distributional Reinforcement Learning Different?
1.4 Intended Audience and Organization
1.5 Bibliographical Remarks
2 The Distribution of Returns
2.1 Random Variables and Their Probability Distributions
2.2 Markov Decision Processes
2.3 The Pinball Model
2.4 The Return
2.5 The Bellman Equation
2.6 Properties of the Random Trajectory
2.7 The Random-Variable Bellman Equation
2.8 From Random Variables to Probability Distributions
2.9 Alternative Notions of the Return Distribution*
2.10 Technical Remarks
2.11 Bibliographical Remarks
2.12 Exercises
3 Learning the Return Distribution
3.1 The Monte Carlo Method
3.2 Incremental Learning
3.3 Temporal-Difference Learning
3.4 From Values to Probabilities
3.5 The Projection Step
3.6 Categorical Temporal-Difference Learning
3.7 Learning to Control
3.8 Further Considerations
3.9 Technical Remarks
3.10 Bibliographical Remarks
3.11 Exercises
4 Operators and Metrics
4.1 The Bellman Operator
4.2 Contraction Mappings
4.3 The Distributional Bellman Operator
4.4 Wasserstein Distances for Return Functions
4.5 ℓp Probability Metrics and the Cramér Distance
4.6 Sufficient Conditions for Contractivity
4.7 A Matter of Domain
4.8 Weak Convergence of Return Functions*
4.9 Random-Variable Bellman Operators*
4.10 Technical Remarks
4.11 Bibliographical Remarks
4.12 Exercises
5 Distributional Dynamic Programming
5.1 Computational Model
5.2 Representing Return-Distribution Functions
5.3 The Empirical Representation
5.4 The Normal Representation
5.5 Fixed-Size Empirical Representations
5.6 The Projection Step
5.7 Distributional Dynamic Programming
5.8 Error Due to Diffusion
5.9 Convergence of Distributional Dynamic Programming
5.10 Quality of the Distributional Approximation
5.11 Designing Distributional Dynamic Programming Algorithms
5.12 Technical Remarks
5.13 Bibliographical Remarks
5.14 Exercises
6 Incremental Algorithms
6.1 Computation and Statistical Estimation
6.2 From Operators to Incremental Algorithms
6.3 Categorical Temporal-Difference Learning
6.4 Quantile Temporal-Difference Learning
6.5 An Algorithmic Template for Theoretical Analysis
6.6 The Right Step Sizes
6.7 Overview of Convergence Analysis
6.8 Convergence of Incremental Algorithms*
6.9 Convergence of Temporal-Difference Learning*
6.10 Convergence of Categorical Temporal-Difference Learning*
6.11 Technical Remarks
6.12 Bibliographical Remarks
6.13 Exercises
7 Control
7.1 Risk-Neutral Control
7.2 Value Iteration and Q-Learning
7.3 Distributional Value Iteration
7.4 Dynamics of Distributional Optimality Operators
7.5 Dynamics in the Presence of Multiple Optimal Policies*
7.6 Risk and Risk-Sensitive Control
7.7 Challenges in Risk-Sensitive Control
7.8 Conditional Value-at-Risk*
7.9 Technical Remarks
7.10 Bibliographical Remarks
7.11 Exercises
8 Statistical Functionals
8.1 Statistical Functionals
8.2 Moments
8.3 Bellman Closedness
8.4 Statistical Functional Dynamic Programming
8.5 Relationship to Distributional Dynamic Programming
8.6 Expectile Dynamic Programming
8.7 Infinite Collections of Statistical Functionals
8.8 Moment Temporal-Difference Learning*
8.9 Technical Remarks
8.10 Bibliographical Remarks
8.11 Exercises
9 Linear Function Approximation
9.1 Function Approximation and Aliasing
9.2 Optimal Linear Value Function Approximations
9.3 A Projected Bellman Operator for Linear Value Function Approximation
9.4 Semi-Gradient Temporal-Difference Learning
9.5 Semi-Gradient Algorithms for Distributional Reinforcement Learning
9.6 An Algorithm Based on Signed Distributions*
9.7 Convergence of the Signed Algorithm*
9.8 Technical Remarks
9.9 Bibliographical Remarks
9.10 Exercises
10 Deep Reinforcement Learning
10.1 Learning with a Deep Neural Network
10.2 Distributional Reinforcement Learning with Deep Neural Networks
10.3 Implicit Parameterizations
10.4 Evaluation of Deep Reinforcement Learning Agents
10.5 How Predictions Shape State Representations
10.6 Technical Remarks
10.7 Bibliographical Remarks
10.8 Exercises
11 Two Applications and a Conclusion
11.1 Multiagent Reinforcement Learning
11.2 Computational Neuroscience
11.3 Conclusion
11.4 Bibliographical Remarks
Notation
References
Index