Reinforcement Learning and Optimal Control - Draft Version


Draft copy of Reinforcement Learning and Optimal Control by Dimitri Bertsekas

Author(s): Dimitri Bertsekas
Edition: 1
Publisher: Athena Scientific
Year: 2019

Language: English
Pages: 268
Tags: Dimitri Bertsekas, Reinforcement Learning, Optimal Control

Frontmatter......Page 1
Contents......Page 5
Preface......Page 9
Chapter 1......Page 13
1.1.1 Deterministic Problems......Page 15
1.1.2 The Dynamic Programming Algorithm......Page 20
1.1.3 Approximation in Value Space......Page 25
1.2 STOCHASTIC DYNAMIC PROGRAMMING......Page 27
1.3 EXAMPLES, VARIATIONS, AND SIMPLIFICATIONS......Page 30
1.3.1 Deterministic Shortest Path Problems......Page 32
1.3.2 Discrete Deterministic Optimization......Page 34
1.3.3 Problems with a Terminal State......Page 37
1.3.4 Forecasts......Page 39
1.3.5 Problems with Uncontrollable State Components......Page 41
1.3.6 Partial State Information and Belief States......Page 46
1.3.7 Linear Quadratic Optimal Control......Page 50
1.4 REINFORCEMENT LEARNING AND OPTIMAL CONTROL - SOME TERMINOLOGY......Page 53
1.5 NOTES AND SOURCES......Page 55
Chapter 2......Page 58
2.1 GENERAL ISSUES OF APPROXIMATION IN VALUE SPACE......Page 63
2.1.1 Methods for Computing Approximations in Value Space......Page 64
2.1.2 Off-Line and On-Line Methods......Page 65
2.1.3 Model-Based Simplification of the Lookahead Minimization......Page 66
2.1.4 Model-Free Q-Factor Approximation in Value Space......Page 67
2.1.5 Approximation in Policy Space on Top of Approximation in Value Space......Page 70
2.1.6 When is Approximation in Value Space Effective?......Page 71
2.2 MULTISTEP LOOKAHEAD......Page 72
2.2.1 Multistep Lookahead and Rolling Horizon......Page 74
2.2.2 Multistep Lookahead and Deterministic Problems......Page 75
2.3.1 Enforced Decomposition......Page 77
2.3.2 Probabilistic Approximation - Certainty Equivalent Control......Page 84
2.4 ROLLOUT......Page 90
2.4.1 On-Line Rollout for Deterministic Finite-State Problems......Page 91
2.4.2 Stochastic Rollout and Monte Carlo Tree Search......Page 101
2.5 ON-LINE ROLLOUT FOR DETERMINISTIC INFINITE-SPACES PROBLEMS - OPTIMIZATION HEURISTICS......Page 111
2.5.1 Model Predictive Control......Page 112
2.5.2 Target Tubes and the Constrained Controllability Condition......Page 119
2.5.3 Variants of Model Predictive Control......Page 123
2.6 NOTES AND SOURCES......Page 125
Chapter 3......Page 128
3.1.1 Linear and Nonlinear Feature-Based Architectures......Page 130
3.1.2 Training of Linear and Nonlinear Architectures......Page 137
3.1.3 Incremental Gradient and Newton Methods......Page 138
3.2 NEURAL NETWORKS......Page 151
3.2.1 Training of Neural Networks......Page 155
3.2.2 Multilayer and Deep Neural Networks......Page 158
3.3 SEQUENTIAL DYNAMIC PROGRAMMING APPROXIMATION......Page 162
3.4 Q-FACTOR PARAMETRIC APPROXIMATION......Page 164
3.5 NOTES AND SOURCES......Page 167
Chapter 4......Page 168
4.1 AN OVERVIEW OF INFINITE HORIZON PROBLEMS......Page 171
4.2 STOCHASTIC SHORTEST PATH PROBLEMS......Page 174
4.3 DISCOUNTED PROBLEMS......Page 184
4.4 EXACT AND APPROXIMATE VALUE ITERATION......Page 189
4.5 POLICY ITERATION......Page 193
4.5.1 Exact Policy Iteration......Page 194
4.5.2 Optimistic and Multistep Lookahead Policy Iteration......Page 198
4.5.3 Policy Iteration for Q-factors......Page 200
4.6 APPROXIMATION IN VALUE SPACE - PERFORMANCE BOUNDS......Page 202
4.6.1 Limited Lookahead Performance Bounds......Page 204
4.6.2 Rollout......Page 207
4.6.3 Approximate Policy Iteration......Page 211
4.7.1 Self-Learning and Actor-Critic Systems......Page 214
4.7.2 A Model-Based Variant......Page 215
4.7.3 A Model-Free Variant......Page 218
4.7.4 Implementation Issues of Parametric Policy Iteration......Page 220
4.8 Q-LEARNING......Page 223
4.9 ADDITIONAL METHODS - TEMPORAL DIFFERENCES......Page 226
4.10 EXACT AND APPROXIMATE LINEAR PROGRAMMING......Page 237
4.11 APPROXIMATION IN POLICY SPACE......Page 239
4.11.1 Training by Cost Optimization - Policy Gradient and Random Search Methods......Page 241
4.11.2 Expert Supervised Training......Page 247
4.12 NOTES AND SOURCES......Page 249
4.13 APPENDIX: MATHEMATICAL ANALYSIS......Page 252
4.13.1 Proofs for Stochastic Shortest Path Problems......Page 253
4.13.2 Proofs for Discounted Problems......Page 258
4.13.3 Convergence of Exact and Optimistic Policy Iteration......Page 259
4.13.4 Performance Bounds for One-Step Lookahead, Rollout, and Approximate Policy Iteration......Page 261