Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.
Author(s): Csaba Szepesvári
Series: Synthesis Lectures on Artificial Intelligence and Machine Learning
Publisher: Morgan & Claypool
Year: 2010
Language: English
Pages: 103
Tags: Computer science and computer engineering; Artificial intelligence
Preface......Page 9
Acknowledgments......Page 13
Markov Decision Processes......Page 15
Value functions......Page 20
Dynamic programming algorithms for solving MDPs......Page 24
Tabular TD(0)......Page 25
Every-visit Monte-Carlo......Page 28
TD(λ): Unifying Monte-Carlo and TD(0)......Page 30
Algorithms for large state spaces......Page 32
TD(λ) with function approximation......Page 36
Gradient temporal difference learning......Page 39
Least-squares methods......Page 41
The choice of the function space......Page 47
A catalog of learning problems......Page 51
Online learning in bandits......Page 52
Active learning in bandits......Page 54
Active learning in Markov Decision Processes......Page 55
Online learning in Markov Decision Processes......Page 56
Q-learning in finite MDPs......Page 61
Q-learning with function approximation......Page 63
Actor-critic methods......Page 66
Implementing a critic......Page 68
Implementing an actor......Page 70
Applications......Page 77
Software......Page 78
Contractions and Banach's fixed-point theorem......Page 79
Application to MDPs......Page 83
Bibliography......Page 87
Author's Biography......Page 103