Adaptive controllers and optimal controllers are two distinct approaches to designing automatic control systems. Adaptive controllers learn online, in real time, how to control a system but do not generally yield optimal performance, whereas optimal controllers must be designed offline using full knowledge of the system dynamics. This book shows how approximate dynamic programming - a reinforcement learning technique motivated by learning mechanisms in biological systems - can be used to design a family of adaptive optimal control algorithms that converge in real time to optimal control solutions by measuring data along the system trajectories.
The book also describes how to use approximate dynamic programming methods to solve multi-player differential games online. Differential games have been shown to be important in H-infinity robust control for disturbance rejection, and in coordinating activities among multiple agents in networked teams. The focus of this book is on continuous-time systems, whose dynamical models can be derived directly from physical principles based on Hamiltonian or Lagrangian dynamics. Simulation examples are given throughout the book, and several of the methods described do not require full knowledge of the system dynamics.
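The adaptive optimal control algorithms in the book converge, from measured trajectory data, to the same solution that classical offline policy iteration computes with a known model. As a point of reference, here is a minimal sketch of that offline principle for the continuous-time LQR: Kleinman's policy iteration, which alternates policy evaluation (a Lyapunov equation) with policy improvement. The system matrices below are illustrative, not taken from the book.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative stable plant dx/dt = Ax + Bu and quadratic cost weights Q, R.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])   # Hurwitz, so K = 0 is a stabilizing initial policy
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))            # initial stabilizing feedback gain
for _ in range(20):
    Ac = A - B @ K
    # Policy evaluation: solve Ac^T P + P Ac = -(Q + K^T R K) for the value matrix P.
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement: K = R^{-1} B^T P.
    K = np.linalg.solve(R, B.T @ P)

# P now satisfies the algebraic Riccati equation A^T P + P A - P B R^{-1} B^T P + Q = 0.
print(np.round(P, 4))
```

The online integral reinforcement learning (IRL) methods of Chapters 3-6 perform these same two steps using reinforcement signals measured along the trajectory instead of the model matrices A and B.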
Author(s): Draguna Vrabie, Kyriakos G. Vamvoudakis, Frank L. Lewis
Series: IET Control Engineering Series 81
Publisher: The Institution of Engineering and Technology
Year: 2013
Language: English
Pages: xvi+288
Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles......Page 4
Contents......Page 8
Preface......Page 13
Acknowledgements......Page 16
1 Introduction to optimal control, adaptive control and reinforcement learning......Page 18
1.1.1 Linear quadratic regulator......Page 19
1.1.2 Linear quadratic zero-sum games......Page 20
1.2 Adaptive control......Page 21
1.3 Reinforcement learning......Page 24
1.4 Optimal adaptive control......Page 25
2 Reinforcement learning and optimal control of discrete-time systems: Using natural decision methods to design optimal adaptive controllers......Page 26
2.1 Markov decision processes......Page 28
2.1.1 Optimal sequential decision problems......Page 29
2.1.2 A backward recursion for the value......Page 31
2.1.4 Bellman equation and Bellman optimality equation......Page 32
2.2 Policy evaluation and policy improvement......Page 36
2.2.2 Iterative policy iteration......Page 38
2.2.3 Value iteration......Page 39
2.2.4 Generalized policy iteration......Page 42
2.2.5 Q function......Page 43
2.3 Methods for implementing policy iteration and value iteration......Page 46
2.4 Temporal difference learning......Page 47
2.5 Optimal adaptive control for discrete-time systems......Page 49
2.5.1 Policy iteration and value iteration for discrete-time dynamical systems......Page 51
2.5.2 Value function approximation......Page 52
2.5.3 Optimal adaptive control algorithms for discrete-time systems......Page 53
2.5.4 Introduction of a second ‘Actor’ neural network......Page 55
2.5.5 Online solution of Lyapunov and Riccati equations......Page 59
2.5.7 Q learning for optimal adaptive control......Page 60
2.6 Reinforcement learning for continuous-time systems......Page 63
Part I: Optimal adaptive control using reinforcement learning structures......Page 66
3 Optimal adaptive control using integral reinforcement learning for linear systems......Page 68
3.1 Continuous-time adaptive critic solution for the linear quadratic regulator......Page 70
3.1.1 Policy iteration algorithm using integral reinforcement......Page 71
3.1.2 Proof of convergence......Page 72
3.2.1 Adaptive online implementation of IRL algorithm......Page 75
3.2.2 Structure of the adaptive IRL algorithm......Page 78
3.3 Online IRL load-frequency controller design for a power system......Page 81
3.4 Conclusion......Page 86
4 Integral reinforcement learning (IRL) for non-linear continuous-time systems......Page 88
4.1 Non-linear continuous-time optimal control......Page 89
4.2 Integral reinforcement learning policy iterations......Page 91
4.2.1 Integral reinforcement learning policy iteration algorithm......Page 93
4.2.2 Convergence of IRL policy iteration......Page 95
4.3.1 Value function approximation and temporal difference error......Page 96
4.3.2 Convergence of approximate value function to solution of the Bellman equation......Page 98
4.4.1 Actor–critic structure for online implementation of adaptive optimal control algorithm......Page 102
4.4.2 Relation of adaptive IRL control structure to learning mechanisms in the mammal brain......Page 105
4.5.1 Non-linear system example 1......Page 106
4.5.2 Non-linear system example 2......Page 107
4.6 Conclusion......Page 109
5 Generalized policy iteration for continuous-time systems......Page 110
5.1.1 Policy iteration for continuous-time systems......Page 111
5.1.2 Integral reinforcement learning for continuous-time systems......Page 112
5.2 Generalized policy iteration for continuous-time systems......Page 113
5.2.1 Preliminaries: Mathematical operators for policy iteration......Page 114
5.2.2 Contraction maps for policy iteration......Page 115
5.2.3 A new formulation of continuous-time policy iteration: Generalized policy iteration......Page 117
5.2.4 Continuous-time generalized policy iteration......Page 118
5.3 Implementation of generalized policy iteration algorithm......Page 120
5.4.1 Example 1: Linear system......Page 121
5.4.2 Example 2: Non-linear system......Page 122
5.5 Conclusion......Page 124
6 Value iteration for continuous-time systems......Page 126
6.1 Continuous-time heuristic dynamic programming for the LQR problem......Page 127
6.1.1 Continuous-time HDP formulation using integral reinforcement learning......Page 128
6.1.2 Online tuning value iteration algorithm for partially unknown systems......Page 130
6.2 Mathematical formulation of the HDP algorithm......Page 131
6.3.1 System model and motivation......Page 134
6.3.2 Simulation setup and results......Page 135
6.3.3 Comments on the convergence of CT-HDP algorithm......Page 137
6.4 Conclusion......Page 139
Part II: Adaptive control structures based on reinforcement learning......Page 140
7 Optimal adaptive control using synchronous online learning......Page 142
7.1 Optimal control and policy iteration......Page 144
7.2 Value function approximation and critic neural network......Page 146
7.3 Tuning and convergence of critic NN......Page 149
7.4 Action neural network and online synchronous policy iteration......Page 153
7.5 Structure of adaptive controllers and synchronous optimal adaptive control......Page 155
7.6.1 Linear system example......Page 159
7.6.2 Non-linear system example......Page 160
7.7 Conclusion......Page 164
8 Synchronous online learning with integral reinforcement......Page 166
8.1 Optimal control and policy iteration using integral reinforcement learning......Page 167
8.2 Critic neural network and Bellman equation solution......Page 170
8.3 Action neural network and adaptive tuning laws......Page 173
8.4 Simulations......Page 175
8.4.1 Linear system......Page 176
8.4.2 Non-linear system......Page 178
8.5 Conclusion......Page 181
Part III: Online differential games using reinforcement learning......Page 182
9 Synchronous online learning for zero-sum two-player games and H-infinity control......Page 184
9.1 Two-player differential game and H∞ control......Page 185
9.1.1 Two-player zero-sum differential games and Nash equilibrium......Page 186
9.1.3 Application of zero-sum games to H∞ control......Page 189
9.1.3 Linear quadratic zero-sum games......Page 190
9.2 Policy iteration solution of the HJI equation......Page 191
9.3 Actor–critic approximator structure for online policy iteration algorithm......Page 193
9.3.1 Value function approximation and critic neural network......Page 194
9.3.2 Tuning and convergence of the critic neural network......Page 196
9.3.3 Action and disturbance neural networks......Page 199
9.4 Online solution of two-player zero-sum games using neural networks......Page 200
9.5.1 Online solution of generalized ARE for linear quadratic ZS games......Page 204
9.5.2 Online solution of HJI equation for non-linear ZS game......Page 206
9.6 Conclusion......Page 211
10 Synchronous online learning for multiplayer non–zero-sum games......Page 212
10.1.1 Background on non–zero-sum games......Page 213
10.2 Policy iteration solution for non–zero-sum games......Page 216
10.3.1 Value function approximation and critic neural networks for solution of Bellman equations......Page 217
10.3.2 Action neural networks and online learning algorithm......Page 221
10.4.1 Non-linear system......Page 228
10.4.2 Linear system......Page 231
10.4.3 Zero-sum game with unstable linear system......Page 232
10.5 Conclusion......Page 235
11 Integral reinforcement learning for zero-sum two-player games......Page 238
11.1.1 Background......Page 240
11.1.2 Offline algorithm to solve the game algebraic Riccati equation......Page 241
11.1.3 Continuous-time HDP algorithm to solve Riccati equation......Page 244
11.2 Online algorithm to solve the zero-sum differential game......Page 246
11.3 Online load–frequency controller design for a power system......Page 249
11.4 Conclusion......Page 252
Proofs for selected results from Chapter 4......Page 254
Proofs for Chapter 7......Page 256
Proof for Chapter 8......Page 263
Proofs for Chapter 9......Page 272
Proofs for Chapter 10......Page 279
References......Page 290
Index......Page 298