Markov decision process (MDP) models are widely used for modeling sequential decision-making problems that arise in engineering, economics, computer science, and the social sciences. This book brings state-of-the-art research together for the first time, providing practical modeling methods for many real-world problems whose high dimensionality or complexity has hitherto made them intractable by standard Markov decision process techniques.
Authors: Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu, Steven I. Marcus
Edition: 1st
Year: 2007
Language: English
Pages: 202
Contents......Page 11
Selected Notation and Abbreviations......Page 14
1. Markov Decision Processes......Page 17
1.1 Optimality Equations......Page 19
1.2 Policy Iteration and Value Iteration......Page 21
1.3 Rolling-horizon Control......Page 23
1.4 Survey of Previous Work on Computational Methods......Page 24
1.5 Simulation......Page 27
1.6 Preview of Coming Attractions......Page 30
1.7 Notes......Page 31
2. Multi-stage Adaptive Sampling Algorithms......Page 33
2.1 Upper Confidence Bound Sampling......Page 35
2.1.1 Regret Analysis in Multi-armed Bandits......Page 35
2.1.2 Algorithm Description......Page 36
2.1.3 Alternative Estimators......Page 37
2.1.4 Convergence Analysis......Page 40
2.1.5 Numerical Example......Page 47
2.2 Pursuit Learning Automata Sampling......Page 56
2.2.1 Algorithm Description......Page 57
2.2.2 Convergence Analysis......Page 58
2.2.3 Application to POMDPs......Page 67
2.2.4 Numerical Example......Page 69
2.3 Notes......Page 75
3. Population-based Evolutionary Approaches......Page 76
3.1 Evolutionary Policy Iteration......Page 78
3.1.1 Policy Switching......Page 78
3.1.3 Stopping Rule......Page 80
3.1.4 Convergence Analysis......Page 81
3.2 Evolutionary Random Policy Search......Page 82
3.2.1 Policy Improvement with Reward Swapping......Page 83
3.2.2 Exploration......Page 86
3.2.3 Convergence Analysis......Page 88
3.3 Numerical Examples......Page 91
3.3.1 A One-dimensional Queueing Example......Page 91
3.3.2 A Two-dimensional Queueing Example......Page 100
3.5 Notes......Page 102
4. Model Reference Adaptive Search......Page 104
4.1 The Model Reference Adaptive Search Method......Page 106
4.1.1 The MRAS₀ Algorithm (Idealized Version)......Page 108
4.1.2 The MRAS₁ Algorithm (Adaptive Monte Carlo Version)......Page 111
4.1.3 The MRAS₂ Algorithm (Stochastic Optimization)......Page 113
4.2 Convergence Analysis of MRAS......Page 116
4.2.1 MRAS₀ Convergence......Page 116
4.2.2 MRAS₁ Convergence......Page 122
4.2.3 MRAS₂ Convergence......Page 131
4.3 Application to MDPs via Direct Policy Learning......Page 144
4.3.1 Finite-horizon MDPs......Page 144
4.3.2 Infinite-horizon MDPs......Page 145
4.3.4 Numerical Examples......Page 147
4.4.1 Algorithm Description......Page 156
4.4.2 Numerical Examples......Page 158
4.5 Application to Finite-horizon MDPs Using Adaptive Sampling......Page 161
4.6 Notes......Page 163
5. On-line Control Methods via Simulation......Page 164
5.1 Simulated Annealing Multiplicative Weights Algorithm......Page 168
5.1.1 Basic Algorithm Description......Page 169
5.1.2 Convergence Analysis......Page 170
5.1.3 Convergence of the Sampling Version of the Algorithm......Page 173
5.1.4 Numerical Example......Page 175
5.1.5 Simulated Policy Switching......Page 179
5.2 Rollout......Page 180
5.2.1 Parallel Rollout......Page 181
5.3 Hindsight Optimization......Page 183
5.3.1 Numerical Example......Page 184
5.4 Notes......Page 189
References......Page 191
Index......Page 200