From Shortest Paths to Reinforcement Learning: A MATLAB-Based Tutorial on Dynamic Programming

Dynamic programming (DP) has a long history as a powerful and flexible optimization principle, but it also has a reputation as a computationally impractical tool. This book fills a gap between the statement of DP principles and their actual software implementation. Using MATLAB throughout, this tutorial gently acquaints the reader with DP and its potential applications, offering the possibility of actual experimentation and hands-on experience. The book assumes basic familiarity with probability and optimization, and is suitable for both practitioners and graduate students in engineering, applied mathematics, management, finance, and economics.
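
To give a flavor of the hands-on approach, here is a minimal MATLAB sketch of a backward DP recursion for a small deterministic shortest-path problem, in the spirit of Sect. 1.4. The network, arc costs, and variable names below are illustrative assumptions and are not taken from the book's own code.

% Backward DP on a small directed acyclic network with terminal node 5.
% c(i,j) is the cost of arc (i,j); Inf means the arc does not exist.
c = inf(5);
c(1,2) = 2; c(1,3) = 4;
c(2,3) = 1; c(2,4) = 7;
c(3,4) = 3; c(3,5) = 6;
c(4,5) = 2;

V = inf(5,1);          % V(i) = cost-to-go from node i to node 5
V(5) = 0;              % boundary condition at the terminal node
nextNode = zeros(5,1); % successor chosen by the optimal policy
for i = 4:-1:1         % backward sweep (nodes are topologically ordered)
    [V(i), nextNode(i)] = min(c(i,:)' + V);
end
fprintf('Shortest path cost from node 1: %g\n', V(1));  % prints 8 for this data

Because the sweep runs backward from the terminal node, each V(i) is computed from successor values that are already final; this is the kind of decomposition that Chapter 1 introduces through the shortest path problem.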

Author(s): Paolo Brandimarte
Series: EURO Advanced Tutorials on Operational Research
Publisher: Springer
Year: 2021

Language: English
Pages: 207
City: Cham

Preface
Contents
1 The Dynamic Programming Principle
1.1 What Is Dynamic Programming?
1.2 Dynamic Decision Problems
1.2.1 Finite Horizon, Discounted Problems
1.2.2 Infinite Horizon, Discounted Problems
1.2.3 Infinite Horizon, Average Contribution Per Stage Problems
1.2.4 Problems with an Undefined Horizon
1.2.5 Decision Policies
1.3 An Example: Dynamic Single-Item Lot-Sizing
1.4 A Glimpse of the DP Principle: The Shortest Path Problem
1.4.1 Forward vs. Backward DP
1.4.2 Shortest Paths on Structured Networks
1.4.3 Stochastic Shortest Paths
1.5 The DP Decomposition Principle
1.5.1 Stochastic DP for Finite Time Horizons
1.5.2 Stochastic DP for Infinite Time Horizons
1.6 For Further Reading
References
2 Implementing Dynamic Programming
2.1 Discrete Resource Allocation: The Knapsack Problem
2.2 Continuous Budget Allocation
2.2.1 Interlude: Function Interpolation by Cubic Splines in MATLAB
2.2.2 Solving the Continuous Budget Allocation Problem by Numerical DP
2.3 Stochastic Inventory Control
2.4 Exploiting Structure
2.4.1 Using Shortest Paths to Solve the Deterministic Lot-Sizing Problem
2.4.2 Stochastic Lot-Sizing: S and (s,S) Policies
2.4.3 Structural Properties of Value Functions
2.5 The Curses of Dynamic Programming
2.5.1 The Curse of State Dimensionality
2.5.2 The Curse of Optimization
2.5.3 The Curse of Expectation
2.5.4 The Curse of Modeling
2.6 For Further Reading
References
3 Modeling for Dynamic Programming
3.1 Finite Markov Decision Processes
3.2 Different Shades of Stochastic DP
3.2.1 Post-decision State Variables
3.3 Variations on Inventory Management
3.3.1 Deterministic Lead Time
3.3.2 Perishable Items
3.4 Revenue Management
3.4.1 Static Model with Perfect Demand Segmentation
3.4.2 Dynamic Model with Perfect Demand Segmentation
3.4.3 Dynamic Model with Customer Choice
3.5 Pricing Financial Options with Early Exercise Features
3.5.1 Bias Issues in Dynamic Programming
3.6 Consumption–Saving with Uncertain Labor Income
3.7 For Further Reading
References
4 Numerical Dynamic Programming for Discrete States
4.1 Discrete-Time Markov Chains
4.2 Markov Decision Processes with a Finite Time Horizon
4.2.1 A Numerical Example: Random Walks and Optimal Stopping
4.3 Markov Decision Processes with an Infinite Time Horizon
4.4 Value Iteration
4.4.1 A Numerical Example of Value Iteration
Speeding Up Value Iteration
4.5 Policy Iteration
4.5.1 A Numerical Example of Policy Iteration
4.6 Value vs. Policy Iteration
4.7 Average Contribution Per Stage
4.7.1 Relative Value Iteration for Problems Involving Average Contributions Per Stage
4.7.2 Policy Iteration for Problems Involving Average Contributions Per Stage
4.7.3 An Example of Application to Preventive Maintenance
4.8 For Further Reading
References
5 Approximate Dynamic Programming and Reinforcement Learning for Discrete States
5.1 Sampling and Estimation in Non-stationary Settings
5.1.1 The Exploration vs. Exploitation Tradeoff
5.1.2 Non-stationarity and Exponential Smoothing
5.2 Learning by Temporal Differences and SARSA
5.3 Q-Learning for Finite MDPs
5.3.1 A Numerical Example
5.4 For Further Reading
References
6 Numerical Dynamic Programming for Continuous States
6.1 Solving Finite Horizon Problems by Standard Numerical Methods
6.2 A Numerical Approach to Consumption–Saving
6.2.1 Approximating the Optimal Policy by Numerical DP
6.2.2 Optimizing a Fixed Policy
6.2.3 Computational Experiments
6.3 Computational Refinements and Extensions
6.4 For Further Reading
References
7 Approximate Dynamic Programming and Reinforcement Learning for Continuous States
7.1 Option Pricing by ADP and Linear Regression
7.2 A Basic Framework for ADP
7.3 Least-Squares Policy Iteration
7.4 For Further Reading
References
Index