Continuous-Time Markov Decision Processes

This book offers a systematic and rigorous treatment of continuous-time Markov decision processes, covering both theory and possible applications to queueing systems, epidemiology, finance, and other fields. Unlike most books on the subject, much attention is paid to problems with functional constraints and the realizability of strategies. Three major methods of investigation are presented, based on dynamic programming, linear programming, and reduction to discrete-time problems. Although the main focus is on models with total (discounted or undiscounted) cost criteria, models with average cost criteria and with impulsive controls are also discussed in depth. The book is self-contained. A separate chapter is devoted to Markov pure jump processes, and the appendices collect the requisite background on real analysis and applied probability. All the statements in the main text are proved in detail. Researchers and graduate students in applied probability, operational research, statistics and engineering will find this monograph interesting, useful and valuable.
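To give a flavour of one of the three methods mentioned above — reduction to discrete-time problems — the following is a minimal sketch, not taken from the book: a hypothetical two-state, two-action CTMDP with a discounted cost criterion is uniformized into an equivalent discrete-time MDP and solved by value iteration. All rates, costs, and the discount rate below are invented for illustration.

```python
import numpy as np

# Hypothetical CTMDP: state 0 = idle, state 1 = busy;
# actions 0 = slow service, 1 = fast service.
# rates[i, a, j] = transition rate from state i to state j under action a.
rates = np.array([
    [[0.0, 1.0], [0.0, 1.0]],   # from state 0: arrivals at rate 1 (both actions)
    [[2.0, 0.0], [4.0, 0.0]],   # from state 1: service rate 2 (slow) or 4 (fast)
])
cost = np.array([
    [0.0, 0.0],   # no cost while idle
    [1.0, 3.0],   # holding cost 1; fast service costs extra
])
alpha = 0.5       # continuous-time discount rate

n_states, n_actions = cost.shape
exit_rate = rates.sum(axis=2)        # q_i(a): total rate out of state i under a
Lam = exit_rate.max()                # uniformization constant Λ ≥ sup q_i(a)
beta = Lam / (alpha + Lam)           # discrete-time discount factor

# Uniformized transition kernel: P(j|i,a) = q(j|i,a)/Λ off-diagonal,
# with a fictitious self-loop absorbing the remaining probability mass.
P = rates / Lam
for i in range(n_states):
    for a in range(n_actions):
        P[i, a, i] += 1.0 - exit_rate[i, a] / Lam

c_hat = cost / (alpha + Lam)         # per-stage cost of the equivalent DTMDP

# Value iteration on the equivalent discrete-time MDP.
V = np.zeros(n_states)
for _ in range(2000):
    Q = c_hat + beta * (P @ V)       # Q[i, a]: state-action values
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
policy = Q.argmin(axis=1)
print("optimal value:", V, "optimal policy:", policy)
```

The fixed point of this iteration satisfies (α + q_i(a))V(i) = c(i,a) + Σ_j q(j|i,a)V(j) at the minimizing action, i.e. the continuous-time discounted optimality equation; in this toy instance the slow service action is optimal in the busy state. The book treats this reduction in far greater generality (Borel state spaces, constrained problems, unbounded rates).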

Author(s): Alexey Piunovskiy, Yi Zhang
Series: Probability Theory and Stochastic Modelling 97
Edition: 1
Publisher: Springer
Year: 2020

Language: English
Pages: 583
Tags: Markov Decision Process

Foreword
Preface
Contents
Notation
1 Description of CTMDPs and Preliminaries
1.1 Description of the CTMDP
1.1.1 Initial Data and Conventional Notations
1.1.2 Informal Description
1.1.3 Strategies, Strategic Measures, and Optimal Control Problems
1.1.4 Realizable Strategies
1.1.5 Instant Costs at the Jump Epochs
1.2 Examples
1.2.1 Queueing Systems
1.2.2 The Freelancer Dilemma
1.2.3 Epidemic Models
1.2.4 Inventory Control
1.2.5 Selling an Asset
1.2.6 Power-Managed Systems
1.2.7 Fragmentation Models
1.2.8 Infrastructure Surveillance Models
1.2.9 Preventive Maintenance
1.3 Detailed Occupation Measures and Further Sufficient Classes …
1.3.1 Definitions and Notations
1.3.2 Sufficiency of Markov π-Strategies
1.3.3 Sufficiency of Markov Standard ξ-Strategies
1.3.4 Counterexamples
1.3.5 The Discounted Cost Model as a Special Case of Undiscounted
1.4 Bibliographical Remarks
2 Selected Properties of Controlled Processes
2.1 Transition Functions and the Markov Property
2.1.1 Basic Definitions and Notations
2.1.2 Construction of the Transition Function
2.1.3 The Minimal (Nonnegative) Solution to the Kolmogorov Forward Equation
2.1.4 Markov Property of the Controlled Process Under a Natural Markov Strategy
2.2 Conditions for Non-explosiveness Under a Fixed Natural Markov Strategy
2.2.1 The Nonhomogeneous Case
2.2.2 The Homogeneous Case
2.2.3 Possible Generalizations
2.2.4 A Condition for Non-explosiveness Under All Strategies
2.2.5 Direct Proof for Non-explosiveness Under All Strategies
2.3 Examples
2.3.1 Birth-and-Death Processes
2.3.2 The Gaussian Model
2.3.3 The Fragmentation Model
2.3.4 The Infrastructure Surveillance Model
2.4 Dynkin's Formula
2.4.1 Preliminaries
2.4.2 Non-explosiveness and Dynkin's Formula
2.4.3 Dynkin's Formula Under All π-Strategies
2.4.4 Example
2.5 Bibliographical Remarks
3 The Discounted Cost Model
3.1 The Unconstrained Problem
3.1.1 The Optimality Equation
3.1.2 Dynamic Programming and Dual Linear Programs
3.2 The Constrained Problem
3.2.1 Properties of the Total Occupation Measures
3.2.2 The Primal Linear Program and Its Solvability
3.2.3 Comparison of the Convex Analytic and Dynamic Programming Approaches
3.2.4 Duality
3.2.5 The Space of Performance Vectors
3.3 Examples
3.3.1 A Queueing System
3.3.2 A Birth-and-Death Process
3.4 Bibliographical Remarks
4 Reduction to DTMDP: The Total Cost Model
4.1 Poisson-Related Strategies
4.2 Reduction to DTMDP
4.2.1 Description of the Concerned DTMDP
4.2.2 Selected Results of the Reduction to DTMDP
4.2.3 Examples
4.2.4 Models with Strongly Positive Intensities
4.2.5 Example: Preventive Maintenance
4.3 Bibliographical Remarks
5 The Average Cost Model
5.1 Unconstrained Problems
5.1.1 Unconstrained Problem: Nonnegative Cost
5.1.2 Unconstrained Problem: Weight Function
5.2 Constrained Problems
5.2.1 The Primal Linear Program and Its Solvability
5.2.2 Duality
5.2.3 The Space of Performance Vectors
5.2.4 Denumerable and Finite Models
5.3 Examples
5.3.1 The Gaussian Model
5.3.2 The Freelancer Dilemma
5.4 Bibliographical Remarks
6 The Total Cost Model: General Case
6.1 Description of the General Total Cost Model
6.1.1 Generalized Control Strategies and Their Strategic Measures
6.1.2 Subclasses of Strategies
6.2 Detailed Occupation Measures and Sufficient Classes of Strategies
6.2.1 Detailed Occupation Measures
6.2.2 Sufficient Classes of Strategies
6.2.3 Counterexamples
6.3 Reduction to DTMDP
6.4 Mixtures of Strategies and Convexity of Spaces of Strategic and Occupation Measures
6.4.1 Properties of Strategic Measures
6.4.2 Properties of Occupation Measures
6.5 Example: Utilization of an Unreliable Device
6.6 Realizable Strategies
6.7 Bibliographical Remarks
7 Gradual-Impulsive Control Models
7.1 The Total Cost Model and Reduction
7.1.1 System Primitives
7.1.2 Total Cost Gradual-Impulsive Control Problems
7.1.3 Reduction to CTMDP Model with Gradual Control
7.2 Example: An Epidemic with Carriers
7.2.1 Problem Statement
7.2.2 General Plan
7.2.3 Optimal Solution to the Associated DTMDP Problem
7.2.4 The Optimal Solution to the Original Gradual-Impulsive Control Problem
7.3 The Discounted Cost Model
7.3.1 α-Discounted Gradual-Impulsive Control Problems
7.3.2 Reduction to DTMDP with Total Cost
7.3.3 The Dynamic Programming Approach
7.3.4 Example: The Inventory Model
7.4 Bibliographical Remarks
Appendix A Miscellaneous Results
A.1 Compensators and Predictable Processes
A.2 On Non-realizability of Relaxed Strategies
A.3 Comments on the Proof of Theorem 1.1.3
A.4 A Condition for Non-explosiveness
A.5 Markov Pure Jump Processes
A.6 On the Uniformly Exponential Ergodicity of CTMDPs in Denumerable State Space
A.7 Proofs of Technical Lemmas
Appendix B Relevant Definitions and Facts
B.1 Real Analysis
B.1.1 Borel Spaces and Semianalytic Functions
B.1.2 Semicontinuous Functions
B.1.3 Spaces of Functions
B.1.4 Passage to the Limit
B.1.5 The Tauberian Theorem
B.1.6 Measures and Spaces of Measures
B.1.7 Stochastic Kernels
B.1.8 Carathéodory Functions and Measurable Selection
B.1.9 Conditional Expectations and Regular Conditional Measures
B.1.10 Other Useful Facts
B.2 Convex Analysis and Optimization
B.2.1 Convex Sets and Convex Functions
B.2.2 Linear Programs
B.2.3 Convex Programs
B.3 Stochastic Processes and Applied Probability
B.3.1 Some General Definitions and Facts
B.3.2 Renewal and Regenerative Processes
Appendix C Definitions and Facts about Discrete-Time Markov Decision Processes
C.1 Description of the DTMDP
C.2 Selected Facts and Optimality Results
C.2.1 Results for the Unconstrained Problem
C.2.2 Results for the Constrained Problem
Appendix References
Index