The Elements of Joint Learning and Optimization in Operations Management

This book examines recent developments in Operations Management and focuses on four major application areas: dynamic pricing, assortment optimization, supply chain and inventory management, and healthcare operations. Data-driven optimization, in which real-time data is used to simultaneously learn the (true) underlying model of a system and optimize its performance, has become increasingly important in recent years, especially with the rise of Big Data.
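
To make the learn-while-optimizing paradigm concrete, the sketch below implements the Upper Confidence Bound (UCB) algorithm that opens the book (Chap. 1, Sect. 1.2.1): the decision maker repeatedly picks the arm with the highest optimistic reward estimate, so estimation and optimization happen on the same data stream. This is a minimal illustration, not code from the book; the Bernoulli reward simulator and all parameter values are assumptions chosen for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch: play each arm once, then pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_pulls)."""
    rng = random.Random(seed)
    n = len(arm_means)
    pulls = [0] * n            # times each arm was played
    totals = [0.0] * n         # cumulative reward per arm
    reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1        # initialization: play each arm once
        else:
            arm = max(range(n), key=lambda i: totals[i] / pulls[i]
                      + math.sqrt(2 * math.log(t) / pulls[i]))
        # Assumed Bernoulli reward model for the simulation only
        r = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += r
        reward += r
    return reward, pulls

if __name__ == "__main__":
    total, pulls = ucb1([0.3, 0.5, 0.7], horizon=10_000)
    print("total reward:", total, "pulls per arm:", pulls)
    # Regret relative to always playing the best arm (mean 0.7):
    print("regret:", 0.7 * 10_000 - total)
```

With these assumed means, the confidence term forces occasional exploration of the weaker arms while most pulls concentrate on the best one; this exploration-exploitation trade-off is the common thread running through the pricing, assortment, inventory, and healthcare chapters listed below.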

Author(s): Xi Chen, Stefanus Jasin, Cong Shi
Series: Springer Series in Supply Chain Management, 18
Publisher: Springer
Year: 2022

Language: English
Pages: 443
City: Cham

Preface
Contents
Editors and Contributors
About the Editors
Contributors
Part I Generic Tools
1 The Stochastic Multi-Armed Bandit Problem
1.1 Introduction
1.2 The N-Armed Bandit Problem
1.2.1 Upper Confidence Bound (UCB) Algorithm
1.2.2 Thompson Sampling (TS)
1.3 Contextual Bandits
1.4 Combinatorial Bandits
References
2 Reinforcement Learning
2.1 Introduction
2.2 Markov Decision Process and Dynamic Programming
2.2.1 Finite-Horizon Markov Decision Process
2.2.1.1 Dynamic Programming Solution
2.2.2 Discounted Markov Decision Process
2.2.2.1 Value Iteration
2.2.2.2 Policy Iteration
2.3 Reinforcement Learning Algorithm Design
2.3.1 Reinforcement Learning Problem Formulation
2.3.1.1 Episodic Reinforcement Learning in Finite-Horizon MDP
2.3.1.2 Reinforcement Learning in Discounted MDP
2.3.2 Model-Based vs. Model-Free Reinforcement Learning
2.3.2.1 Model-Based Reinforcement Learning
2.3.2.2 Q-Learning and SARSA
2.3.2.3 Policy Gradient
2.3.3 Exploration in Reinforcement Learning
2.3.3.1 Exploration Schemes
2.3.3.2 Deep Exploration
2.3.4 Approximate Solution Methods and Deep Reinforcement Learning
2.4 Conclusion and Further Reading
References
3 Optimal Learning and Optimal Design
3.1 Introduction
3.2 Statistical Design of Experiments
3.3 The Ranking and Selection Problem
3.3.1 Model
3.3.2 Large Deviations Analysis
3.3.3 Example: Normal Sampling Distributions
3.3.4 Optimal Allocations
3.4 Sequential Algorithms
3.4.1 Value of Information Methods
3.4.2 Thompson Sampling
3.4.3 Rate-Balancing Methods
3.4.4 Discussion
3.5 Recent Advances
3.5.1 A New Optimal Design for Linear Regression
3.5.2 Optimal Budget Allocation in Approximate Dynamic Programming
3.6 Conclusion
References
Part II Price Optimization
4 Dynamic Pricing with Demand Learning: Emerging Topics and State of the Art
4.1 Introduction
4.2 Model
4.3 Asymptotically Optimal Pricing Policies
4.3.1 Parametric Approaches
4.3.1.1 Model and Estimation
4.3.1.2 Certainty-Equivalence Pricing and Incomplete Learning
4.3.1.3 Asymptotically Optimal Policies
4.3.1.4 Extensions to Generalized Linear Models
4.3.1.5 Extensions to Multiple Products
4.3.2 Nonparametric Approaches
4.3.3 Extensions and Generalizations
4.4 Emerging Topics and Generalizations
4.4.1 Product Differentiation
4.4.2 Online Marketplaces
4.4.3 Continuous-Time Approximations
References
5 Learning and Pricing with Inventory Constraints
5.1 Introduction
5.2 Single Product Case
5.2.1 Dynamic Pricing Algorithm
5.2.2 Lower Bound Example
5.3 Multiproduct Setting
5.3.1 Preliminaries
5.3.2 Parametric Case
5.3.3 Nonparametric Case
5.4 Bayesian Learning Setting
5.4.1 Model Setting
5.4.2 Thompson Sampling with Fixed Inventory Constraints
5.4.3 Thompson Sampling with Inventory Constraint Updating
5.4.4 Performance Analysis
5.5 Remarks and Further Reading
References
6 Dynamic Pricing and Demand Learning in Nonstationary Environments
6.1 Introduction
6.2 Problem Formulation
6.3 Exogenously Changing Demand Environments
6.3.1 Change-Point Detection Models
6.3.2 Finite-State-Space Markov Chains
6.3.3 Autoregressive Models
6.3.4 General Changing Environments
6.3.5 Contextual Pricing
6.4 Endogenously Changing Demand Environments
6.4.1 Reference-Price Effects
6.4.2 Competition and Collusion
6.4.3 Platforms and Multi-Agent Learning
6.4.4 Forward-Looking and Patient Customers
References
7 Pricing with High-Dimensional Data
7.1 Introduction
7.2 Background: High-Dimensional Statistics
7.3 Static Pricing with High-Dimensional Data
7.3.1 Feature-Dependent Choice Model
7.3.2 Estimation Method
7.3.3 Performance Guarantees
7.4 Dynamic Pricing with High-Dimensional Data
7.4.1 Feature-Dependent Demand Model
7.4.2 Learning-and-Earning Algorithm
7.4.3 A Universal Lower Bound on the Regret
7.4.4 Performance of ILQX
7.4.5 Discussion
7.5 Directions for Future Research
References
Part III Assortment Optimization
8 Nonparametric Estimation of Choice Models
8.1 Introduction
8.2 General Setup
8.3 Estimating the Rank-Based Model
8.3.1 Estimation via the Conditional Gradient Algorithm
8.3.1.1 Solving the Support Finding Step
8.3.1.2 Solving the Proportions Update Step
8.3.1.3 Initialization and Stopping Criterion
8.3.2 Convergence Guarantee for the Estimation Algorithm
8.4 Estimating the Nonparametric Mixture of Closed Logit (NPMXCL) Model
8.4.1 Estimation via the Conditional Gradient Algorithm
8.4.1.1 Solving the Support Finding Step
8.4.1.2 Solving the Proportions Update Step
8.4.1.3 Initialization and Stopping Criterion
8.4.2 Convergence Guarantee for the Estimation Algorithm
8.4.3 Characterizing the Choice Behavior of Closed Logit Types
8.5 Other Nonparametric Choice Models
8.6 Concluding Thoughts
References
9 The MNL-Bandit Problem
9.1 Introduction
9.2 Choice Modeling and Assortment Optimization
9.3 Dynamic Learning in Assortment Selection
9.4 A UCB Approach for the MNL-Bandit
9.4.1 Algorithmic Details
9.4.2 Min–Max Regret Bounds
9.4.3 Improved Regret Bounds for "Well Separated" Instances
9.4.4 Computational Study
9.4.4.1 Robustness of Algorithm 1
9.4.4.2 Comparison with Existing Approaches
9.5 Thompson Sampling for the MNL-Bandit
9.5.1 Algorithm
9.5.2 A TS Algorithm with Independent Beta Priors
9.5.3 A TS Algorithm with Posterior Approximation and Correlated Sampling
9.5.4 Regret Analysis
9.5.5 Empirical Study
9.6 Lower Bound for the MNL-Bandit
9.7 Conclusions and Recent Progress
References
10 Dynamic Assortment Optimization: Beyond MNL Model
10.1 Overview
10.2 General Utility Distributions
10.2.1 Model Formulation and Assumptions
10.2.2 Algorithm Design
10.2.3 Theoretical Analysis
10.2.4 Bibliographic Notes and Discussion of Future Directions
10.3 Nested Logit Models
10.3.1 Model Formulation and Assumptions
10.3.2 Assortment Space Reductions
10.3.3 Algorithm Design and Regret Analysis
10.3.4 Regret Lower Bound
10.3.5 Bibliographic Notes and Discussion of Future Directions
10.4 MNL Model with Contextual Features
10.4.1 Model Formulation and Assumptions
10.4.2 Algorithm Design: Thompson Sampling
10.4.3 Algorithm Design: Upper Confidence Bounds
10.4.4 Lower Bounds
10.4.5 Bibliographic Notes and Discussion of Future Directions
10.5 Conclusion
References
Part IV Inventory Optimization
11 Inventory Control with Censored Demand
11.1 Introduction
11.2 Regret Lower Bound for Inventory Models with Censored Demand
11.2.1 Model Formulation
11.2.2 Strictly Convex and Well-Separated Cases
11.2.3 Worst-Case Regret Under General Demand Distributions
11.3 Censored Demand Example: Perishable Inventory System
11.3.1 Model Formulation
11.3.2 Challenges and Preliminary Results
11.3.3 Learning Algorithm Design: Cycle-Update Policy
11.3.4 Regret Analysis of CUP Algorithm
11.3.5 Strongly Convex Extension
11.4 Lead Times Example: Lost-Sales System with Lead Times
11.4.1 Model Formulation
11.4.2 Base-Stock Policy and Convexity Results
11.4.3 Challenges from Lead Times
11.4.4 Gradient Methods
11.4.5 A Ternary Search Method
11.5 High Dimensionality Example: Multiproduct Inventory Model with Customer Choices
11.5.1 Inventory Substitution
11.5.2 Numerical Example
References
12 Joint Pricing and Inventory Control with Demand Learning
12.1 Problem Formulation in General
12.2 Nonparametric Learning for Backlogged Demand
12.3 Nonparametric Learning for Lost-Sales System
12.3.1 Algorithms and Results in Chen et al. (2017)
12.3.2 Algorithms and Results in Chen et al. (2020)
12.3.2.1 Concave G(·)
12.3.2.2 Non-Concave G(·)
12.4 Parametric Learning with Limited Price Changes
12.4.1 Well-Separated Demand
12.4.2 General Demand
12.5 Backlog System with Fixed Ordering Cost
12.6 Other Models
References
13 Optimization in the Small-Data, Large-Scale Regime
13.1 Why Small Data?
13.1.1 Structure
13.2 Contrasting the Large-Sample and Small-Data, Large-Scale Regimes
13.2.1 Model
13.2.2 Failure of Sample Average Approximation (SAA)
13.2.3 Best-in-Class Performance
13.2.4 Shortcomings of Cross-Validation
13.3 Debiasing In-Sample Performance
13.3.1 Stein Correction
13.3.2 From Unbiasedness to Policy Selection
13.3.3 Stein Correction in the Large-Sample Regime
13.3.4 Open Questions
13.4 Conclusion
References
Part V Healthcare Operations
14 Bandit Procedures for Designing Patient-Centric Clinical Trials
14.1 Introduction
14.2 The Bayesian Beta-Bernoulli MABP
14.2.1 Discussion of the Model
14.3 Metrics for Two-Armed Problem (Confirmatory Trials)
14.3.1 Accurate and Precise Estimation
14.3.2 Statistical Errors
14.3.3 Patient Benefit
14.3.4 Trial Size
14.3.5 Multiple Metrics
14.4 Illustrative Results for Two-Armed Problem
14.5 Discussion
14.5.1 Safety Concerns
14.5.2 Prior Distributions
14.5.3 Delayed Responses
14.5.4 Dropouts and Missing Responses
14.5.5 Early Evidence of Efficacy or Futility
14.5.6 Non-binary Outcomes
14.5.7 Exploratory Trials
14.5.8 Large Trials
References
15 Dynamic Treatment Regimes for Optimizing Healthcare
15.1 Introduction
15.2 Mathematical Framework
15.2.1 Potential Outcomes Framework
15.3 Data Sources for Constructing DTRs
15.3.1 Longitudinal Observational Data
15.3.1.1 The CIBMTR Registry: Two Study Examples for Constructing DTRs with Observational Data
15.3.2 Sequentially Randomized Studies
15.3.2.1 The SMART Weight Loss Management Study
15.3.3 Dynamical Systems Models
15.3.3.1 A Dynamical Systems Model for Behavioral Weight Change
15.4 Methods for Constructing DTRs
15.4.1 Origins and Development of DTRs
15.4.2 Reinforcement Learning: A Potential Solution
15.4.3 Taxonomy of Existing Methods
15.4.4 Finite-Horizon DTRs
15.4.4.1 Indirect Methods
15.4.4.2 Direct RL Methods
15.4.5 Indefinite-Horizon DTRs
15.5 Inference in DTRs
15.5.1 Inference for Parameters Indexing the Optimal Regime
15.5.2 Inference for the Value Function of a Regime
15.6 Practical Considerations and Final Remarks
15.6.1 Model Choice and Variable Selection
15.6.2 Sample Size Considerations and Power Analysis
15.6.3 Missing Data
15.6.4 Additional Issues and Final Remarks
References