"This text presents optimal learning techniques with applications in energy, homeland security, health, sports, transportation science, biomedical research, biosurveillance, stochastic optimization, high technology, and complex resource allocation problems. The coverage utilizes a relatively new class of algorithmic strategies known as approximate dynamic programming, which merges dynamic programming (Markov decision processes), math programming (linear, nonlinear, and integer), simulation, and statistics. It features mathematical techniques that are applicable to a variety of situations, from identifying promising drug candidates to figuring out the best evacuation plan in the event of a natural disaster." --Provided by publisher.
Contents: Front Matter -- The Challenges of Learning -- Adaptive Learning -- The Economics of Information -- Ranking and Selection -- The Knowledge Gradient -- Bandit Problems -- Elements of a Learning Problem -- Linear Belief Models -- Subset Selection Problems -- Optimizing a Scalar Function -- Optimal Bidding -- Stopping Problems -- Active Learning in Statistics -- Simulation Optimization -- Learning in Mathematical Programming -- Optimizing Over Continuous Measurements -- Learning with a Physical State -- Bibliography -- Index -- Wiley Series in Probability and Statistics
Author(s): Warren B Powell; Ilya Olegovich Ryzhov; Wiley InterScience (Online service)
Series: Wiley series in probability and statistics
Publisher: Wiley
Year: 2012
Language: English
Pages: 416
City: Hoboken, New Jersey
Tags: Computer Science and Computer Engineering; Artificial Intelligence
Optimal Learning......Page 5
CONTENTS......Page 9
Preface......Page 17
Acknowledgments......Page 21
1 The Challenges of Learning......Page 23
1.1 Learning the Best Path......Page 24
1.2 Areas of Application......Page 26
1.3 Major Problem Classes......Page 34
1.4 The Different Types of Learning......Page 35
1.5 Learning from Different Communities......Page 38
1.6.1 A Basic Decision Tree......Page 40
1.6.2 Decision Tree for Offline Learning......Page 42
1.6.3 Decision Tree for Online Learning......Page 43
1.6.4 Discussion......Page 47
1.8 Goals of This Book......Page 48
Problems......Page 49
2 Adaptive Learning......Page 53
2.1 The Frequentist View......Page 54
2.2 The Bayesian View......Page 55
2.2.1 The Updating Equations for Independent Beliefs......Page 56
2.2.2 The Expected Value of Information......Page 58
2.2.3 Updating for Correlated Normal Priors......Page 60
2.2.4 Bayesian Updating with an Uninformative Prior......Page 63
2.3 Updating for Non-Gaussian Priors......Page 64
2.3.1 The Gamma-Exponential Model......Page 65
2.3.2 The Gamma-Poisson Model......Page 66
2.3.3 The Pareto-Uniform Model......Page 67
2.3.4 Models for Learning Probabilities*......Page 68
2.3.5 Learning an Unknown Variance*......Page 71
2.4 Monte Carlo Simulation......Page 73
2.5.2 Derivation of Bayesian Updating Equations for Independent Beliefs......Page 77
Problems......Page 79
3.1 An Elementary Information Problem......Page 83
3.2 The Marginal Value of Information......Page 87
3.3 An Information Acquisition Problem......Page 90
Problems......Page 92
4 Ranking and Selection......Page 93
4.1 The Model......Page 94
4.2.1 Deterministic Versus Sequential Policies......Page 97
4.2.2 Optimal Sequential Policies......Page 98
4.2.3 Heuristic Policies......Page 99
4.3 Evaluating Policies......Page 103
4.4.1 An Alternative Representation of the Probability Space......Page 105
4.4.2 Equivalence of Using True Means and Sample Estimates......Page 106
Problems......Page 107
5 The Knowledge Gradient......Page 111
5.1 The Knowledge Gradient for Independent Beliefs......Page 112
5.1.1 Computation......Page 113
5.1.2 Some Properties of the Knowledge Gradient......Page 115
5.1.3 The Four Distributions of Learning......Page 116
5.2 The Value of Information and the S-Curve Effect......Page 117
5.3 Knowledge Gradient for Correlated Beliefs......Page 120
5.4 Anticipatory Versus Experiential Learning......Page 125
5.5.1 The Gamma-Exponential Model......Page 127
5.5.2 The Gamma-Poisson Model......Page 130
5.5.3 The Pareto-Uniform Model......Page 131
5.5.4 The Beta-Bernoulli Model......Page 133
5.5.5 Discussion......Page 135
5.6.1 Expected Improvement......Page 136
5.6.2 Linear Loss*......Page 137
5.7 The Problem of Priors......Page 140
5.9.1 Derivation of the Knowledge Gradient Formula......Page 142
Problems......Page 147
6 Bandit Problems......Page 161
6.1 The Theory and Practice of Gittins Indices......Page 163
6.1.1 Gittins Indices in the Beta-Bernoulli Model......Page 164
6.1.2 Gittins Indices in the Normal-Normal Model......Page 167
6.1.3 Approximating Gittins Indices......Page 169
6.2 Variations of Bandit Problems......Page 170
6.3 Upper Confidence Bounding......Page 171
6.4.1 The Basic Idea......Page 173
6.4.2 Some Experimental Comparisons......Page 175
6.4.3 Non-Normal Models......Page 178
Problems......Page 179
7 Elements of a Learning Problem......Page 185
7.1 The States of Our System......Page 186
7.2 Types of Decisions......Page 188
7.3 Exogenous Information......Page 189
7.5 Objective Functions......Page 190
7.5.1 Designing Versus Controlling......Page 191
7.5.3 Objectives......Page 192
7.6 Evaluating Policies......Page 197
7.7 Discussion......Page 199
Problems......Page 200
8 Linear Belief Models......Page 203
8.1.1 Maximizing Ad Clicks......Page 204
8.1.3 Housing Loans......Page 206
8.1.4 Optimizing Dose Response......Page 207
8.2.1 The Normal Equations......Page 208
8.2.2 Recursive Least Squares......Page 209
8.2.3 A Bayesian Interpretation......Page 210
8.2.4 Generating a Prior......Page 211
8.3 The Knowledge Gradient for a Linear Model......Page 213
8.4 Application to Drug Discovery......Page 214
8.5 Application to Dynamic Pricing......Page 218
Problems......Page 222
9 Subset Selection Problems......Page 225
9.1 Applications......Page 227
9.2.1 Setting Prior Means and Variances......Page 229
9.2.2 Two Strategies for Setting Prior Covariances......Page 230
9.3 Larger Sets......Page 231
9.3.1 Using Simulation to Reduce the Problem Size......Page 232
9.3.2 Computational Issues......Page 234
9.3.3 Experiments......Page 235
9.4 Very Large Sets......Page 236
Problems......Page 238
10.1 Deterministic Measurements......Page 241
10.2.1 The Model......Page 245
10.2.2 Finding the Posterior Distribution......Page 246
10.2.3 Choosing the Measurement......Page 248
Problems......Page 251
11 Optimal Bidding......Page 253
11.1.1 Some Valuation Models......Page 255
11.1.2 The Logit Model......Page 256
11.2.1 A Conjugate Prior for Choosing Between Two Demand Curves......Page 259
11.2.2 Moment Matching for Nonconjugate Problems......Page 261
11.2.3 An Approximation for the Logit Model......Page 264
11.3 Bidding Strategies......Page 266
11.3.2 Bayes-Greedy Bidding......Page 267
11.3.3 Numerical Illustrations......Page 269
11.4.1 Moment Matching for Pareto Prior......Page 273
11.4.2 Approximating the Logistic Expectation......Page 274
11.5 Bibliographic Notes......Page 275
Problems......Page 276
12.1 Sequential Probability Ratio Test......Page 277
12.2.1 Setup......Page 283
12.2.2 Solution......Page 284
Problems......Page 288
13 Active Learning in Statistics......Page 291
13.1 Deterministic Policies......Page 292
13.2.1 Uncertainty Sampling......Page 296
13.2.2 Query by Committee......Page 297
13.3 A Variance-Minimizing Policy......Page 299
13.4.1 Estimating Parameters......Page 302
13.4.2 Active Learning......Page 304
13.5 Bibliographic Notes......Page 305
14 Simulation Optimization......Page 307
14.1.1 Batch Procedures......Page 310
14.1.2 Sequential Procedures......Page 312
14.1.3 The 0-1 Procedure: Connection to Linear Loss......Page 314
14.2.1 Indifference-Zone Version......Page 315
14.2.3 When Does It Work?......Page 317
14.3 Model-Based Simulated Annealing......Page 318
14.4 Other Areas of Simulation Optimization......Page 320
14.5 Bibliographic Notes......Page 321
15 Learning in Mathematical Programming......Page 323
15.1.1 Piloting a Hot Air Balloon......Page 325
15.1.2 Optimizing a Portfolio......Page 330
15.1.3 Network Problems......Page 331
15.2 Learning on Graphs......Page 335
15.3 Alternative Edge Selection Policies......Page 339
15.4 Learning Costs for Linear Programs*......Page 340
15.5 Bibliographic Notes......Page 346
16 Optimizing Over Continuous Measurements......Page 347
16.1 The Belief Model......Page 349
16.1.1 Updating Equations......Page 350
16.1.2 Parameter Estimation......Page 352
16.2 Sequential Kriging Optimization......Page 354
16.3.1 Maximizing the Knowledge Gradient......Page 356
16.3.2 Approximating the Knowledge Gradient......Page 357
16.3.3 The Gradient of the Knowledge Gradient......Page 358
16.3.4 Maximizing the Knowledge Gradient......Page 360
16.3.5 The KGCP Policy......Page 361
16.4 Efficient Global Optimization......Page 362
16.5 Experiments......Page 363
16.6 Extension to Higher-Dimensional Problems......Page 364
16.7 Bibliographic Notes......Page 365
17 Learning with a Physical State......Page 367
17.1 Introduction to Dynamic Programming......Page 369
17.1.1 Approximate Dynamic Programming......Page 370
17.1.2 The Exploration vs. Exploitation Problem......Page 372
17.1.3 Discussion......Page 373
17.2 Some Heuristic Learning Policies......Page 374
17.3 The Local Bandit Approximation......Page 375
17.4.1 Generalized Learning Using Basis Functions......Page 377
17.4.2 The Knowledge Gradient......Page 380
17.4.3 Experiments......Page 383
17.5 An Expected Improvement Policy......Page 385
17.6 Bibliographic Notes......Page 386
Index......Page 403