From the Foundations and Trends in Machine Learning series, published by NOWPress, 2008, 163 pp.
This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented, based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves the computation of low-dimensional representations and approximately optimal policies. Two approaches to dimensionality reduction of MDPs are described, based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Finally, some directions for future work are outlined.
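As a rough illustration of the diagonalization route to basis construction described above, the sketch below builds the combinatorial graph Laplacian L = D - W of a state graph (nonpositive off-diagonals, zero row sums, matching the Laplacian class named in the abstract), takes its smoothest eigenvectors as a low-dimensional basis, and projects a value-function estimate onto that basis. The toy chain MDP, the basis size k, and the target function are illustrative assumptions, not taken from the monograph.

    # Minimal sketch of Laplacian basis construction by diagonalization
    # (in the spirit of the representation-learning step of RPI).
    # The chain MDP, k, and the target function are assumptions made
    # for illustration only.
    import numpy as np

    n, k = 20, 5                      # states in a toy chain MDP, basis size

    # Adjacency (weight) matrix W of the state graph: chain neighbours.
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0

    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # combinatorial graph Laplacian:
                                      # nonpositive off-diagonals, zero row sums

    # Diagonalization: eigenvectors with the smallest eigenvalues
    # (the smoothest functions on the graph) form the basis.
    eigvals, eigvecs = np.linalg.eigh(L)
    Phi = eigvecs[:, :k]              # n x k basis matrix

    # Example use: least-squares projection of a value-function estimate
    # onto the learned basis (the target here is arbitrary).
    v = np.exp(-0.3 * np.arange(n))   # stand-in for a value function
    w, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    v_hat = Phi @ w
    print("reconstruction error:", np.linalg.norm(v - v_hat))

Increasing k trades dimensionality for reconstruction accuracy; the dilation-based constructions mentioned in the abstract replace the eigendecomposition step with multiscale operators rather than changing this overall fit-and-project pattern.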
Introduction
Sequential Decision Problems
Laplacian Operators and MDPs
Approximating Markov Decision Processes
Dimensionality Reduction Principles in MDPs
Basis Construction: Diagonalization Methods
Basis Construction: Dilation Methods
Model-Based Representation Policy Iteration
Basis Construction in Continuous MDPs
Model-Free Representation Policy Iteration
Related Work and Future Challenges