Algorithms and Bounds for Sampling-based Approximate Policy Iteration

Christos Dimitrakakis
Michail G. Lagoudakis
Abstract

Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem, have been proposed recently. Finding good policies with such methods requires not only an appropriate classifier, but also reliable examples of the best actions, covering all of the state space. One major question is how to find a good covering efficiently. However, up to this time, little work has been done to reduce the sample complexity of such methods, especially in continuous state spaces. This paper focuses on the simplest possible classification strategy / policy representation for such spaces (a discretised grid) and performs a sample-complexity comparison between the simplest (and most commonly used) sample allocation strategy, which allocates an equal number of samples to each state under consideration, and an almost equally simple method, which is shown to require significantly fewer samples.
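The contrast between uniform and adaptive sample allocation can be illustrated with a small sketch. The code below is not the paper's method: it is a hypothetical example, using a made-up `rollout_value` function in place of real policy rollouts, that compares a uniform strategy (equal rollouts per action) against a successive-elimination-style strategy that stops sampling actions whose confidence interval falls below the leader's.

```python
import math
import random

def rollout_value(action, rng, noise=1.0):
    # Hypothetical stand-in for a single noisy rollout return:
    # each action has a fixed true mean, perturbed by Gaussian noise.
    true_means = {0: 0.0, 1: 0.5, 2: 1.0}
    return true_means[action] + rng.gauss(0.0, noise)

def uniform_allocation(actions, budget, rng):
    # Uniform strategy: spend budget / |A| rollouts on every action,
    # then return the empirically best one.
    per_action = budget // len(actions)
    means = {}
    for a in actions:
        samples = [rollout_value(a, rng) for _ in range(per_action)]
        means[a] = sum(samples) / len(samples)
    best = max(means, key=means.get)
    return best, per_action * len(actions)

def adaptive_allocation(actions, budget, rng, delta=0.1):
    # Successive-elimination sketch: sample all surviving actions in
    # rounds, then drop any action whose Hoeffding-style upper bound
    # falls below the best lower bound among survivors.
    sums = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    surviving = set(actions)
    used = 0
    while used < budget and len(surviving) > 1:
        for a in list(surviving):
            sums[a] += rollout_value(a, rng)
            counts[a] += 1
            used += 1
        # Confidence radius shrinks as each action accumulates samples.
        rad = {a: math.sqrt(math.log(2 * len(actions) / delta)
                            / (2 * counts[a]))
               for a in surviving}
        means = {a: sums[a] / counts[a] for a in surviving}
        best_lb = max(means[a] - rad[a] for a in surviving)
        surviving = {a for a in surviving if means[a] + rad[a] >= best_lb}
    best = max(surviving, key=lambda a: sums[a] / counts[a])
    return best, used
```

Under these assumptions, both strategies identify the same best action, but the adaptive one typically stops sampling clearly dominated actions early and so consumes far fewer rollouts from the same budget.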
