Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Abstract

Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural reinforcement learning algorithm that has been shown to converge under various conditions. However, the convergence rate of MCES-style algorithms, in the form of sample complexity, has received very little attention. In this paper we develop a finite-sample bound for a modified MCES algorithm that solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least $1-\delta$, the algorithm returns an optimal policy after $\tilde{O}(SAK^3\log^3\frac{1}{\delta})$ sampled episodes, where $S$ and $A$ denote the number of states and actions respectively, $K$ is a proxy for episode length, and $\tilde{O}$ hides logarithmic factors and constants depending on the rewards of the environment, which are assumed to be known.
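
For reference, below is a minimal sketch of the textbook first-visit MCES loop in the tabular setting (following Sutton and Barto's formulation), not the modified variant analyzed in the paper. The `env.rollout` interface is a hypothetical stand-in for an episodic simulator that starts an episode from a chosen state-action pair and then follows the current policy.

```python
import random
from collections import defaultdict

def mces(env, states, actions, num_episodes, gamma=1.0):
    """Minimal first-visit Monte Carlo Exploring Starts sketch (tabular).

    `env.rollout(s0, a0, policy)` is an assumed interface: it runs one episode
    starting from (s0, a0), follows `policy` afterwards, and returns a list of
    (state, action, reward) triples.
    """
    Q = defaultdict(float)       # state-action value estimates
    counts = defaultdict(int)    # visit counts for running averages
    policy = {s: random.choice(actions) for s in states}

    for _ in range(num_episodes):
        # Exploring start: every (state, action) pair has positive probability.
        s0, a0 = random.choice(states), random.choice(actions)
        episode = env.rollout(s0, a0, policy)

        # Index of the first visit to each (state, action) pair in this episode.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        # Backward pass: accumulate returns, average into Q at first visits,
        # and improve the policy greedily at the updated states.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
                policy[s] = max(actions, key=lambda b: Q[(s, b)])

    return policy, Q
```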
