Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Abstract

This paper studies a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yields a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on a regularized least-squares estimation. We establish a sublinear regret bound of order $\tilde{O}(N^{9/10})$. The analysis consists of two parts: parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property. The regret bound for the one-dimensional case improves to $\tilde{O}(\sqrt{N})$.
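
To illustrate the kind of regularized least-squares estimation the abstract refers to, the following is a minimal sketch (not the authors' exact estimator) of a ridge-regularized drift estimator for a linear SDE $dX_t = (A X_t + B u_t)\,dt + dW_t$, computed from Euler-discretized observations of a single episode. The system matrices, regularization weight, and exploratory control below are illustrative assumptions.

```python
# A minimal sketch (assumed setup, not the paper's algorithm): ridge-regularized
# least squares for the drift parameters (A, B) of a linear SDE
#   dX_t = (A X_t + B u_t) dt + dW_t,
# using Euler-discretized observations from one learning episode.
import numpy as np

rng = np.random.default_rng(0)
d, m, dt, T = 2, 1, 0.01, 1.0                 # state dim, control dim, step, horizon
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])  # hypothetical system matrices
B_true = np.array([[0.0], [1.0]])

# Simulate one episode with an exploratory (random) control.
n = int(T / dt)
X = np.zeros((n + 1, d))
U = rng.normal(size=(n, m))
for k in range(n):
    drift = A_true @ X[k] + B_true @ U[k]
    X[k + 1] = X[k] + drift * dt + np.sqrt(dt) * rng.normal(size=d)

# Regularized least squares: regress increments dX on the regressor z = (x, u).
Z = Z = np.hstack([X[:-1], U])                # shape (n, d + m)
dX = np.diff(X, axis=0)                       # shape (n, d)
lam = 1.0                                     # ridge regularization weight (assumed)
G = Z.T @ Z * dt + lam * np.eye(d + m)        # regularized Gram matrix
Theta_hat = np.linalg.solve(G, Z.T @ dX).T    # estimate of [A, B]

A_hat, B_hat = Theta_hat[:, :d], Theta_hat[:, d:]
print("A_hat:\n", A_hat, "\nB_hat:\n", B_hat)
```

In this sketch the estimate of $[A, B]$ concentrates around the true drift parameters as the amount of episodic data grows; the paper's analysis quantifies such estimation error via sub-exponential concentration and double stochastic integrals, and propagates it through the continuous-time Riccati equation.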
