Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

This paper studies a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield linear regret with respect to the number of learning episodes. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation. We establish a sublinear regret bound in the order of . The analysis consists of two parts: the parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and the perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity. The regret bound for the one-dimensional case improves to .
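To make the two ingredients described above concrete, the following is a minimal Python sketch of an episodic loop of this kind: the drift parameters are fit by regularized (ridge) least squares from discretized trajectory data, and a greedy feedback gain is obtained by integrating the continuous-time Riccati equation backward in time. All names, dimensions, and matrices (A_true, B_true, Q, R, G, the regularization weight lam, the number of episodes) are illustrative assumptions; this is not the paper's algorithm, exploration scheme, or regret analysis, only a sketch of the standard components it builds on.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Illustrative problem data (assumed, not from the paper).
d, m = 2, 1                                   # state and control dimensions
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]]) # unknown drift matrices
B_true = np.array([[0.0], [1.0]])
Q, R, G = np.eye(d), np.eye(m), np.eye(d)     # running and terminal cost weights
T, h = 1.0, 0.01                              # horizon and discretization step
K_steps = int(T / h)

def simulate_episode(policy):
    """Roll out one episode of dX = (A X + B u) dt + dW under a feedback policy."""
    X = np.zeros(d)
    xs, us, dxs = [], [], []
    for k in range(K_steps):
        u = policy(k * h, X)
        dX = (A_true @ X + B_true @ u) * h + np.sqrt(h) * rng.standard_normal(d)
        xs.append(X.copy()); us.append(u.copy()); dxs.append(dX.copy())
        X = X + dX
    return np.array(xs), np.array(us), np.array(dxs)

def ridge_estimate(data, lam=1.0):
    """Regularized least squares for theta = [A B] over all observed transitions."""
    S_zz = lam * np.eye(d + m)
    S_xz = np.zeros((d, d + m))
    for xs, us, dxs in data:
        Z = np.hstack([xs, us])               # regressors z_k = (x_k, u_k)
        S_zz += h * h * Z.T @ Z
        S_xz += h * dxs.T @ Z
    theta = S_xz @ np.linalg.inv(S_zz)
    return theta[:, :d], theta[:, d:]

def riccati_gain(A, B):
    """Integrate the continuous-time Riccati ODE backward; return t -> K(t)."""
    def rhs(s, p):                            # time reversal s = T - t
        P = p.reshape(d, d)
        dP = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
        return dP.ravel()
    sol = solve_ivp(rhs, [0.0, T], G.ravel(), dense_output=True)
    def K(t):
        P = sol.sol(T - t).reshape(d, d)
        return -np.linalg.inv(R) @ B.T @ P    # feedback u = K(t) x
    return K

# Episodic loop: re-estimate the drift after each episode, then act greedily.
data = []
policy = lambda t, x: 0.1 * rng.standard_normal(m)   # initial exploratory policy
for episode in range(20):
    data.append(simulate_episode(policy))
    A_hat, B_hat = ridge_estimate(data)
    K = riccati_gain(A_hat, B_hat)
    policy = lambda t, x, K=K: K(t) @ x

print("estimation errors:",
      np.linalg.norm(A_hat - A_true), np.linalg.norm(B_hat - B_true))
```

In this sketch the regret of an episode would be the gap between the cost incurred under the estimated feedback gain and the cost of the optimal gain computed from the true (A, B); the paper's contribution is to bound the accumulation of these gaps over episodes.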