Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

This paper studies a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield linear regret with respect to the number of learning episodes. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation. We establish a sublinear regret bound in the order of . The analysis consists of two parts: the parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and the perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity. The regret bound for the one-dimensional case improves to .
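To make the two ingredients described above concrete, the following is a minimal Python sketch of an episodic loop of this kind: the drift parameters are fit by regularized (ridge) least squares from discretized trajectory data, and a greedy feedback gain is obtained by integrating the continuous-time Riccati equation backward in time. All names, dimensions, and matrices (A_true, B_true, Q, R, G, the regularization weight lam, the number of episodes) are illustrative assumptions; this is not the paper's algorithm, exploration scheme, or regret analysis, only a sketch of the standard components it builds on.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Illustrative problem data (assumed, not from the paper).
d, m = 2, 1                                   # state and control dimensions
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]]) # unknown drift matrices
B_true = np.array([[0.0], [1.0]])
Q, R, G = np.eye(d), np.eye(m), np.eye(d)     # running and terminal cost weights
T, h = 1.0, 0.01                              # horizon and discretization step
K_steps = int(T / h)

def simulate_episode(policy):
    """Roll out one episode of dX = (A X + B u) dt + dW under a feedback policy."""
    X = np.zeros(d)
    xs, us, dxs = [], [], []
    for k in range(K_steps):
        u = policy(k * h, X)
        dX = (A_true @ X + B_true @ u) * h + np.sqrt(h) * rng.standard_normal(d)
        xs.append(X.copy()); us.append(u.copy()); dxs.append(dX.copy())
        X = X + dX
    return np.array(xs), np.array(us), np.array(dxs)

def ridge_estimate(data, lam=1.0):
    """Regularized least squares for theta = [A B] over all observed transitions."""
    S_zz = lam * np.eye(d + m)
    S_xz = np.zeros((d, d + m))
    for xs, us, dxs in data:
        Z = np.hstack([xs, us])               # regressors z_k = (x_k, u_k)
        S_zz += h * h * Z.T @ Z
        S_xz += h * dxs.T @ Z
    theta = S_xz @ np.linalg.inv(S_zz)
    return theta[:, :d], theta[:, d:]

def riccati_gain(A, B):
    """Integrate the continuous-time Riccati ODE backward; return t -> K(t)."""
    def rhs(s, p):                            # time reversal s = T - t
        P = p.reshape(d, d)
        dP = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
        return dP.ravel()
    sol = solve_ivp(rhs, [0.0, T], G.ravel(), dense_output=True)
    def K(t):
        P = sol.sol(T - t).reshape(d, d)
        return -np.linalg.inv(R) @ B.T @ P    # feedback u = K(t) x
    return K

# Episodic loop: re-estimate the drift after each episode, then act greedily.
data = []
policy = lambda t, x: 0.1 * rng.standard_normal(m)   # initial exploratory policy
for episode in range(20):
    data.append(simulate_episode(policy))
    A_hat, B_hat = ridge_estimate(data)
    K = riccati_gain(A_hat, B_hat)
    policy = lambda t, x, K=K: K(t) @ x

print("estimation errors:",
      np.linalg.norm(A_hat - A_true), np.linalg.norm(B_hat - B_true))
```

In this sketch the regret of an episode would be the gap between the cost incurred under the estimated feedback gain and the cost of the optimal gain computed from the true (A, B); the paper's contribution is to bound the accumulation of these gaps over episodes.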