Optimal Dynamic Regret in LQR Control

Abstract

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3},\,1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1,\ldots,M_n$ -- chosen in hindsight to cater to unknown nonstationarity. This rate improves upon the best known rate of $\tilde{O}(\sqrt{n(\mathcal{TV}(M_{1:n})+1)})$ for general convex losses, and we prove that it is information-theoretically optimal for LQR. Our main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.
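
For intuition, the two quantities in the bound can be read as follows. This is a sketch of the standard dynamic-regret formulation for nonstochastic control; in particular, the choice of matrix norm in the path length is an assumption made here for illustration, not a detail taken from the abstract:

\[
\text{Regret}_n(M_{1:n}) \;=\; \sum_{t=1}^{n} c_t(x_t, u_t) \;-\; \sum_{t=1}^{n} c_t\big(x_t(M_{1:n}), u_t(M_{1:n})\big),
\qquad
\mathcal{TV}(M_{1:n}) \;=\; \sum_{t=2}^{n} \|M_t - M_{t-1}\|,
\]

where $c_t$ are the quadratic losses, $(x_t, u_t)$ is the learner's state-action pair at round $t$, and $(x_t(M_{1:n}), u_t(M_{1:n}))$ is the counterfactual trajectory obtained by running the time-varying comparator sequence of Disturbance Action policies. Larger $\mathcal{TV}(M_{1:n})$ permits a comparator that tracks more nonstationarity, at the cost of a larger regret bound.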
