Logarithmic Regret from Sublinear Hints

Abstract

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a hint $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result: to the well-studied setting of optimistic regret bounds, and to the problem of online learning with abstention.
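To make the interaction model concrete, below is a minimal sketch of the online linear optimization protocol with occasional hints. The learner (online gradient descent on the unit ball), the hint schedule (roughly $\sqrt{T}$ evenly spaced query rounds), and the perfectly correlated hints are illustrative assumptions, not the paper's algorithm or its query model.

```python
# Sketch of online linear optimization with sublinear hint queries.
# Assumptions (not from the paper): OGD learner, evenly spaced hint
# rounds, and hints equal to the normalized true cost vector.
import numpy as np

def project_unit_ball(x):
    """Project a point onto the Euclidean unit ball."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 1.0 else x

def run_protocol(T=10_000, d=5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    total_loss = 0.0
    # Query a hint on about sqrt(T) rounds (illustrative schedule).
    hint_rounds = set(np.linspace(0, T - 1, int(np.sqrt(T)), dtype=int))
    for t in range(T):
        c_t = rng.standard_normal(d)  # adversary's cost (stochastic here)
        if t in hint_rounds:
            # Hypothetical hint: a unit vector correlated with c_t.
            h_t = c_t / np.linalg.norm(c_t)
            # Bias the play against the hinted direction before committing.
            x = project_unit_ball(x - h_t)
        total_loss += c_t @ x             # suffer loss <c_t, x_t>
        eta = 1.0 / np.sqrt(t + 1)        # standard OGD step size
        x = project_unit_ball(x - eta * c_t)  # c_t is revealed; update
    return total_loss

if __name__ == "__main__":
    print(run_protocol())
```

Note that $c_t$ only becomes visible to the learner after $x_t$ is played; the hint, when queried, is the sole information available beforehand, which is what makes the $O(\sqrt{T})$-query regime in the abstract nontrivial.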
