Lower Bounds for $\gamma$-Regret via the Decision-Estimation Coefficient

Abstract

In this note, we give a new lower bound for the $\gamma$-regret in bandit problems, the regret which arises when comparing against a benchmark that is $\gamma$ times the optimal solution, i.e., $\mathsf{Reg}_{\gamma}(T) = \sum_{t = 1}^T \bigl(\gamma \max_{\pi} f(\pi) - f(\pi_t)\bigr)$. The $\gamma$-regret arises in structured bandit problems where finding an exact optimum of $f$ is intractable. Our lower bound is given in terms of a modification of the constrained Decision-Estimation Coefficient (DEC) of \citet{foster2023tight} (and closely related to the original offset DEC of \citet{foster2021statistical}), which we term the $\gamma$-DEC. When restricted to the traditional regret setting where $\gamma = 1$, our result removes the logarithmic factors in the lower bound of \citet{foster2023tight}.
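
To make the definition of the $\gamma$-regret concrete, the following is a minimal sketch that evaluates it for a finite set of decisions. The function name `gamma_regret`, the list `mean_rewards` (standing in for the values $f(\pi)$), and the list `chosen` (standing in for the played decisions $\pi_1, \dots, \pi_T$) are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def gamma_regret(mean_rewards: Sequence[float],
                 chosen: Sequence[int],
                 gamma: float) -> float:
    """Sum over rounds of gamma * max_pi f(pi) - f(pi_t).

    mean_rewards[i] plays the role of f(pi) for decision i (hypothetical setup);
    chosen[t] is the index of the decision played in round t.
    """
    best = max(mean_rewards)          # max_pi f(pi)
    benchmark = gamma * best          # the gamma-scaled benchmark
    return sum(benchmark - mean_rewards[a] for a in chosen)


if __name__ == "__main__":
    f_values = [0.2, 0.5, 1.0]        # illustrative values of f
    plays = [1, 1, 2, 0]              # illustrative sequence of decisions
    print(gamma_regret(f_values, plays, gamma=1.0))   # standard regret
    print(gamma_regret(f_values, plays, gamma=0.5))   # relaxed benchmark
```

With $\gamma = 1$ this reduces to the standard regret. For $\gamma < 1$ the benchmark is weaker, so individual terms (and the total) can be negative whenever the played decision beats $\gamma$ times the optimum.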
