Batched Lipschitz Bandits

Abstract

In this paper, we study the batched Lipschitz bandit problem, where the expected reward is Lipschitz and the reward observations are collected in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that naturally fits into the batched feedback setting. In particular, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves the theoretically optimal regret rate $\widetilde{\mathcal{O}}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ using only $\mathcal{O}(\log\log T)$ batches. For the lower bound, we show that in an environment with $B$ batches, for any policy $\pi$, there exists a problem instance whose expected regret is lower bounded by $\widetilde{\Omega}\left(R_z(T)^{\frac{1}{1-\left(\frac{1}{d+2}\right)^B}}\right)$, where $R_z(T)$ is the regret lower bound for vanilla Lipschitz bandits, which depends on the zooming dimension $d_z$, and $d$ is the dimension of the arm space. As a direct consequence, $B = \Omega(\log\log T)$ batches are needed to achieve the regret lower bound, and the BLiN algorithm is optimal.
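One way to see why the lower bound forces $B = \Omega(\log\log T)$: write $\varepsilon_B = (d+2)^{-B}$, so the $B$-batch bound equals $R_z(T) \cdot R_z(T)^{\varepsilon_B/(1-\varepsilon_B)}$. Ignoring the logarithmic factors hidden by $\widetilde{\Omega}$, this exceeds $R_z(T)$ by more than a constant factor unless $\varepsilon_B \log R_z(T) = \mathcal{O}(1)$. Since $\log R_z(T) = \Theta(\log T)$, that requires $(d+2)^B = \Omega(\log T)$, i.e. $B \geq \log_{d+2}\log T - \mathcal{O}(1)$. The Python sketch below is a back-of-the-envelope illustration of this arithmetic, not the authors' analysis: it assumes $R_z(T) \approx T^{(d_z+1)/(d_z+2)}$ with all constants and log factors dropped, and the helper names (`lower_bound_exponent`, `log_inflation`, `min_batches`) and the tolerance `max_log_inflation` are hypothetical.

```python
import math


def lower_bound_exponent(d: int, num_batches: int) -> float:
    """Exponent 1 / (1 - (1/(d+2))^B) that the B-batch lower bound applies to R_z(T)."""
    eps = (1.0 / (d + 2)) ** num_batches
    return 1.0 / (1.0 - eps)


def log_inflation(log_rz: float, d: int, num_batches: int) -> float:
    """log(R_z(T)^exponent / R_z(T)): how far the B-batch bound sits above R_z(T)."""
    return (lower_bound_exponent(d, num_batches) - 1.0) * log_rz


def min_batches(T: float, d: int, d_z: int, max_log_inflation: float = 1.0) -> int:
    """Smallest B whose lower bound exceeds R_z(T) by at most a factor exp(max_log_inflation).

    Illustrative assumption: R_z(T) ~ T^((d_z+1)/(d_z+2)), with constants and
    the log factors hidden by the tilde notation dropped.
    """
    log_rz = (d_z + 1) / (d_z + 2) * math.log(T)
    num_batches = 1
    # The inflation shrinks toward 0 as B grows, so this loop terminates.
    while log_inflation(log_rz, d, num_batches) > max_log_inflation:
        num_batches += 1
    return num_batches


if __name__ == "__main__":
    for T in (1e6, 1e12, 1e100, 1e300):
        print(f"T = {T:.0e}: B >= {min_batches(T, d=1, d_z=1)}")
```

With $d = d_z = 1$, the sketch returns 3 batches at $T = 10^6$ and 6 at $T = 10^{300}$: multiplying $\log T$ by 50 adds only three batches, in line with the $\log_{d+2}\log T$ scaling.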
