Batched Lipschitz Bandits
In this paper, we study the batched Lipschitz bandit problem, where the expected reward is Lipschitz and the reward observations are collected in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that naturally fits into the batched feedback setting. In particular, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves the theoretically optimal regret rate of $ \widetilde{\mathcal{O}} \left( T^{\frac{d_z + 1}{d_z + 2}} \right) $ using only $ \mathcal{O} \left( \log\log T\right) $ batches. For the lower bound, we show that in an environment with $B$ batches, for any policy $\pi$, there exists a problem instance such that the expected regret is lower bounded by $ \widetilde{\Omega} \left(R_z(T)^\frac{1}{1-\left(\frac{1}{d+2}\right)^B}\right) $, where $R_z(T)$ is the regret lower bound for vanilla Lipschitz bandits that depends on the zooming dimension $d_z$, and $d$ is the dimension of the arm space. As a direct consequence, $ \Omega \left( \log\log T \right) $ batches are needed to achieve the regret lower bound, and the BLiN algorithm is optimal.
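To make the batch-complexity claim concrete, the following sketch (an illustration based only on the formulas stated above, not the authors' code) evaluates the lower-bound exponent $1/(1-(1/(d+2))^B)$ and finds the smallest number of batches $B$ for which the $B$-batch lower bound exceeds the vanilla bound $R_z(T)$ by at most a constant factor. The function names, the crude proxy $\log R_z(T) \le \log T$, and the choice of slack constant are all assumptions made for illustration.

```python
import math


def lb_exponent(d: int, B: int) -> float:
    """Exponent on R_z(T) in the stated B-batch lower bound:
    1 / (1 - (1/(d+2))**B)."""
    return 1.0 / (1.0 - (1.0 / (d + 2)) ** B)


def batches_to_match(d: int, T: float, slack: float = math.e) -> int:
    """Smallest B for which R_z(T)**lb_exponent(d, B) is within a
    constant factor `slack` of R_z(T), using log R_z(T) <= log T
    as a (loose) proxy.  Illustrative helper, not from the paper."""
    log_R = math.log(T)  # proxy for log R_z(T), since R_z(T) <= T
    B = 1
    # Need (exponent - 1) * log R_z(T) <= log(slack).
    while (lb_exponent(d, B) - 1.0) * log_R > math.log(slack):
        B += 1
    return B


if __name__ == "__main__":
    # Doubling log T barely moves B: the required batch count grows
    # like log log T, matching the O(log log T) batches used by BLiN.
    for T in (1e3, 1e6, 1e12, 1e24):
        print(f"log10(T) = {int(math.log10(T)):3d}  ->  "
              f"B = {batches_to_match(d=1, T=T)}")
```

Running the loop with $d = 1$ shows the required $B$ creeping up very slowly as $T$ grows over many orders of magnitude, which is the doubly-logarithmic behavior the upper and lower bounds together imply.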