Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

31 May 2020

Abstract

We prove that the information-theoretic upper bound on the minimax regret for adversarial bandit convex optimisation is at most $O(d^3 \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
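
To give a sense of the size of the improvement, the two bounds stated above can be divided term by term. This is only a rough comparison of the stated rates: it assumes the constants hidden by the $O(\cdot)$ notation are ignored, and it says nothing about lower bounds.

$$
\frac{d^{9.5} \sqrt{n} \log(n)^{7.5}}{d^{3} \sqrt{n} \log(n)} = d^{6.5} \log(n)^{6.5} = \bigl(d \log(n)\bigr)^{6.5}
$$

Under that assumption, the dependence on $\sqrt{n}$ is unchanged and the gain comes entirely from the dimension and logarithmic factors.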
