Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Abstract
We prove that the information-theoretic upper bound on the minimax regret for adversarial bandit convex optimisation is at most , improving on by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
