
Improved Regret Bounds for Bandits with Expert Advice

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N > K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
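
To get a feel for the gap between the two lower-bound rates quoted above, the following minimal sketch evaluates both expressions numerically. The function names and the values of K, N, and T are illustrative assumptions, not taken from the paper, and constant factors hidden by the order notation are ignored.

```python
import math


def new_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Rate sqrt(K * T * ln(N / K)) quoted in the abstract (constants dropped)."""
    if not N > K:
        raise ValueError("the abstract assumes N > K")
    return math.sqrt(K * T * math.log(N / K))


def previous_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Earlier rate sqrt(K * T * ln(N) / ln(K)); requires K >= 2 so ln(K) > 0."""
    if K < 2:
        raise ValueError("ln(K) must be positive, so K >= 2 is required")
    return math.sqrt(K * T * math.log(N) / math.log(K))


# Hypothetical problem sizes, chosen only for illustration.
K, N, T = 10, 10_000, 100_000

new_rate = new_lower_bound_rate(K, N, T)
old_rate = previous_lower_bound_rate(K, N, T)
ratio = new_rate / old_rate  # T and the leading K cancel in this ratio

print(f"new rate: {new_rate:.1f}")
print(f"old rate: {old_rate:.1f}")
print(f"ratio:    {ratio:.3f}")
```

The ratio equals $\sqrt{\ln(N/K) \ln K / \ln N}$, so it does not depend on $T$. For the values chosen here it exceeds 1, but this says nothing about other parameter regimes or about the hidden constants.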
