
Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

Abstract

Partial monitoring is a generic framework for online decision-making problems with limited observations. To make decisions from such limited observations, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, exploration by optimization (ExO), was proposed; combined with follow-the-regularized-leader, it achieves optimal bounds in adversarial environments for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this problem in locally observable games, we first establish a novel framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds in both stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, $a^*$ is an optimal action, and $\Delta_a$ is the suboptimality gap for action $a$. This bound is roughly $\Theta(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
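
To get a feel for how the stated stochastic bound scales, the following is a minimal sketch that evaluates only its leading term, $\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a$, with all constants hidden by the $O(\cdot)$ dropped. The function name `stochastic_bound_leading_term`, the input format, and the assumption of a single optimal action are illustrative choices, not part of the paper, and the returned number is not a regret guarantee.

```python
import math


def stochastic_bound_leading_term(gaps, m, T):
    """Evaluate sum_{a != a*} k^2 m^2 log(T) / Delta_a with constants dropped.

    gaps: suboptimality gaps Delta_a of the suboptimal actions (each > 0).
    m:    number of observations.
    T:    number of rounds.
    Assumption (illustrative): exactly one optimal action, so k = len(gaps) + 1.
    """
    if any(g <= 0 for g in gaps):
        raise ValueError("every suboptimality gap must be strictly positive")
    k = len(gaps) + 1  # suboptimal actions plus the single optimal action a*
    return sum(k**2 * m**2 * math.log(T) / g for g in gaps)
```

Under these assumptions, halving every gap doubles the value, and replacing $T$ by $T^2$ doubles it as well, which reflects the $\log T / \Delta_a$ dependence in the bound.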
