
Best-of-Both-Worlds Algorithms for Partial Monitoring

International Conference on Algorithmic Learning Theory (ALT), 2022
Abstract

This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
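The abstract names follow-the-regularized-leader (FTRL) as the algorithmic backbone. As a rough illustration of that framework only, not of the paper's actual algorithm, the following Python sketch runs FTRL with a negative Shannon entropy regularizer, whose minimizer $p_t \in \arg\min_p \langle p, \hat{L}_{t-1} \rangle + \psi(p)/\eta_t$ has the closed-form exponential-weights solution $p_t(a) \propto \exp(-\eta_t \hat{L}_{t-1}(a))$. The names `ftrl_exponential_weights`, `estimate_loss`, and `learning_rate` are hypothetical stand-ins: the paper's loss estimators (built via exploration by optimization from partial feedback) and its adaptive learning rate are game-dependent and not reproduced here.

```python
import numpy as np

def ftrl_exponential_weights(estimate_loss, num_rounds, k, learning_rate, seed=0):
    """FTRL over k actions with a negative Shannon entropy regularizer.

    The FTRL minimizer p_t = argmin_p <p, L_{t-1}> + psi(p) / eta_t has the
    closed form p_t(a) proportional to exp(-eta_t * L_{t-1}(a)).
    """
    rng = np.random.default_rng(seed)
    cumulative = np.zeros(k)              # L_{t-1}: sum of loss estimates so far
    total_loss = 0.0
    for t in range(1, num_rounds + 1):
        eta = learning_rate(t)            # the paper adapts eta_t to past feedback
        logits = -eta * cumulative
        logits -= logits.max()            # shift logits for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        action = rng.choice(k, p=p)       # sample an action from p_t
        loss_hat = estimate_loss(t, action, p)  # vector of per-action estimates
        cumulative += loss_hat
        total_loss += loss_hat[action]
    return total_loss

# Toy usage with a full-information estimator (a hypothetical stand-in; partial
# monitoring would instead build unbiased estimates from the observed feedback).
if __name__ == "__main__":
    k = 5
    means = np.linspace(0.1, 0.9, k)
    rng = np.random.default_rng(1)
    estimate_loss = lambda t, a, p: rng.binomial(1, means).astype(float)
    print(ftrl_exponential_weights(estimate_loss, 10_000, k,
                                   learning_rate=lambda t: np.sqrt(np.log(k) / t)))
```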
