
Best-of-Both-Worlds Algorithms for Partial Monitoring

International Conference on Algorithmic Learning Theory (ALT), 2022
Abstract

This paper considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are bounded poly-logarithmically in the stochastic regime and near-optimally in the adversarial regime. To be more specific, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ and in the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by $O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ and in the adversarial regime by $O((\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$, where $c_{\mathcal{G}}$ is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework that takes into account the nature of the partial monitoring problem, inspired by algorithms in the field of online learning with feedback graphs.
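The last sentence refers to the follow-the-regularized-leader (FTRL) template. As a point of reference only, the sketch below shows generic FTRL with a negative-entropy regularizer under full-information feedback (i.e., exponential weights); it is not the paper's algorithm, which instead constructs loss estimates from partial-monitoring feedback and uses its own regularizer and learning-rate schedule. All names and parameters in the sketch are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def ftrl_neg_entropy(loss_matrix, eta=0.1):
    """Generic FTRL with a negative-entropy regularizer (Hedge-style sketch).

    loss_matrix: (T, k) array of per-round losses for each of k actions
                 (full information, unlike partial monitoring).
    Returns the chosen actions and the regret against the best fixed action.
    """
    T, k = loss_matrix.shape
    cum_loss = np.zeros(k)   # cumulative losses L_{t-1}
    incurred = 0.0
    actions = []
    for t in range(T):
        # FTRL step: p_t = argmin_p <p, L_{t-1}> + (1/eta) * sum_i p_i log p_i,
        # which has the closed form p_t proportional to exp(-eta * L_{t-1}).
        logits = -eta * (cum_loss - cum_loss.min())   # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        a = rng.choice(k, p=p)
        actions.append(a)
        incurred += loss_matrix[t, a]
        cum_loss += loss_matrix[t]   # full-information update (simplification)
    regret = incurred - loss_matrix.sum(axis=0).min()
    return actions, regret

# Toy usage: 3 actions, 1000 rounds of i.i.d. Bernoulli losses.
losses = rng.binomial(1, [0.2, 0.5, 0.6], size=(1000, 3)).astype(float)
_, reg = ftrl_neg_entropy(losses, eta=0.1)
print(f"regret vs. best fixed action: {reg:.1f}")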
