Exploration by Optimisation in Partial Monitoring
Annual Conference on Computational Learning Theory (COLT), 2019

Abstract
We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bounds.
