Stochastic Shortest Path with Sparse Adversarial Costs

Main: 9 pages, 5 figures; bibliography: 3 pages; appendix: 23 pages
Abstract

We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known-transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log SA}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that the negative entropy is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ even on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$, rather than $SA$, captures the effective dimension of the problem. Finally, in the unknown-transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret of any learner scales polynomially with $SA$.
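To make the regularizer swap concrete, below is a minimal sketch of one OMD step with an $\ell_r$-type (Tsallis-style) regularizer on a plain probability simplex. This is an illustration only: the paper's actual algorithm runs over SSP occupancy measures with its own learning-rate tuning, and the specific regularizer form $\psi(x) = \frac{1}{r-1}\sum_i x_i^r$ and the bisection-based Bregman projection used here are our assumptions, not the paper's construction.

```python
import numpy as np

def omd_lr_step(x, cost, eta, r=1.5):
    """One OMD step on the probability simplex with the l_r-type
    (Tsallis-style) regularizer psi(x) = sum_i x_i^r / (r-1), r in (1,2).
    Illustrative sketch only; the paper's algorithm operates over SSP
    occupancy measures, not a plain simplex."""
    # Map the iterate to the dual space and take a gradient step on the cost.
    grad = (r / (r - 1.0)) * np.power(x, r - 1.0)
    tilde = grad - eta * cost

    # Bregman projection back onto the simplex: find lambda such that
    # x_i(lam) = max(0, (r-1)/r * (tilde_i + lam))^{1/(r-1)} sums to 1.
    def point(lam):
        base = np.maximum(0.0, (r - 1.0) / r * (tilde + lam))
        return np.power(base, 1.0 / (r - 1.0))

    # At lo every coordinate is clipped to 0 (sum = 0); at hi every
    # coordinate is >= 1 (sum >= n), so the monotone sum crosses 1 in between.
    lo, hi = -tilde.max(), -tilde.min() + r / (r - 1.0)
    for _ in range(100):  # bisection on the normalization multiplier
        mid = 0.5 * (lo + hi)
        if point(mid).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    y = point(hi)
    return y / y.sum()  # tiny renormalization against bisection residue

# Usage: adversarial costs supported on only M = 2 of n = 100 coordinates.
rng = np.random.default_rng(0)
n, eta = 100, 0.5
x = np.full(n, 1.0 / n)
for _ in range(50):
    c = np.zeros(n)
    c[rng.choice(2)] = rng.random()  # sparse cost on coordinates {0, 1}
    x = omd_lr_step(x, c, eta)
```

Intuitively, the negative-entropy mirror map ties the regret radius to the full dimension $SA$, whereas the $\ell_r$ family lets the geometry concentrate on the $M$ coordinates the adversary actually touches.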
