On the hardness of RL with Lookahead

Main: 8 pages
1 figure
Bibliography: 2 pages
Appendix: 13 pages
Abstract

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of \ell actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead (\ell = 1) can be solved in polynomial time through a novel linear programming formulation. In contrast, for \ell \geq 2, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
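For background on the kind of linear programming approach the abstract alludes to: the paper's novel LP for one-step look-ahead planning is not reproduced here, but the classical LP formulation for computing optimal values in a standard (no-look-ahead) discounted MDP is a useful reference point. The sketch below solves a hypothetical toy MDP (all numbers are illustrative) by minimizing \sum_s V(s) subject to V(s) \geq r(s,a) + \gamma \sum_{s'} P(s'|s,a) V(s') for every state-action pair.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP: 2 states, 2 actions (illustrative numbers only).
gamma = 0.9
# P[a][s][s'] = transition probability; r[a][s] = expected reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.3, 0.7], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
n_states, n_actions = 2, 2

# Classical LP for the optimal value function V*:
#   minimize  sum_s V(s)
#   subject to  V(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) V(s')  for all s, a.
# Rearranged into linprog's A_ub @ V <= b_ub form: (gamma * P_a - I) V <= -r_a.
A_ub = np.vstack([gamma * P[a] - np.eye(n_states) for a in range(n_actions)])
b_ub = np.concatenate([-r[a] for a in range(n_actions)])
c = np.ones(n_states)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_states)
V = res.x  # optimal state values V*(s)
print(V)
```

This LP has one variable per state and one constraint per state-action pair, hence polynomial size in the MDP description; the paper's contribution is showing that a comparable formulation exists when the agent additionally observes one-step transition look-ahead, whereas no such tractable formulation is possible for \ell \geq 2 unless P = NP.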
