Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks

6 April 2025

Abstract

We introduce a novel hierarchical reinforcement learning (HRL) framework that performs top-down recursive planning via learned subgoals, and we apply it successfully to Sokoban, a complex combinatorial puzzle game. Our approach constructs a six-level policy hierarchy in which each higher-level policy generates subgoals for the level below it. All subgoals and policies are learned end-to-end from scratch, without any domain knowledge. Our results show that the agent can generate long action sequences from a single high-level call. While prior work has explored two- and three-level hierarchies and subgoal-based planning heuristics, we demonstrate that deep recursive goal decomposition can emerge purely from learning, and that such hierarchies can scale effectively to hard puzzle domains.
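The idea of top-down recursive planning, in which one high-level call expands into a long primitive action sequence, can be illustrated with a toy sketch. This is not the paper's implementation: the environment is a 1-D number line rather than Sokoban, and `midpoint_proposer` is a hypothetical hand-written stand-in for a learned subgoal policy. All names below (`plan`, `primitive_policy`, `midpoint_proposer`) are assumptions made for illustration only.

```python
from typing import Callable, List

State = int
# A subgoal proposer maps (current state, goal) to an intermediate subgoal.
# In the paper this role is played by a learned policy; here it is hand-written.
Proposer = Callable[[State, State], State]


def midpoint_proposer(state: State, goal: State) -> State:
    """Hypothetical stand-in for a learned subgoal policy: pick the midpoint."""
    return state + (goal - state) // 2


def primitive_policy(state: State, goal: State) -> List[int]:
    """Lowest level: emit unit steps (+1 or -1) until the goal is reached."""
    step = 1 if goal > state else -1
    return [step] * abs(goal - state)


def plan(level: int, state: State, goal: State, proposer: Proposer) -> List[int]:
    """Recursively decompose `goal` into subgoals, one hierarchy level at a time.

    At level 0 (or when the goal is adjacent) the primitive policy acts directly.
    Otherwise the current level proposes a subgoal, and the level below is asked
    to reach the subgoal first and then the original goal.
    """
    if level == 0 or abs(goal - state) <= 1:
        return primitive_policy(state, goal)
    subgoal = proposer(state, goal)
    first_half = plan(level - 1, state, subgoal, proposer)
    second_half = plan(level - 1, subgoal, goal, proposer)
    return first_half + second_half


# A six-level hierarchy (levels 5 down to 0) expanded from a single top-level call.
actions = plan(level=5, state=0, goal=37, proposer=midpoint_proposer)
print(len(actions), "primitive actions generated from one high-level call")
```

In this toy each level splits its goal once, so the depth of the hierarchy bounds how finely a long task is decomposed before primitive actions take over. How the paper's learned policies choose and sequence subgoals is not specified in the abstract and may differ substantially.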


@article{pastukhov2025_2504.04366,
  title={Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks},
  author={Sergey Pastukhov},
  journal={arXiv preprint arXiv:2504.04366},
  year={2025}
}
