DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents

4 February 2025
Shashank Sharma
Janina Hoffmann
Vinay P. Namboodiri
Abstract

In this paper, we address the challenge of long-horizon visual planning tasks using Hierarchical Reinforcement Learning (HRL). Our key contribution is a Discrete Hierarchical Planning (DHP) method, an alternative to traditional distance-based approaches. We provide theoretical foundations for the method and demonstrate its effectiveness through extensive empirical evaluations. Our agent recursively predicts subgoals in the context of a long-term goal and receives discrete rewards for constructing plans as compositions of abstract actions. The method introduces a novel advantage estimation strategy for tree trajectories, which inherently encourages shorter plans and enables generalization beyond the maximum tree depth. The learned policy function allows the agent to plan efficiently, requiring only log N computational steps, making re-planning highly efficient. The agent, based on a Soft Actor-Critic (SAC) framework, is trained using on-policy imagination data. Additionally, we propose a novel exploration strategy that enables the agent to generate relevant training examples for the planning modules. We evaluate our method on long-horizon visual planning tasks in a 25-room environment, where it significantly outperforms previous benchmarks in success rate and average episode length. Furthermore, an ablation study highlights the individual contributions of key modules to the overall performance.
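A minimal sketch of the recursive subgoal prediction the abstract describes: a learned subgoal policy splits a (state, goal) pair into an intermediate subgoal and recurses on both halves, so any single branch of the plan is resolved in roughly log N sequential steps. The function name `subgoal_policy` and the depth-based stopping rule are illustrative assumptions, not the paper's exact interface.

```python
def plan(state, goal, subgoal_policy, depth):
    """Return an ordered list of subgoals forming a plan from state to goal.

    Hypothetical illustration of recursive, tree-structured planning;
    `subgoal_policy(state, goal)` is assumed to return an intermediate
    subgoal between the two inputs.
    """
    if depth == 0:
        # Leaf: the remaining segment is short enough for the low-level policy.
        return [goal]
    # Predict a midpoint subgoal conditioned on the current state and the goal.
    subgoal = subgoal_policy(state, goal)
    # Recurse on both halves of the plan; the tree has depth `depth`,
    # so each branch is found in about log2(N) sequential policy calls.
    left = plan(state, subgoal, subgoal_policy, depth - 1)
    right = plan(subgoal, goal, subgoal_policy, depth - 1)
    return left + right
```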

@article{sharma2025_2502.01956,
  title={DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents},
  author={Shashank Sharma and Janina Hoffmann and Vinay Namboodiri},
  journal={arXiv preprint arXiv:2502.01956},
  year={2025}
}