
Trust-Region Twisted Policy Improvement

8 April 2025
Joery A. de Vries
Jinke He
Yaniv Oren
Matthijs T.J. Spaan
    OffRL
    LRM
Abstract

Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice, which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, persistent design choices of these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.
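To make the particle-filter view of planning concrete, the following is a minimal sketch of a generic SMC planning loop of the kind the abstract builds on: particles carry the action sampled at the root, are propagated through a model, reweighted by reward plus a value "twist", and resampled, so the surviving root actions form the policy-improvement target. This is NOT the paper's TRT-SMC algorithm; the toy chain MDP, the heuristic value function, and all names here are illustrative assumptions.

```python
import random
import math
from collections import Counter

random.seed(0)

N_PARTICLES = 64
ACTIONS = (-1, 0, 1)   # toy moves on an integer chain
HORIZON = 5

def step(state, action):
    """Toy deterministic chain MDP: the action shifts the state; reward favors larger states."""
    nxt = state + action
    return nxt, float(nxt)

def value(state):
    """Heuristic value estimate used to 'twist' the particle weights."""
    return float(state)

def smc_plan(root_state):
    # Each particle tracks (current state, action taken at the root).
    particles = [(root_state, random.choice(ACTIONS)) for _ in range(N_PARTICLES)]
    actions = [a for _, a in particles]  # at t=0, the current action IS the root action
    for _ in range(HORIZON):
        propagated, logw = [], []
        for (s, root_a), a in zip(particles, actions):
            nxt, r = step(s, a)
            # Twisted log-weight: reward plus the change in the value twist.
            logw.append(r + value(nxt) - value(s))
            propagated.append((nxt, root_a))
        # Multinomial resampling in proportion to the (normalized) weights.
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        particles = random.choices(propagated, weights=w, k=N_PARTICLES)
        actions = [random.choice(ACTIONS) for _ in range(N_PARTICLES)]
    # Empirical distribution of surviving root actions is the improved policy target.
    counts = Counter(root_a for _, root_a in particles)
    return {a: counts.get(a, 0) / N_PARTICLES for a in ACTIONS}

policy = smc_plan(root_state=0)
```

In this toy chain, moving right (+1) is always best, so resampling concentrates the surviving root actions on +1; in an actual planner the empirical root-action distribution would serve as the policy target for training.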

@article{vries2025_2504.06048,
  title={Trust-Region Twisted Policy Improvement},
  author={Joery A. de Vries and Jinke He and Yaniv Oren and Matthijs T.J. Spaan},
  journal={arXiv preprint arXiv:2504.06048},
  year={2025}
}