ResearchTrend.AI

Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
Versions: v1, v2, v3 (latest)

11 September 2025
Bingning Huang
Tu Nguyen
Matthieu Zimmer
    OffRL, LRM

Papers citing "Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning"

Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
Qiang Lyu
Z. Chen
C. Wang
Haolin Shi
Shibo Gao
...
Jianlou Si
Fei Ding
Jing Li
Chun Pong Lau
Weiqiang Wang
EGVM
129
1
0
30 Nov 2025
Exploiting Tree Structure for Credit Assignment in RL Training of LLMs
Hieu Tran
Zonghai Yao
Hong-ye Yu
OffRL
202
3
0
22 Sep 2025