2509.09284
Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
11 September 2025
Bingning Huang, Tu Nguyen, Matthieu Zimmer
OffRL
LRM
Papers citing "Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning"
2 / 2 papers shown
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
Qiang Lyu, Z. Chen, C. Wang, Haolin Shi, Shibo Gao, ..., Jianlou Si, Fei Ding, Jing Li, Chun Pong Lau, Weiqiang Wang
EGVM
30 Nov 2025
Exploiting Tree Structure for Credit Assignment in RL Training of LLMs
Hieu Tran, Zonghai Yao, Hong-ye Yu
OffRL
22 Sep 2025