Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning

arXiv 2509.09284, versions v1, v2, v3 (latest)

11 September 2025

Matthieu Zimmer

Links: arXiv (abs), PDF, HTML, HuggingFace (2 upvotes)

Papers citing "Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning" (2 papers)

Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
...
30 Nov 2025 | 129 | 1 | 0

Exploiting Tree Structure for Credit Assignment in RL Training of LLMs
22 Sep 2025 | 202 | 3 | 0
