2508.17445
Cited By
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
24 August 2025
Yi Zhou
Qingshui Gu
Zhoufutu Wen
Ziniu Li
Tianshun Xing
Shuyue Guo
Tianyu Zheng
Xin Zhou
Xingwei Qu
Wangchunshu Zhou
Zheng Zhang
Wei Shen
Qian Liu
C. D. Lin
Jian Yang
G. Zhang
Wenhao Huang
LRM
HuggingFace (75 upvotes)
Papers citing
"TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling"
17 / 17 papers shown
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Speed Zhu
Jianwei Cai
Guang Chen
Lulu Wu
Saiyong Yang
Wiggin Zhou
OffRL
LRM
09 Nov 2025
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
Shangyu Xing
Siyuan Wang
Chenyuan Yang
Xinyu Dai
Xiang Ren
28 Oct 2025
Teaching Language Models to Reason with Tools
Chengpeng Li
Zhengyang Tang
Ziniu Li
Mingfeng Xue
Keqin Bao
...
Ruoyu Sun
Benyou Wang
Xiang Wang
Junyang Lin
Dayiheng Liu
LLMAG
LRM
23 Oct 2025
Agentic Entropy-Balanced Policy Optimization
Guanting Dong
Licheng Bao
Zhongyuan Wang
Kangzhi Zhao
Xiaoxi Li
...
Kun Gai
Guorui Zhou
Yutao Zhu
Ji-Rong Wen
Zhicheng Dou
16 Oct 2025
Arbitrary Entropy Policy Optimization Breaks The Exploration Bottleneck of Reinforcement Learning
Chen Wang
Ruoyao Xiao
Jionghao Bai
Yuzhi Zhang
Shisheng Cui
Zhou Zhao
Yue Wang
09 Oct 2025
Reinforce-Ada: An Adaptive Sampling Framework under Non-linear RL Objectives
Wei Xiong
Chenlu Ye
Baohao Liao
Hanze Dong
Xinxing Xu
Christof Monz
Jiang Bian
Nan Jiang
Tong Zhang
LRM
06 Oct 2025
Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning
Chenghao Yang
Lin Gui
Chenxiao Yang
Victor Veitch
Lizhu Zhang
Zhuokai Zhao
OffRL
06 Oct 2025
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo
Afshin Oroojlooy
Roshan Sridhar
Miguel Ballesteros
Alan Ritter
Dan Roth
AAML
02 Oct 2025
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
Zhihao Dou
Qinjian Zhao
Zhongwei Wan
Dinggen Zhang
Weida Wang
...
Qingtao Pan
Yang Ouyang
Zhiqiang Gao
Shufei Zhang
Sumon Biswas
LLMAG
LRM
02 Oct 2025
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Runze Liu
Jiakang Wang
Yuling Shi
Zhihui Xie
Chenxin An
...
Wenping Hu
Xiu Li
Fuzheng Zhang
Guorui Zhou
Kun Gai
OffRL
LRM
30 Sep 2025
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Ziniu Li
C. L. Philip Chen
Tianyun Yang
Tian Ding
Ruoyu Sun
G. Zhang
Wenhao Huang
Zhi-Quan Luo
30 Sep 2025
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Yizhou Zhang
Ning Lv
T. Wang
Jisheng Dang
OffRL
LRM
26 Sep 2025
Tree Search for LLM Agent Reinforcement Learning
Yuxiang Ji
Ziyu Ma
Yong Wang
Guanhua Chen
Xiangxiang Chu
Liaoni Wu
25 Sep 2025
Exploiting Tree Structure for Credit Assignment in RL Training of LLMs
Hieu Tran
Zonghai Yao
Hong-ye Yu
OffRL
22 Sep 2025
Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
Bingning Huang
Tu Nguyen
Matthieu Zimmer
OffRL
LRM
11 Sep 2025
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
Yuming Li
Y. Wang
Yuying Zhu
Zhongyu Zhao
Ming Lu
Qi She
Shanghang Zhang
07 Sep 2025
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
...
Yifan Jiang
Yangfan He
Mi Zhang
Shen Yan
LRM
02 Jun 2025