arXiv:2508.17445

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling


24 August 2025

Wangchunshu Zhou

ArXiv (abs), PDF, HTML, HuggingFace (75 upvotes)

Papers citing "TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling"

17 / 17 papers shown

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation


97

0

0

09 Nov 2025

Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards


129

0

0

28 Oct 2025

Teaching Language Models to Reason with Tools


...

101

2

0

23 Oct 2025

Agentic Entropy-Balanced Policy Optimization


...

92

2

0

16 Oct 2025

Arbitrary Entropy Policy Optimization Breaks The Exploration Bottleneck of Reinforcement Learning


367

0

0

09 Oct 2025

Reinforce-Ada: An Adaptive Sampling Framework under Non-linear RL Objectives


300

1

0

06 Oct 2025

Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning


182

0

0

06 Oct 2025

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks


Afshin Oroojlooy

Miguel Ballesteros

158

0

0

02 Oct 2025

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

...

170

2

0

02 Oct 2025

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models


...

149

3

0

30 Sep 2025

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation


C. L. Philip Chen

168

1

0

30 Sep 2025

FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning


128

1

0

26 Sep 2025

Tree Search for LLM Agent Reinforcement Learning


168

3

0

25 Sep 2025

Exploiting Tree Structure for Credit Assignment in RL Training of LLMs


191

3

0

22 Sep 2025

Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning


Matthieu Zimmer

249

3

0

11 Sep 2025

BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models


Shanghang Zhang

281

17

0

07 Sep 2025

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning


...

355

29

0

02 Jun 2025