2508.21104
v3 (latest)
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
28 August 2025
Wenfeng Feng
Penghong Zhao
Guochao Jiang
Chuzhan Hao
Yuewei Zhang
Guohua Liu
Hao Wang
OffRL
HuggingFace (26 upvotes)
Github (1197★)
Papers citing "PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning"
2 / 2 papers shown
Title
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
Zihan Lin
Xiaohan Wang
Jie Cao
Jiajun Chai
Guojun Yin
Wei Lin
Ran He
43
0
0
26 Sep 2025
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Guochao Jiang
Wenfeng Feng
Guofeng Quan
Chuzhan Hao
Yuewei Zhang
Guohua Liu
Hao Wang
OffRL
LRM
40
1
0
24 Sep 2025