PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Versions: v1, v2, v3 (latest)

28 August 2025
Wenfeng Feng
Penghong Zhao
Guochao Jiang
Chuzhan Hao
Yuewei Zhang
Guohua Liu
Hao Wang
    OffRL
ArXiv (abs) | PDF | HTML | HuggingFace (26 upvotes) | GitHub (1197★)

Papers citing "PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning"

2 / 2 papers shown
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
Zihan Lin
Xiaohan Wang
Jie Cao
Jiajun Chai
Guojun Yin
Wei Lin
Ran He
43
0
0
26 Sep 2025
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Guochao Jiang
Wenfeng Feng
Guofeng Quan
Chuzhan Hao
Yuewei Zhang
Guohua Liu
Hao Wang
OffRL, LRM
40
1
0
24 Sep 2025