ResearchTrend.AI
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training


3 December 2025
Dingwei Zhu
Zhiheng Xi
Shihan Dou
Yuhui Wang
Sixian Li
Junjie Ye
Honglin Guo
Shichun Liu
Chenhao Huang
Yajie Yang
Junlin Shang
Senjie Jin
Ming Zhang
Jiazheng Zhang
Caishuang Huang
Yunke Zhang
Demei Yan
Yuran Wang
Tao Gui
    OffRL

Papers citing "DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training"


No papers found