ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2503.04618

Better Process Supervision with Bi-directional Rewarding Signals

6 March 2025
Wenxiang Chen
Wei He
Zhiheng Xi
Honglin Guo
Boyang Hong
Jiazheng Zhang
Rui Zheng
Nijun Li
Tao Gui
Yun Li
Qi Zhang
Xuanjing Huang
    LRM

Papers citing "Better Process Supervision with Bi-directional Rewarding Signals"

1 / 1 papers shown
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
17 Mar 2025