Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

7 August 2025
Haitao Hong, Yuchen Yan, Xingyu Wu, Guiyang Hou, Wenqi Zhang, Weiming Lu, Yongliang Shen, Jun Xiao
Tags: LRM
Links: arXiv abs (2508.05613) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (23★)

Papers citing "Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models"

1 of 1 papers shown:
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, ..., Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang
Tags: OffRL, ALM, LRM
231 · 6 · 0 · 21 May 2025