ResearchTrend.AI


Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

6 April 2025
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
arXiv: 2504.04524

Papers citing "Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning"

Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
21 Apr 2025