2504.04524
Cited By
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
6 April 2025
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
Papers citing "Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning"
1 / 1 papers shown
Title
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
21 Apr 2025