
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
Baolin Peng
Hao Cheng
Xuehai He
Kuan-Chieh Jackson Wang
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen