arXiv:2506.20520
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
25 June 2025
Charles Arnal
Gaëtan Narozniak
Vivien A. Cabannes
Yunhao Tang
Julia Kempe
Rémi Munos
OffRL
Papers citing
"Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards"
10 / 10 papers shown
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL
LRM
456
0
0
24 Dec 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Zhiheng Xi
Xin Guo
Yang Nan
Enyu Zhou
Junrui Shen
...
Rui Zheng
Hang Yan
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
183
8
0
21 Oct 2025
On the optimization dynamics of RLVR: Gradient gap and step size thresholds
Joe Suk
Yaqi Duan
186
0
0
09 Oct 2025
ExGRPO: Learning to Reason from Experience
Runzhe Zhan
Yafu Li
Zhi Wang
Xiaoye Qu
Dongrui Liu
Jing Shao
Derek F. Wong
Yu Cheng
OffRL
LRM
145
3
1
02 Oct 2025
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Haizhong Zheng
Jiawei Zhao
Bedi Chen
OffRL
161
10
0
01 Oct 2025
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Chaorui Yao
Yanxi Chen
Yuchang Sun
Yushuo Chen
Wenhao Zhang
Xuchen Pan
Yaliang Li
Bolin Ding
OffRL
113
4
0
29 Sep 2025
Quantile Advantage Estimation for Entropy-Safe Reasoning
Junkang Wu
Kexin Huang
Jiancan Wu
An Zhang
Xiang Wang
Xiangnan He
143
4
0
26 Sep 2025
Outcome-based Exploration for LLM Reasoning
Yuda Song
Julia Kempe
Rémi Munos
OffRL
LRM
282
38
0
08 Sep 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang
Yuexiang Xie
Yuchang Sun
Yanxi Chen
Guoyin Wang
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
210
33
0
15 Aug 2025
Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers
Jacky H. T. Yip
Charles Arnal
Francois Charton
G. Shiu
155
3
0
04 Jul 2025