2505.22257
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
28 May 2025
Youssef Mroueh
Nicolas Dupuis
Brian M. Belgodere
Apoorva Nitsure
Mattia Rigotti
Kristjan Greenewald
Jirí Navrátil
Jerret Ross
Jesus Rios
OffRL
Papers citing
"Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training"
4 / 4 papers shown
Title
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
228
172
0
26 Mar 2025
Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification
Youssef Mroueh
OffRL
172
13
0
09 Mar 2025
What is the Alignment Objective of GRPO?
Milan Vojnovic
Se-Young Yun
138
5
0
25 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
392
2,024
0
22 Jan 2025