Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
v1, v2 (latest)

28 May 2025
Youssef Mroueh
Nicolas Dupuis
Brian M. Belgodere
Apoorva Nitsure
Mattia Rigotti
Kristjan Greenewald
Jirí Navrátil
Jerret Ross
Jesus Rios
    OffRL

Papers citing "Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training"

4 / 4 papers shown
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL, LRM
228
172
0
26 Mar 2025
Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification
Youssef Mroueh
OffRL
172
13
0
09 Mar 2025
What is the Alignment Objective of GRPO?
Milan Vojnovic
Se-Young Yun
138
5
0
25 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
392
2,024
0
22 Jan 2025