2509.24494
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training
29 September 2025
Hongcheng Wang, Yinuo Huang, Sukai Wang, Guanghui Ren, Hao Dong
LRM
Papers citing "GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training"
2 / 2 papers shown
GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang, Zhezheng Hao, Wei Xia, Hande Dong, Hong Wang, Chenxing Wei, Yuyan Zhou, Yubin Qi, Qiang Lin, Jian Cao
22 Oct 2025
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li
LRM
16 Oct 2025