ResearchTrend.AI
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training

29 September 2025
Hongcheng Wang, Yinuo Huang, Sukai Wang, Guanghui Ren, Hao Dong
    LRM
arXiv:2509.24494

Papers citing "GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training"

GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang, Zhezheng Hao, Wei Xia, Hande Dong, Hong Wang, Chenxing Wei, Yuyan Zhou, Yubin Qi, Qiang Lin, Jian Cao
22 Oct 2025

RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li
LRM
16 Oct 2025