ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

12 March 2025
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
Mark Schmidt
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG KELM LRM AI4CE

Papers citing "ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning"

40 / 40 papers shown
Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (ISPRS Annals), 2025
Wei Yang
Jiacheng Pang
Shixuan Li
P. Bogdan
Stephen Tu
Jesse Thomason
LLMAG
320
1
0
08 Nov 2025
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Zhiwei Zhang
Xiaomin Li
Yudi Lin
Hui Liu
Ramraj Chandradevan
...
Minhua Lin
Fali Wang
Xianfeng Tang
Qi He
Suhang Wang
LLMAG LRM
207
0
0
04 Nov 2025
Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games
Asrin Efe Yorulmaz
Tamer Basar
80
0
0
04 Nov 2025
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
Xian Li
Sainbayar Sukhbaatar
Jack Lanchantin
Jason Weston
ReLM LRM
166
6
0
28 Oct 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue
Yifan Zhou
G. Zhang
Zaibin Zhang
Y. Li
Chen Zhang
Z. Yin
Philip Torr
Wanli Ouyang
Lei Bai
LLMAG
97
2
0
09 Oct 2025
Searching Meta Reasoning Skeleton to Guide LLM Reasoning
Ziying Zhang
Yaqing Wang
Quanming Yao
LRM
84
1
0
05 Oct 2025
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
Zhenyu Pan
Y. Zhang
Zhuo Liu
Y. Tang
Zeliang Zhang
...
Haoyang Fang
Manling Li
Chenliang Xu
Philip S. Yu
Han Liu
AAML
125
0
0
02 Oct 2025
Interactive Learning for LLM Reasoning
Hehai Lin
Shilei Cao
Minzhi Li
Sudong Wang
Haotian Wu
Linyi Yang
Lixian Zhang
Chengwei Qin
LLMAG LRM
205
0
0
30 Sep 2025
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Yoonjeon Kim
Doohyuk Jang
Eunho Yang
ReLM AIFin LRM
122
0
0
26 Sep 2025
StyleBench: Evaluating thinking styles in Large Language Models
Junyu Guo
S. Gu
Ming Jin
C. Spanos
Javad Lavaei
LRM
1.1K
0
0
25 Sep 2025
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Fanqi Kong
Ruijie Zhang
Huaxiao Yin
Guibin Zhang
X. Zhang
Ziang Chen
Zhaowei Zhang
Xiaoyuan Zhang
Song-Chun Zhu
Xue Feng
AAML
256
0
0
17 Sep 2025
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Wenjun Li
Z. Chen
Jingru Lin
Hannan Cao
Wei Han
...
Zhi Zhang
Kuicai Dong
Dexun Li
Chen Zhang
Yong Liu
OffRL
120
4
0
08 Sep 2025
Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety
Zhenyu Pan
Xicheng Zhang
Y. Zhang
Jianshu Zhang
Haozheng Luo
...
Dennis Wu
Hong-Yu Chen
Philip S. Yu
Manling Li
Han Liu
AAML
136
2
0
05 Aug 2025
RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning
Huiyang Hu
Peijin Wang
Yingchao Feng
Kaiwen Wei
Wenxin Yin
...
Hanbo Bi
Kaiyue Kang
Tong Ling
Kun Fu
Xian Sun
136
4
0
28 Jul 2025
Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment
Yizhuo Zhang
Heng Wang
Shangbin Feng
Zhaoxuan Tan
Xinyun Liu
Yulia Tsvetkov
OffRL
247
0
0
01 Jun 2025
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
J. Yang
M. Zhang
Yiqiao Jin
Hao Chen
Qingsong Wen
...
Yi He
Weijie Xu
James Evans
Jindong Wang
LLMAG AI4CE
318
1
0
28 May 2025
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
Yiqun Zhang
Hao Li
Chenxu Wang
L. Chen
Qiaosheng Zhang
...
Xinrun Wang
Jia Xu
Mengwei He
Xuming He
Shuyue Hu
319
13
0
26 May 2025
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang
Yuxuan Chen
Min-Hsuan Yeh
Yixuan Li
LLMAG AI4CE
239
5
0
25 May 2025
A Survey on Collaborative Mechanisms Between Large and Small Language Models
Yi Chen
JiaHao Zhao
HaoHao Han
303
8
0
12 May 2025
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Shansong Liu
Shenghua Liu
Yiwei Wang
Baolong Bi
Yuyao Ge
Jun Wan
Yurong Wu
Xueqi Cheng
LRM
267
9
0
20 Apr 2025
Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey
Ahsan Bilal
Muhammad Ahmed Mohsin
Muhammad Umer
Muhammad Awais Khan Bangash
Muhammad Ali Jamshed
LLMAG LRM AI4CE
327
4
0
20 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM LRM
720
66
0
10 Apr 2025
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
Lv Qingsong
Yangning Li
Zihua Lan
Zishan Xu
Jiwei Tang
...
Wenhao Jiang
Wanshi Xu
Philip S. Yu
Hai-Tao Zheng
400
2
0
09 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Wenshu Fan
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
406
18
0
08 Apr 2025
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
Jianhao Chen
Zishuo Xun
Bocheng Zhou
Han Qi
Qiaosheng Zhang
...
Wei Hu
Yuzhong Qu
W. Ouyang
Wanli Ouyang
Shuyue Hu
430
13
0
01 Apr 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRL LRM
410
537
0
26 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL ReLM LRM
538
321
0
24 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Ning Yang
Jun Wang
788
2
0
15 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM LRM
418
266
0
03 Mar 2025
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Tian Xie
Zitian Gao
Qingnan Ren
Haoming Luo
Yuqian Hong
Bryan Dai
Joey Zhou
Kai Qiu
Zhirong Wu
Chong Luo
ReLM OffRL LRM
294
153
0
21 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
International Conference on Learning Representations (ICLR), 2024
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq Joty
Furu Wei
LRM
360
32
0
17 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Tengjiao Wang
Mengdi Wang
ReLM LRM AI4CE
464
40
0
10 Feb 2025
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Guanghao Ye
Khiem Duc Pham
Xinzhi Zhang
Sivakanth Gopi
Baolin Peng
Beibin Li
Janardhan Kulkarni
Huseyin A. Inan
ReLM LRM
284
14
0
10 Feb 2025
LIMO: Less is More for Reasoning
Yixin Ye
Zhen Huang
Yang Xiao
Ethan Chern
Shijie Xia
Pengfei Liu
AIMat ReLM LRM
682
170
0
05 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
540
363
0
28 Jan 2025
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
International Conference on Learning Representations (ICLR), 2025
Yaowen Ye
Cassidy Laidlaw
Jacob Steinhardt
ALM
166
2
0
14 Jan 2025
A Roadmap to Guide the Integration of LLMs in Hierarchical Planning
Israel Puerta-Merino
Carlos Núñez-Molina
Pablo Mesejo
Juan Fernández-Olivares
261
3
0
14 Jan 2025
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Robert Z. Sparks
Charlie Snell
Kanishk Gandhi
Alon Albalak
Anikait Singh
...
Dakota Mahan
Louis Castricato
Jan-Philipp Fränken
Nick Haber
Chelsea Finn
LRM
298
78
0
08 Jan 2025
JudgeBench: A Benchmark for Evaluating LLM-based Judges
International Conference on Learning Representations (ICLR), 2024
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM ALM
526
130
0
16 Oct 2024
Denial-of-Service Poisoning Attacks against Large Language Models
Kuofeng Gao
Tianyu Pang
Chao Du
Yong Yang
Shu-Tao Xia
Min Lin
SILM AAML
295
115
0
14 Oct 2024