2504.16129
Cited By
v1
v2
v3
v4 (latest)
MARFT: Multi-Agent Reinforcement Fine-Tuning
21 April 2025
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
ArXiv (abs)
PDF
HTML
Github (18274★)
Papers citing
"MARFT: Multi-Agent Reinforcement Fine-Tuning"
50 / 69 papers shown
Title
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
248
0
0
24 Nov 2025
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Zhiwei Zhang
Xiaomin Li
Yudi Lin
Hui Liu
Ramraj Chandradevan
...
Minhua Lin
Fali Wang
Xianfeng Tang
Qi He
Suhang Wang
LLMAG
LRM
215
0
0
04 Nov 2025
MASPRM: Multi-Agent System Process Reward Model
Milad Yazdani
Mahdi Mostajabdaveh
Zirui Zhou
Ying Xiong
60
0
0
28 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLM
LRM
189
0
0
15 Oct 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue
Yifan Zhou
G. Zhang
Zaibin Zhang
Y. Li
Chen Zhang
Z. Yin
Philip Torr
Wanli Ouyang
Lei Bai
LLMAG
101
2
0
09 Oct 2025
Interactive Learning for LLM Reasoning
Hehai Lin
Shilei Cao
Minzhi Li
Sudong Wang
Haotian Wu
Linyi Yang
Lixian Zhang
Chengwei Qin
LLMAG
LRM
229
0
0
30 Sep 2025
MAS^2: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Kun Wang
G. Zhang
ManKit Ye
Xinyu Deng
Dongxia Wang
Xiaobin Hu
Jinyang Guo
Yang Liu
Yufei Guo
LLMAG
106
0
0
29 Sep 2025
ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective
Yiwen Zhang
Ziang Chen
Fanqi Kong
Yizhe Huang
Xue Feng
LLMAG
148
0
0
25 Sep 2025
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Wei Yang
Jesse Thomason
138
5
0
04 Sep 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
164
0
0
27 Aug 2025
Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems
Guanzhong Chen
Shaoxiong Yang
Chao Li
Wei Liu
Jian Luan
Zenglin Xu
192
4
0
03 Jun 2025
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou
S. Wang
Xingyu Dong
Xiangqi Jin
Yifang Chen
Yue Min
Kexin Yang
Xingzhang Ren
Dayiheng Liu
Linfeng Zhang
OffRL
LRM
210
1
0
31 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Wenshu Fan
LLMAG
211
0
0
27 May 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
551
162
0
17 Mar 2025
Interactive Debugging and Steering of Multi-Agent AI Systems
International Conference on Human Factors in Computing Systems (CHI), 2025
Will Epperson
Gagan Bansal
Victor C. Dibia
Adam Fourney
Jack Gerrits
Erkang Zhu
Saleema Amershi
227
29
0
03 Mar 2025
HARBOR: Exploring Persona Dynamics in Multi-Agent Competition
Kenan Jiang
Li Xiong
Fei Liu
354
3
0
17 Feb 2025
Networked Agents in the Dark: Team Value Learning under Partial Observability
Adaptive Agents and Multi-Agent Systems (AAMAS), 2025
G. Varela
Alberto Sardinha
Francisco S. Melo
147
1
0
15 Jan 2025
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Qi Zhang
Jiaqi Leng
ELM
AI4TS
LRM
261
47
0
18 Dec 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
190
59
0
12 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
295
757
0
18 Sep 2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
...
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG
ELM
346
82
0
08 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
338
77
0
05 Aug 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
512
1,617
0
15 Jul 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Bo Liu
Weinan Zhang
Jun Wang
Ying Wen
184
12
0
23 May 2024
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Wei Shen
OpenLLMAI Team
Dehao Zhang
...
Weikai Fang
Xianyu
Yu Cao
Haotian Xu
Yiming Liu
VLM
AI4CE
365
132
0
20 May 2024
Octopus: On-device language model for function calling of software APIs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Wei Chen
Zhiyuan Li
Mingyuan Ma
LLMAG
257
22
0
02 Apr 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Wei Tao
Yucheng Zhou
Yanlin Wang
Wenqiang Zhang
Hongyu Zhang
Yu Cheng
LLMAG
321
100
0
26 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
International Conference on Learning Representations (ICLR), 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
389
879
0
12 Mar 2024
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen
Junwei Liao
Cheng Deng
Jun Wang
Weinan Zhang
Ying Wen
234
6
0
09 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.1K
3,497
0
05 Feb 2024
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH
ALM
ELM
RALM
354
400
0
21 Nov 2023
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab
Arnav Singhvi
Paridhi Maheshwari
Zhiyuan Zhang
Keshav Santhanam
...
Thomas T. Joshi
Hanna Moazam
Heather Miller
Matei A. Zaharia
Christopher Potts
RALM
345
460
0
05 Oct 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
International Conference on Machine Learning (ICML), 2023
Chrisantha Fernando
Dylan Banarse
Henryk Michalewski
Simon Osindero
Tim Rocktäschel
LLMAG
ReLM
LRM
262
316
0
28 Sep 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG
AI4CE
283
840
0
16 Aug 2023
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
International Conference on Learning Representations (ICLR), 2023
Sirui Hong
Mingchen Zhuge
Jonathan Chen
Xiawu Zheng
Yuheng Cheng
...
Liyang Zhou
Chenyu Ran
Lingfeng Xiao
Chenglin Wu
Jürgen Schmidhuber
LLMAG
AIFin
401
548
0
01 Aug 2023
ChatDev: Communicative Agents for Software Development
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Cheng Qian
Wei Liu
Hongzhang Liu
Nuo Chen
Yufan Dang
...
Xin Cong
Juyuan Xu
Dahai Li
Zhiyuan Liu
Maosong Sun
LLMAG
338
448
0
16 Jul 2023
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
IEEE International Conference on Robotics and Automation (ICRA), 2023
Zhao Mandi
Shreeya Jain
Shuran Song
LM&Ro
LLMAG
182
204
0
10 Jul 2023
AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology
Haixing Dai
Yiwei Li
Zheng Liu
Lin Zhao
Zihao Wu
...
Shijie Zhao
Zhuo Chen
D. Zhang
Gengchen Mai
Tianming Liu
LM&MA
192
36
0
16 Jun 2023
Gorilla: Large Language Model Connected with Massive APIs
Neural Information Processing Systems (NeurIPS), 2023
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM
CLL
ALM
SyDa
340
821
0
24 May 2023
An Empirical Study on Google Research Football Multi-agent Scenarios
Machine Intelligence Research (MIR), 2023
Yan Song
He Jiang
Zheng Tian
Haifeng Zhang
Yingping Zhang
Jiangcheng Zhu
Zonghong Dai
Weinan Zhang
Jun Wang
184
9
0
16 May 2023
Order Matters: Agent-by-agent Policy Optimization
International Conference on Learning Representations (ICLR), 2023
Xihuai Wang
Zheng Tian
Bo Liu
Ying Wen
Jun Wang
Weinan Zhang
249
42
0
13 Feb 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Neural Information Processing Systems (NeurIPS), 2023
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa
RALM
373
2,515
0
09 Feb 2023
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro
LLMAG
316
232
0
06 Feb 2023
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Neural Information Processing Systems (NeurIPS), 2022
Shunyu Yao
Howard Chen
John Yang
Karthik Narasimhan
LLMAG
LM&Ro
675
726
0
04 Jul 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Neural Information Processing Systems (NeurIPS), 2022
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
Jun Wang
Yaodong Yang
270
264
0
30 May 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.9K
16,931
0
04 Mar 2022
Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
International Conference on Learning Representations (ICLR), 2022
Dingyang Chen
Yile Li
Qi Zhang
OffRL
340
11
0
18 Feb 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
996
6,547
0
27 Oct 2021
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
International Conference on Learning Representations (ICLR), 2021
J. Kuba
Ruiqing Chen
Muning Wen
Ying Wen
Fanglei Sun
Jun Wang
Yaodong Yang
311
316
0
23 Sep 2021
Settling the Variance of Multi-Agent Policy Gradients
J. Kuba
Muning Wen
Yaodong Yang
Linghui Meng
Shangding Gu
Haifeng Zhang
D. Mguni
Jun Wang
215
90
0
19 Aug 2021