MARFT: Multi-Agent Reinforcement Fine-Tuning
arXiv:2504.16129 (v4, latest)

21 April 2025
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
GitHub (18274★)

Papers citing "MARFT: Multi-Agent Reinforcement Fine-Tuning"

50 / 69 papers shown
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
248
0
0
24 Nov 2025
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Zhiwei Zhang
Xiaomin Li
Yudi Lin
Hui Liu
Ramraj Chandradevan
...
Minhua Lin
Fali Wang
Xianfeng Tang
Qi He
Suhang Wang
LLMAG LRM
215
0
0
04 Nov 2025
MASPRM: Multi-Agent System Process Reward Model
Milad Yazdani
Mahdi Mostajabdaveh
Zirui Zhou
Ying Xiong
60
0
0
28 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLM LRM
189
0
0
15 Oct 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue
Yifan Zhou
G. Zhang
Zaibin Zhang
Y. Li
Chen Zhang
Z. Yin
Philip Torr
Wanli Ouyang
Lei Bai
LLMAG
101
2
0
09 Oct 2025
Interactive Learning for LLM Reasoning
Hehai Lin
Shilei Cao
Minzhi Li
Sudong Wang
Haotian Wu
Linyi Yang
Lixian Zhang
Chengwei Qin
LLMAG LRM
229
0
0
30 Sep 2025
MAS²: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Kun Wang
G. Zhang
ManKit Ye
Xinyu Deng
Dongxia Wang
Xiaobin Hu
Jinyang Guo
Yang Liu
Yufei Guo
LLMAG
106
0
0
29 Sep 2025
ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective
Yiwen Zhang
Ziang Chen
Fanqi Kong
Yizhe Huang
Xue Feng
LLMAG
148
0
0
25 Sep 2025
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Wei Yang
Jesse Thomason
138
5
0
04 Sep 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
164
0
0
27 Aug 2025
Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems
Guanzhong Chen
Shaoxiong Yang
Chao Li
Wei Liu
Jian Luan
Zenglin Xu
192
4
0
03 Jun 2025
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou
S. Wang
Xingyu Dong
Xiangqi Jin
Yifang Chen
Yue Min
Kexin Yang
Xingzhang Ren
Dayiheng Liu
Linfeng Zhang
OffRL LRM
210
1
0
31 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Wenshu Fan
LLMAG
211
0
0
27 May 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
551
162
0
17 Mar 2025
Interactive Debugging and Steering of Multi-Agent AI Systems
International Conference on Human Factors in Computing Systems (CHI), 2025
Will Epperson
Gagan Bansal
Victor C. Dibia
Adam Fourney
Jack Gerrits
Erkang Zhu
Saleema Amershi
227
29
0
03 Mar 2025
HARBOR: Exploring Persona Dynamics in Multi-Agent Competition
Kenan Jiang
Li Xiong
Fei Liu
354
3
0
17 Feb 2025
Networked Agents in the Dark: Team Value Learning under Partial Observability
Adaptive Agents and Multi-Agent Systems (AAMAS), 2025
G. Varela
Alberto Sardinha
Francisco S. Melo
147
1
0
15 Jan 2025
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Qi Zhang
Jiaqi Leng
ELM AI4TS LRM
261
47
0
18 Dec 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
190
59
0
12 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
295
757
0
18 Sep 2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
...
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG ELM
346
82
0
08 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
338
77
0
05 Aug 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM VLM MU
512
1,617
0
15 Jul 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Bo Liu
Weinan Zhang
Jun Wang
Ying Wen
184
12
0
23 May 2024
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Wei Shen
OpenLLMAI Team
Dehao Zhang
...
Weikai Fang
Xianyu
Yu Cao
Haotian Xu
Yiming Liu
VLM AI4CE
365
132
0
20 May 2024
Octopus: On-device language model for function calling of software APIs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Wei Chen
Zhiyuan Li
Mingyuan Ma
LLMAG
257
22
0
02 Apr 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Wei Tao
Yucheng Zhou
Yanlin Wang
Wenqiang Zhang
Hongyu Zhang
Yu Cheng
LLMAG
321
100
0
26 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
International Conference on Learning Representations (ICLR), 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
389
879
0
12 Mar 2024
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen
Junwei Liao
Cheng Deng
Jun Wang
Weinan Zhang
Ying Wen
234
6
0
09 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM LRM
1.1K
3,497
0
05 Feb 2024
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH ALM ELM RALM
354
400
0
21 Nov 2023
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab
Arnav Singhvi
Paridhi Maheshwari
Zhiyuan Zhang
Keshav Santhanam
...
Thomas T. Joshi
Hanna Moazam
Heather Miller
Matei A. Zaharia
Christopher Potts
RALM
345
460
0
05 Oct 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
International Conference on Machine Learning (ICML), 2023
Chrisantha Fernando
Dylan Banarse
Henryk Michalewski
Simon Osindero
Tim Rocktäschel
LLMAG ReLM LRM
262
316
0
28 Sep 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG AI4CE
283
840
0
16 Aug 2023
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
International Conference on Learning Representations (ICLR), 2023
Sirui Hong
Mingchen Zhuge
Jonathan Chen
Xiawu Zheng
Yuheng Cheng
...
Liyang Zhou
Chenyu Ran
Lingfeng Xiao
Chenglin Wu
Jürgen Schmidhuber
LLMAG AIFin
401
548
0
01 Aug 2023
ChatDev: Communicative Agents for Software Development
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Cheng Qian
Wei Liu
Hongzhang Liu
Nuo Chen
Yufan Dang
...
Xin Cong
Juyuan Xu
Dahai Li
Zhiyuan Liu
Maosong Sun
LLMAG
338
448
0
16 Jul 2023
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
IEEE International Conference on Robotics and Automation (ICRA), 2023
Zhao Mandi
Shreeya Jain
Shuran Song
LM&Ro LLMAG
182
204
0
10 Jul 2023
AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology
Haixing Dai
Yiwei Li
Zheng Liu
Lin Zhao
Zihao Wu
...
Shijie Zhao
Zhuo Chen
D. Zhang
Gengchen Mai
Tianming Liu
LM&MA
192
36
0
16 Jun 2023
Gorilla: Large Language Model Connected with Massive APIs
Neural Information Processing Systems (NeurIPS), 2023
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM CLL ALM SyDa
340
821
0
24 May 2023
An Empirical Study on Google Research Football Multi-agent Scenarios
Machine Intelligence Research (MIR), 2023
Yan Song
He Jiang
Zheng Tian
Haifeng Zhang
Yingping Zhang
Jiangcheng Zhu
Zonghong Dai
Weinan Zhang
Jun Wang
184
9
0
16 May 2023
Order Matters: Agent-by-agent Policy Optimization
International Conference on Learning Representations (ICLR), 2023
Xihuai Wang
Zheng Tian
Bo Liu
Ying Wen
Jun Wang
Weinan Zhang
249
42
0
13 Feb 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Neural Information Processing Systems (NeurIPS), 2023
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa RALM
373
2,515
0
09 Feb 2023
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro LLMAG
316
232
0
06 Feb 2023
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Neural Information Processing Systems (NeurIPS), 2022
Shunyu Yao
Howard Chen
John Yang
Karthik Narasimhan
LLMAG LM&Ro
675
726
0
04 Jul 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Neural Information Processing Systems (NeurIPS), 2022
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
Jun Wang
Yaodong Yang
270
264
0
30 May 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM ALM
1.9K
16,931
0
04 Mar 2022
Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
International Conference on Learning Representations (ICLR), 2022
Dingyang Chen
Yile Li
Qi Zhang
OffRL
340
11
0
18 Feb 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM OffRL LRM
996
6,547
0
27 Oct 2021
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
International Conference on Learning Representations (ICLR), 2021
J. Kuba
Ruiqing Chen
Muning Wen
Ying Wen
Fanglei Sun
Jun Wang
Yaodong Yang
311
316
0
23 Sep 2021
Settling the Variance of Multi-Agent Policy Gradients
J. Kuba
Muning Wen
Yaodong Yang
Linghui Meng
Shangding Gu
Haifeng Zhang
D. Mguni
Jun Wang
215
90
0
19 Aug 2021