MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

3 March 2025
Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You
Abstract

Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents, yet existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition. In this paper, we introduce MultiAgentBench, a comprehensive benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios. Our framework measures not only task completion but also the quality of collaboration and competition using novel, milestone-based key performance indicators. Moreover, we evaluate various coordination protocols (including star, chain, tree, and graph topologies) and innovative strategies such as group discussion and cognitive planning. Notably, gpt-4o-mini achieves the highest average task score, the graph structure performs best among coordination protocols in the research scenario, and cognitive planning improves milestone achievement rates by 3%. Code and datasets are publicly available at this https URL.
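The abstract names four coordination protocols: star, chain, tree, and graph topologies. As a rough illustration of what those message-passing structures could look like, the Python sketch below builds each topology as a list of directed sender-to-receiver edges among agents. The function names, the binary-tree branching factor, and the fully connected reading of "graph" are illustrative assumptions, not the benchmark's actual implementation.

def star(agents):
    # Hypothetical sketch: a hub agent (agents[0]) exchanges messages
    # with every other agent in both directions.
    hub, workers = agents[0], agents[1:]
    return [(hub, w) for w in workers] + [(w, hub) for w in workers]

def chain(agents):
    # Each agent forwards its output to the next agent in sequence.
    return list(zip(agents, agents[1:]))

def tree(agents, branching=2):
    # Agent i reports to parent (i - 1) // branching, forming a hierarchy.
    return [(agents[(i - 1) // branching], agents[i])
            for i in range(1, len(agents))]

def graph(agents):
    # Assumed fully connected: every ordered pair of agents can communicate.
    return [(a, b) for a in agents for b in agents if a != b]

if __name__ == "__main__":
    names = ["planner", "coder", "tester", "critic"]
    for topology in (star, chain, tree, graph):
        print(topology.__name__, topology(names))

Under this reading, the paper's finding that the graph structure performs best in the research scenario would correspond to the densest edge set, where any agent can consult any other directly.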

@article{zhu2025_2503.01935,
  title={MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents},
  author={Kunlun Zhu and Hongyi Du and Zhaochen Hong and Xiaocheng Yang and Shuyi Guo and Zhe Wang and Zhenhailong Wang and Cheng Qian and Xiangru Tang and Heng Ji and Jiaxuan You},
  journal={arXiv preprint arXiv:2503.01935},
  year={2025}
}