MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models

22 May 2025
Xuanqi Gao, Siyi Xie, Juan Zhai, Shiqing Ma, Chao Shen

Papers citing "MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models" (46 papers)
Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol. Niklas Jobs, Luis Miguel Vieira da Silva, Jayanth Somashekaraiah, Maximilian Weigand, David Kube, Felix Gehlhoff. 03 Dec 2025.
A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models. Zhen Tao, Shidong Pan, Zhenchang Xing, Emily Black, Talia B. Gillis, Chunyang Chen. 24 Nov 2025.
MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs? Jiayi Fu, Qiyao Sun. 08 Nov 2025.
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents. Hongrui Jia, Jitong Liao, X. Zhang, Haiyang Xu, Tianbao Xie, Chaoya Jiang, Ming Yan, Si Liu, Wei Ye, Fei Huang. 28 Oct 2025.
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools. Wenhao Wang, Peizhi Niu, Zhao Xu, Zhaoyu Chen, Jian Du, ..., Xianghe Pang, Keduan Huang, Y. Wang, Qiang Yan, Siheng Chen. 28 Oct 2025.
MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration. Jia-Kai Dong, I-Wei Huang, Chun-Tin Wu, Yi-Tien Tsai. 22 Oct 2025.
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools. Reza Esfandiarpoor, Vishwas Suryanarayanan, Stephen H. Bach, Vishal Chowdhary, Anthony Aue. 22 Oct 2025.
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents. Yaxin Du, Y. Zhang, Xiyuan Yang, Yifan Zhou, Cheng-Yu Wang, ..., Menglan Chen, Shuo Tang, Z. Li, Feiyu Xiong, Siheng Chen. 02 Oct 2025.
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments. Zhangchen Xu, Adriana Meza Soria, Shawn Tan, Anurag Roy, Ashish Sunil Agrawal, Radha Poovendran, Rameswar Panda. 01 Oct 2025.
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use. Zijian Wu, Xiangyan Liu, Xinyuan Zhang, L. Chen, Fanqing Meng, ..., Zirui Wang, Jinjie Ni, Y. Yang, Arvin Xu, Michael Shieh. 28 Sep 2025.
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol. Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, Yiran Chen. 25 Sep 2025.
ARE: Scaling Up Agent Environments and Evaluations. Pierre Andrews, Amine Benhalloum, Gerard Moreno-Torres Bertran, Matteo Bettini, Amar Budhiraja, ..., Andrey Rusakov, Thomas Scialom, Vladislav Vorotilov, Mengjue Wang, Ian Yu. 21 Sep 2025.
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools. Zikang Guo, Benfeng Xu, Chiwei Zhu, Wentao Hong, Xiaorui Wang, Zhendong Mao. 10 Sep 2025.
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills. David Noever. 27 Aug 2025.
MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use. Fei Lei, Yibo Yang, Wenxiu Sun, Dahua Lin. 22 Aug 2025.
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries. Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, ..., Song Wang, Sathish Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song. 21 Aug 2025.
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers. Ziyang Luo, Zhiqi Shen, Wenzhuo Yang, Zirui Zhao, Prathyusha Jwalapuram, Amrita Saha, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Junnan Li. 20 Aug 2025.
Agentic DraCor and the Art of Docstring Engineering: Evaluating MCP-empowered LLM Usage of the DraCor API. Peer Trilcke, Ingo Börner, Henny Sluyter-Gäthje, Daniil Skorinkin, Frank Fischer, Carsten Milling. 19 Aug 2025.
MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark. Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo. 11 Aug 2025.
Routine: A Structural Planning Framework for LLM Agent System in Enterprise. Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, ..., Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu. 19 Jul 2025.
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models. Pei Wang, Yanan Wu, Zekun Wang, Qingbin Liu, Xiaoshuai Song, ..., Ge Zhang, Hangyu Guo, Rundong Wang, Yuchi Xu, Bo Zheng. 15 Oct 2024.
AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation. Huanxi Liu, Jiaqi Liao, Dawei Feng, Kele Xu, Huaimin Wang. 09 Oct 2024.
GAIA: a benchmark for General AI Assistants. Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom. 21 Nov 2023.
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji. International Conference on Learning Representations (ICLR), 2023. 19 Sep 2023.
AgentBench: Evaluating LLMs as Agents. Xiao-Yang Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, ..., Yu-Chuan Su, Huan Sun, Shiyu Huang, Yuxiao Dong, Jie Tang. International Conference on Learning Representations (ICLR), 2023. 07 Aug 2023.
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models. Cheng-Yu Hsieh, Sibei Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister. 01 Aug 2023.
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. Yujia Qin, Shi Liang, Yining Ye, Kunlun Zhu, Lan Yan, ..., Jie Zhou, Mark B. Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun. International Conference on Learning Representations (ICLR), 2023. 31 Jul 2023.
ToolQA: A Dataset for LLM Question Answering with External Tools. Yuchen Zhuang, Yue Yu, Kuan-Chieh Wang, Haotian Sun, Chao Zhang. Neural Information Processing Systems (NeurIPS), 2023. 23 Jun 2023.
RestGPT: Connecting Large Language Models with Real-World RESTful APIs. Yifan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, ..., Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li. 11 Jun 2023.
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases. Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, Le Sun. 08 Jun 2023.
On the Tool Manipulation Capability of Open-source Large Language Models. Qiantong Xu, Fenglu Hong, Yangqiu Song, Changran Hu, Zheng Chen, Jian Zhang. 25 May 2023.
Gorilla: Large Language Model Connected with Massive APIs. Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez. Neural Information Processing Systems (NeurIPS), 2023. 24 May 2023.
Interactive Natural Language Processing. Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, ..., Wenhu Chen, Ke Xu, Dayiheng Liu, Yi-Ting Guo, Jie Fu. 22 May 2023.
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models. Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao. Neural Information Processing Systems (NeurIPS), 2023. 19 Apr 2023.
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs. Minghao Li, Yingxiu Zhao, Yu Bowen, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 14 Apr 2023.
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models. Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan. 13 Apr 2023.
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs. Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, ..., Shaoguang Mao, Yuntao Wang, Linjun Shou, Ming Gong, Nan Duan. Intelligent Computing (IC), 2023. 29 Mar 2023.
Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom. Neural Information Processing Systems (NeurIPS), 2023. 09 Feb 2023.
TALM: Tool Augmented Language Models. Aaron T Parisi, Yao-Min Zhao, Noah Fiedel. 24 May 2022.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. Neural Information Processing Systems (NeurIPS), 2022. 28 Jan 2022.
Training Verifiers to Solve Math Word Problems. K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman. 27 Oct 2021.
Program Synthesis with Large Language Models. Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, ..., Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, Charles Sutton. 16 Aug 2021.
Measuring Mathematical Problem Solving With the MATH Dataset. Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Basel Alomair, Jacob Steinhardt. 05 Mar 2021.
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew J. Hausknecht. 08 Oct 2020.
Measuring Massive Multitask Language Understanding. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Basel Alomair, Jacob Steinhardt. International Conference on Learning Representations (ICLR), 2020. 07 Sep 2020.
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. Timo Schick, Hinrich Schütze. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020. 21 Jan 2020.