Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.04682
Cited By
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
8 August 2024
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
Felix Bai
Shuang Ma
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities"
21 / 21 papers shown
Title
From Glue-Code to Protocols: A Critical Analysis of A2A and MCP Integration for Scalable Agent Systems
Qiaomu Li
Ying Xie
24
0
0
06 May 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
92
0
0
26 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
J. Wang
W. Zhang
OffRL
23
0
0
21 Apr 2025
FamilyTool: A Multi-hop Personalized Tool Use Benchmark
Yuxin Wang
Yiran Guo
Y. Zheng
Zhangyue Yin
S. Chen
Jie Yang
Jiajun Chen
Xuanjing Huang
Xipeng Qiu
24
0
0
09 Apr 2025
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Akshara Prabhakar
Z. Liu
Weiran Yao
Jianguo Zhang
Ming Zhu
...
Juan Carlos Niebles
Shelby Heinecke
H. Wang
S.
Caiming Xiong
VGen
77
1
0
04 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
J. Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
97
0
0
03 Apr 2025
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
S. Jung
Donghun Lee
Shinbok Lee
Gaeun Seo
Daniel Lee
Byeongil Ko
Junrae Cho
Kihyun Kim
EungGyun Kim
M. Shin
36
0
0
02 Apr 2025
On the Robustness of Agentic Function Calling
Ella Rabinovich
Ateret Anaby-Tavor
LLMAG
50
0
0
01 Apr 2025
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
Xinyi Hou
Yanjie Zhao
Shenao Wang
Haoyu Wang
53
10
0
30 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at
ResearchTrend Connect | LLMAG
on
07 May 2025
93
5
0
20 Mar 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
52
2
0
10 Mar 2025
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios
Jun Wang
Jiamu Zhou
Muning Wen
Xiaoyun Mo
H. Zhang
...
Cheng Jin
Xihuai Wang
Weinan Zhang
Qiuying Peng
J. Wang
LLMAG
87
0
0
21 Dec 2024
Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications
Raphael Shu
Nilaksh Das
Michelle Yuan
Monica Sunkara
Yi Zhang
LLMAG
69
2
0
06 Dec 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
J. Wang
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
96
0
0
25 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
34
7
0
04 Nov 2024
Library Learning Doesn't: The Curious Case of the Single-Use "Library"
Ian Berlot-Attwell
Frank Rudzicz
Xujie Si
37
1
0
26 Oct 2024
Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases
Elias Lumer
Vamse Kumar Subbiah
James A. Burke
Pradeep Honaganahalli Basavaraju
Austin Huber
31
0
0
18 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
52
1
0
09 Oct 2024
SEAL: Suite for Evaluating API-use of LLMs
Woojeong Kim
Ashish Jagmohan
Aditya Vempaty
ELM
ALM
LLMAG
30
0
0
23 Sep 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
31
77
0
28 May 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu
Yeganeh Kordi
Kate Sanders
Yizhong Wang
Adam Byerly
Kate Sanders
Adam Byerly
Jingyu Zhang
Benjamin Van Durme
Daniel Khashabi
LLMAG
60
6
0
18 Mar 2024
1