StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

12 March 2024
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Janet Liu
    ELM

Papers citing "StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models"

22 / 22 papers shown
GenTorrent: Scaling Large Language Model Serving with An Overlay Network
Fei Fang
Yifan Hua
Shengze Wang
Ruilin Zhou
Y. Liu
Chen Qian
X. Zhang
41
0
0
27 Apr 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
39
2
0
10 Mar 2025
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Jeonghoon Shim
Gyuhyeon Seo
Cheongsu Lim
Yohan Jo
28
4
0
01 Mar 2025
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning
Ziang Ye
Z. Zhang
Yang Zhang
Jianxin Ma
Junyang Lin
Fuli Feng
LRM
72
0
0
19 Dec 2024
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs
Shirley Kokane
Ming Zhu
Tulika Awalgaonkar
Jianguo Zhang
Thai Hoang
...
Juan Carlos Niebles
Huan Wang
Shelby Heinecke
Caiming Xiong
Silvio Savarese
LLMAG
88
0
0
20 Nov 2024
Library Learning Doesn't: The Curious Case of the Single-Use "Library"
Ian Berlot-Attwell
Frank Rudzicz
Xujie Si
32
1
0
26 Oct 2024
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen
Haoze Sun
Tianpeng Li
Fan Yang
Hao Liang
Keer Lu
Bin Cui
Wentao Zhang
Zenan Zhou
Weipeng Chen
LRM
36
5
0
16 Oct 2024
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
Sijie Cheng
Kechen Fang
Yangyang Yu
Sicheng Zhou
B. Li
Ye Tian
Tingguang Li
Lei Han
Yang Janet Liu
26
8
0
15 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
44
1
0
09 Oct 2024
ToolGen: Unified Tool Retrieval and Calling via Generation
Renxi Wang
Xudong Han
Lei Ji
Shu Wang
Timothy Baldwin
Haonan Li
LLMAG
43
6
0
04 Oct 2024
SEAL: Suite for Evaluating API-use of LLMs
Woojeong Kim
Ashish Jagmohan
Aditya Vempaty
ELM
ALM
LLMAG
25
0
0
23 Sep 2024
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Guoli Yin
Haoping Bai
Shuang Ma
Feng Nan
Yanchao Sun
...
Xiaoming Wang
Jiulong Shan
Meng Cao
Ruoming Pang
Zirui Wang
LLMAG
ELM
29
1
0
18 Jul 2024
Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights
Xinyu-Chen
Lianzhen-Zhang
LLMAG
AI4CE
30
1
0
14 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
31
5
0
13 Jul 2024
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Zuxin Liu
Thai Hoang
Jianguo Zhang
Ming Zhu
Tian Lan
...
Silvio Savarese
Juan Carlos Niebles
Huan Wang
Shelby Heinecke
Caiming Xiong
40
32
0
26 Jun 2024
Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?
Seungbin Yang
ChaeHun Park
Taehee Kim
Jaegul Choo
41
2
0
18 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
25
77
0
28 May 2024
Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Yue Feng
Lingyong Yan
Haibo Shi
Dawei Yin
Zhumin Chen
Suzan Verberne
LLMAG
39
14
0
26 May 2024
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa
Zijian He
Reyna Abhyankar
Dongming Li
Yiying Zhang
32
13
0
08 May 2024
FireAct: Toward Language Agent Fine-tuning
Baian Chen
Chang Shu
Ehsan Shareghi
Nigel Collier
Karthik Narasimhan
Shunyu Yao
ALM
LLMAG
96
96
0
09 Oct 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
120
137
0
19 Sep 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022