ResearchTrend.AI
ToolQA: A Dataset for LLM Question Answering with External Tools

23 June 2023
Yuchen Zhuang
Yue Yu
Kuan-Chieh Jackson Wang
Haotian Sun
Chao Zhang
    ELM
    LLMAG

Papers citing "ToolQA: A Dataset for LLM Question Answering with External Tools"

40 / 40 papers shown
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He
Sijia Cui
Shuai Xu
Yanna Wang
Bo Xu
29
0
0
13 May 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
92
0
0
26 Apr 2025
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Lei Shen
Xiaoyu Shen
56
0
0
25 Apr 2025
OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
Zhangcheng Qiang
Kerry Taylor
Weiqing Wang
Jing Jiang
52
0
0
25 Mar 2025
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao
Pengfei Cao
Zhuoran Jin
Huanxuan Liao
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
75
1
0
02 Mar 2025
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research
Veda C. Storey
Wei Thoo Yue
J. Leon Zhao
Roman Lukyanenko
43
0
0
25 Feb 2025
Grounding LLM Reasoning with Knowledge Graphs
Alfonso Amayuelas
Joy Prakash Sain
Simerjot Kaur
Charese Smiley
77
0
0
18 Feb 2025
MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling
Yakun Zhu
Shaohang Wei
Xu Wang
Kui Xue
Xiaofan Zhang
S. Zhang
54
1
0
17 Feb 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
62
5
0
10 Feb 2025
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
J. Wang
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
129
0
0
25 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
73
0
0
12 Nov 2024
Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu
Xuchen Li
X. Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
26
1
0
20 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
52
1
0
09 Oct 2024
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Aiwei Liu
Sheng Guan
Y. Liu
L. Pan
Yifei Zhang
Liancheng Fang
Lijie Wen
Philip S. Yu
Xuming Hu
WaLM
89
2
0
04 Oct 2024
LLM With Tools: A Survey
Zhuocheng Shen
36
9
0
24 Sep 2024
Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
Samuel Arcadinho
David Aparicio
Mariana Almeida
29
5
0
24 Sep 2024
Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang
Juluan Shi
Chaozheng Wang
Cheryl Lee
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
LLMAG
24
8
0
31 Aug 2024
Simulating Financial Market via Large Language Model based Agents
Shen Gao
Yuntao Wen
Minghang Zhu
Jianing Wei
Yuhan Cheng
Qunzi Zhang
Shuo Shang
AIFin
29
11
0
28 Jun 2024
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang Shen
Yue Li
Desong Meng
Dongqi Cai
Sheng Qi
Li Zhang
Mengwei Xu
Yun Ma
LLMAG
29
9
0
28 Jun 2024
CancerLLM: A Large Language Model in Cancer Domain
Mingchen Li
Jiatan Huang
Jeremy Yeung
A. Blaes
Steven Johnson
Hongfang Liu
Hua Xu
Rui Zhang
ELM
LM&MA
32
4
0
15 Jun 2024
Transforming Wearable Data into Health Insights using Large Language Model Agents
Mike A. Merrill
Akshay Paruchuri
Naghmeh Rezaei
Geza Kovacs
Javier Perez
...
Shwetak Patel
Jiening Zhan
Tim Althoff
Daniel J. McDuff
Xin Liu
LM&MA
LLMAG
AI4CE
35
8
0
10 Jun 2024
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Yuchen Zhuang
Haotian Sun
Yue Yu
Rushi Qiang
Qifan Wang
Chao Zhang
Bo Dai
AAML
33
14
0
05 Jun 2024
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim
Wonho Song
Dahyun Kim
Yunsu Kim
Yungi Kim
Chanjun Park
ELM
61
3
0
01 Apr 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
14
0
0
19 Mar 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Janet Liu
ELM
44
22
0
12 Mar 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
44
2
0
27 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
117
353
0
09 Feb 2024
Bringing Generative AI to Adaptive Learning in Education
Hang Li
Tianlong Xu
Chaoli Zhang
Eason Chen
Jing Liang
Xing Fan
Haoyang Li
Jiliang Tang
Qingsong Wen
40
20
0
02 Feb 2024
RE-GAINS & EnChAnT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
Sahil Girhepuje
Siva Sankar Sajeev
Purvam Jain
Arya Sikder
Adithya Rama Varma
Ryan George
Akshay Govind Srinivasan
Mahendra Kurup
Ashmit Sinha
Sudip Mondal
RALM
27
0
0
28 Jan 2024
Large Language Models Can Learn Temporal Reasoning
Siheng Xiong
Ali Payani
Ramana Rao Kompella
Faramarz Fekri
LRM
27
73
0
12 Jan 2024
LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?
Fuheng Zhao
Lawrence Lim
Ishtiyaque Ahmad
D. Agrawal
Amr El Abbadi
54
9
0
16 Dec 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL
ALM
LLMAG
ELM
LM&MA
53
612
0
31 Jul 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
198
152
0
22 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
230
2,989
0
22 Mar 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Aman Madaan
Amir Yazdanbakhsh
LRM
141
116
0
16 Sep 2022
Is a Question Decomposition Unit All We Need?
Pruthvi H. Patel
Swaroop Mishra
Mihir Parmar
Chitta Baral
ReLM
140
51
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
4,048
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022