ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.14033
  4. Cited By
T-Eval: Evaluating the Tool Utilization Capability of Large Language
  Models Step by Step

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

21 December 2023
Zehui Chen
Weihua Du
Wenwei Zhang
Kuikun Liu
Jiangning Liu
Miao Zheng
Jingming Zhuo
Songyang Zhang
Dahua Lin
Kai-xiang Chen
Feng Zhao
    LLMAG
    ELM
ArXivPDFHTML

Papers citing "T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step"

12 / 12 papers shown
Title
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Lei Shen
Xiaoyu Shen
50
0
0
25 Apr 2025
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks
Yusen Wu
Junwu Xiong
Xiaotie Deng
LLMAG
36
0
0
04 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
J. Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
101
0
0
03 Apr 2025
LocAgent: Graph-Guided LLM Agents for Code Localization
LocAgent: Graph-Guided LLM Agents for Code Localization
Zhaoling Chen
Xiangru Tang
Gangda Deng
Fang Wu
Jialong Wu
Zhiwei Jiang
Viktor Prasanna
Arman Cohan
Xingyao Wang
LLMAG
89
2
0
12 Mar 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Zhiqiang Zhang
Lei Liang
Jun Zhou
AI4CE
110
0
0
06 Feb 2025
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
Saptarshi Sengupta
Kristal Curtis
Akshay Mallipeddi
Abhinav Mathur
Joseph Ross
Liang Gou
Liang Gou
LLMAG
SyDa
100
1
0
28 Nov 2024
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Heyan Huang
Heyan Huang
Xiaoyan Gao
45
0
0
12 Oct 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
42
10
0
12 Jun 2024
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
223
2,413
0
06 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
207
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1