ResearchTrend.AI
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls

4 September 2024
Kinjal Basu
Ibrahim Abdelaziz
Kelsey Bradford
M. Crouse
Kiran Kate
Sadhana Kumaravel
Saurabh Goyal
Asim Munawar
Yara Rizk
Xin Wang
Luis A. Lastras
Pavan Kapanipathi

Papers citing "NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls"

6 / 6 papers shown
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
N. Mudur
Hao Cui
Subhashini Venugopalan
Paul Raccuglia
M. Brenner
Peter C. Norgaard
LLMAG
ELM
LRM
33
0
0
08 Apr 2025
Generating Structured Plan Representation of Procedures with LLMs
Deepeka Garg
Sihan Zeng
Sumitra Ganesh
Leo Ardon
28
0
0
28 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
5
0
20 Mar 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
39
2
0
10 Mar 2025
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios
Jun Wang
Jiamu Zhou
Muning Wen
Xiaoyun Mo
H. Zhang
...
Cheng Jin
Xihuai Wang
Weinan Zhang
Qiuying Peng
J. Wang
LLMAG
87
0
0
21 Dec 2024
A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs
Myeongsoo Kim
Tyler Stennett
Saurabh Sinha
Alessandro Orso
29
4
0
11 Nov 2024