2409.03797
Cited By
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
4 September 2024
Kinjal Basu
Ibrahim Abdelaziz
Kelsey Bradford
M. Crouse
Kiran Kate
Sadhana Kumaravel
Saurabh Goyal
Asim Munawar
Yara Rizk
Xin Wang
Luis A. Lastras
Pavan Kapanipathi
Papers citing
"NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls"
6 / 6 papers shown
Title
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
N. Mudur
Hao Cui
Subhashini Venugopalan
Paul Raccuglia
M. Brenner
Peter C. Norgaard
LLMAG
ELM
LRM
33
0
0
08 Apr 2025
Generating Structured Plan Representation of Procedures with LLMs
Deepeka Garg
Sihan Zeng
Sumitra Ganesh
Leo Ardon
28
0
0
28 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
5
0
20 Mar 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
39
2
0
10 Mar 2025
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios
Jun Wang
Jiamu Zhou
Muning Wen
Xiaoyun Mo
H. Zhang
...
Cheng Jin
Xihuai Wang
Weinan Zhang
Qiuying Peng
J. Wang
LLMAG
87
0
0
21 Dec 2024
A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs
Myeongsoo Kim
Tyler Stennett
Saurabh Sinha
Alessandro Orso
29
4
0
11 Nov 2024