AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

9 April 2024

Luca Gioacchini

Kiril Gashteovski

Roberto Bifulco

Carolin (Haas) Lawrence

Papers citing "AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents"

Title
COCORELI: Cooperative, Compositional Reconstitution & Execution of Language Instructions Swarnadeep Bhar Omar Naim Eleni Metheniti Bastien Navarri Loïc Cabannes Morteza Ezzabady Nicholas Asher LLMAG LRM 28 0 0 29 Aug 2025
Evaluation and Benchmarking of LLM Agents: A Survey Mahmoud Mohammadi Yipeng Li Jane Lo Wendy Yip LLMAG ELM 88 16 0 29 Jul 2025
MAPS: A Multilingual Benchmark for Global Agent Performance and Security Omer Hofman Jonathan Brokman Oren Rachmil Shamik Bose Vikas Pahuja Toshiya Shimizu Trisha Starostina Kelly Marchisio Seraphina Goldfarb-Tarrant Roman Vainshtein 152 1 0 21 May 2025
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents Mrinal Rawat Ambuje Gupta Rushil Goomer Alessandro Di Bari Neha Gupta Roberto Pieraccini LLMAG LRM 217 2 0 15 May 2025
MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Daniel Rose Chia-Chien Hung Marco Lepri Israa Alqassem Kiril Gashteovski Carolin (Haas) Lawrence LM&MA 281 6 0 26 Feb 2025
Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications Raphael Shu Nilaksh Das Michelle Yuan Monica Sunkara Yi Zhang LLMAG 240 10 0 06 Dec 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering Federico Errica G. Siracusano D. Sanvito Roberto Bifulco 310 62 0 18 Jun 2024
Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data Haoyang Liu Yijiang Li Jinglin Jian Yuxuan Cheng Jianrong Lu Shuyi Guo Jinglei Zhu Mianchen Zhang Miantong Zhang Haohan Wang 167 8 0 15 Feb 2024
