2406.12045
Cited By
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
17 June 2024
Shunyu Yao
Noah Shinn
P. Razavi
Karthik Narasimhan
ALM
Papers citing
"$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains"
15 / 15 papers shown
Title
TRAIL: Trace Reasoning and Agentic Issue Localization
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian
9
0
0
13 May 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
45
0
0
28 Apr 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
92
0
0
26 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
49
2
0
21 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
45
0
0
07 Apr 2025
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Akshara Prabhakar
Z. Liu
Weiran Yao
Jianguo Zhang
Ming Zhu
...
Juan Carlos Niebles
Shelby Heinecke
H. Wang
S.
Caiming Xiong
VGen
77
1
0
04 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
J. Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
97
0
0
03 Apr 2025
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
75
14
0
17 Feb 2025
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Zachary S. Siegel
Sayash Kapoor
Nitya Nadgir
Benedikt Stroebl
Arvind Narayanan
27
8
0
17 Sep 2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
...
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG
ELM
31
28
0
08 Aug 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
35
32
0
01 Jul 2024
USimAgent: Large Language Models for Simulating Search Users
Erhan Zhang
Xingzhu Wang
Peiyuan Gong
Yankai Lin
Jiaxin Mao
LLMAG
33
14
0
14 Mar 2024
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
209
1,701
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Task-Oriented Dialogue as Dataflow Synthesis
Semantic Machines
Jacob Andreas
J. Bufe
David Burkett
Charles C. Chen
...
Izabela Witoszko
Jason Wolfe
A. Wray
Yuchen Zhang
Alexander Zotov
AIFin
180
151
0
24 Sep 2020