Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.10934
Cited By
Agent-as-a-Judge: Evaluate Agents with Agents
14 October 2024
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
Yunyang Xiong
Zechun Liu
E. Chang
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Agent-as-a-Judge: Evaluate Agents with Agents"
7 / 7 papers shown
Title
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
86
25
1
01 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
J. H. Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Y. Chen
Qingyun Wu
47
0
0
30 Apr 2025
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
H. A. Alyahya
Haidar Khan
Yazeed Alnumay
M Saiful Bari
B. Yener
LRM
47
0
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
62
0
0
09 Mar 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
99
61
0
25 Nov 2024
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Jun Shern Chan
Neil Chowdhury
Oliver Jaffe
James Aung
Dane Sherburn
...
Kevin Liu
Leon Maksin
Tejal Patwardhan
Lilian Weng
Aleksander Mądry
ELM
LLMAG
49
46
0
09 Oct 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
40
0
0
04 Jul 2024
1