ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.10934
  4. Cited By
Agent-as-a-Judge: Evaluate Agents with Agents

Agent-as-a-Judge: Evaluate Agents with Agents

14 October 2024
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
Yunyang Xiong
Zechun Liu
E. Chang
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
    ELM
ArXivPDFHTML

Papers citing "Agent-as-a-Judge: Evaluate Agents with Agents"

7 / 7 papers shown
Title
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
81
25
1
01 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
J. H. Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Y. Chen
Qingyun Wu
47
0
0
30 Apr 2025
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
H. A. Alyahya
Haidar Khan
Yazeed Alnumay
M Saiful Bari
B. Yener
LRM
42
0
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
59
0
0
09 Mar 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
96
61
0
25 Nov 2024
MLE-bench: Evaluating Machine Learning Agents on Machine Learning
  Engineering
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Jun Shern Chan
Neil Chowdhury
Oliver Jaffe
James Aung
Dane Sherburn
...
Kevin Liu
Leon Maksin
Tejal Patwardhan
Lilian Weng
Aleksander Mądry
ELM
LLMAG
47
46
0
09 Oct 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
34
0
0
04 Jul 2024
1