2407.18370
Cited By
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
25 July 2024
Jaehun Jung
Faeze Brahman
Yejin Choi
ALM
Papers citing
"Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement"
8 / 8 papers shown
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM
ELM
120
0
0
13 Mar 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
106
61
0
25 Nov 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
33
5
0
17 Oct 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
34
7
0
06 Jun 2024
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
54
103
0
26 Oct 2023
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Anastasios Nikolas Angelopoulos
Stephen Bates
Emmanuel J. Candès
Michael I. Jordan
Lihua Lei
92
125
0
03 Oct 2021
Distribution-Free, Risk-Controlling Prediction Sets
Stephen Bates
Anastasios Nikolas Angelopoulos
Lihua Lei
Jitendra Malik
Michael I. Jordan
OOD
173
184
0
07 Jan 2021