Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

25 July 2024

Faeze Brahman

Yejin Choi

Papers citing "Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement"

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang
ALM ELM | 84 0 0 | 26 Apr 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
ALM ELM | 120 0 0 | 13 Mar 2025

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, ..., Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu
ELM AILaw | 106 61 0 | 25 Nov 2024

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner, Vivian Y. Nastl, Moritz Hardt
ELM ALM | 33 5 0 | 17 Oct 2024

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch, Joshua Maynez, R. A. Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen
34 7 0 | 06 Jun 2024

JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
ELM ALM | 54 103 0 | 26 Oct 2023

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Anastasios Nikolas Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei
92 125 0 | 03 Oct 2021

Distribution-Free, Risk-Controlling Prediction Sets
Stephen Bates, Anastasios Nikolas Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan
OOD | 173 184 0 | 07 Jan 2021