2410.13341
Cited By
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
17 October 2024
Florian E. Dorner, Vivian Y. Nastl, Moritz Hardt
ELM, ALM
Papers citing "Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data"
4 / 4 papers shown
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
ALM, ELM
108 0 0
13 Mar 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang
ALM
64 0 0
24 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, Manuel Gomez Rodriguez
36 0 0
03 Feb 2025
JuStRank: Benchmarking LLM Judges for System Ranking
Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai
ALM, ELM
90 2 0
12 Dec 2024