2403.07008
Cited By
AutoEval Done Right: Using Synthetic Data for Model Evaluation
9 March 2024
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
SyDa
Papers citing
"AutoEval Done Right: Using Synthetic Data for Model Evaluation"
13 / 13 papers shown
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
85
0
0
02 May 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM
ELM
171
0
0
13 Mar 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
68
0
0
14 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
46
0
0
03 Feb 2025
Auto-Evaluation with Few Labels through Post-hoc Regression
Benjamin Eyre
David Madras
69
1
0
19 Nov 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
35
5
0
17 Oct 2024
ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang
Ian Arawjo
LLMAG
22
3
0
20 Sep 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
66
5
0
27 Aug 2024
AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews
Keith Tyser
Ben Segev
Gaston Longhitano
Xin-Yu Zhang
Zachary Meeks
...
Nicholas Belsten
A. Shporer
Madeleine Udell
Dov Te’eni
Iddo Drori
27
13
0
19 Aug 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
36
7
0
06 Jun 2024
A Note on the Prediction-Powered Bootstrap
Tijana Zrnic
22
3
0
28 May 2024
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G. Parameswaran
Ian Arawjo
ALM
32
84
0
18 Apr 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
29
5
0
27 Feb 2024