AutoEval Done Right: Using Synthetic Data for Model Evaluation
arXiv:2403.07008
9 March 2024
Pierre Boyeau, Anastasios Nikolas Angelopoulos, N. Yosef, Jitendra Malik, Michael I. Jordan
SyDa

Papers citing "AutoEval Done Right: Using Synthetic Data for Model Evaluation"

13 papers shown

Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang, Z. Chen, Bo Li, Haifeng Xu
02 May 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
ALM, ELM
13 Mar 2025

Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou, Yuda Song, Andrea Zanette
ALM
14 Feb 2025

Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, Manuel Gomez Rodriguez
03 Feb 2025

Auto-Evaluation with Few Labels through Post-hoc Regression
Benjamin Eyre, David Madras
19 Nov 2024

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner, Vivian Y. Nastl, Moritz Hardt
ELM, ALM
17 Oct 2024

ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang, Ian Arawjo
LLMAG
20 Sep 2024

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky
27 Aug 2024

AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews
Keith Tyser, Ben Segev, Gaston Longhitano, Xin-Yu Zhang, Zachary Meeks, ..., Nicholas Belsten, A. Shporer, Madeleine Udell, Dov Te’eni, Iddo Drori
19 Aug 2024

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch, Joshua Maynez, R. A. Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen
06 Jun 2024

A Note on the Prediction-Powered Bootstrap
Tijana Zrnic
28 May 2024

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar, J.D. Zamfirescu-Pereira, Bjorn Hartmann, Aditya G. Parameswaran, Ian Arawjo
ALM
18 Apr 2024

Prediction-Powered Ranking of Large Language Models
Ivi Chatzi, Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez
ALM
27 Feb 2024