AutoEval Done Right: Using Synthetic Data for Model Evaluation
9 March 2024
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
    SyDa

Papers citing "AutoEval Done Right: Using Synthetic Data for Model Evaluation"

25 / 25 papers shown
How to Correctly Report LLM-as-a-Judge Evaluations
Chungpa Lee
Thomas Zeng
Jongwon Jeong
Jy-yong Sohn
Kangwook Lee
149
1
0
26 Nov 2025
Extending Prediction-Powered Inference through Conformal Prediction
Daniel Csillag
Pedro Dall'Antonia
C. Struchiner
G. Goedert
121
0
0
17 Oct 2025
Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees
Meshi Bashari
Yonghoon Lee
Roy Maor Lotan
Edgar Dobriban
Yaniv Romano
SyDa
144
1
0
24 Sep 2025
Statistical Methods in Generative AI
Edgar Dobriban
257
3
0
08 Sep 2025
Towards a rigorous evaluation of RAG systems: the challenge of due diligence
Grégoire Martinon
Alexandra Lorenzo de Brionne
Jérôme Bohard
Antoine Lojou
Damien Hervault
Nicolas Brunel
152
1
0
29 Jul 2025
Sim2Val: Leveraging Correlation Across Test Platforms for Variance-Reduced Metric Estimation
Rachel Luo
Heng Yang
Michael Watson
Apoorva Sharma
Sushant Veer
Edward Schmerling
Marco Pavone
51
2
0
25 Jun 2025
Cost-Optimal Active AI Model Evaluation
Anastasios Nikolas Angelopoulos
Jacob Eisenstein
Jonathan Berant
Alekh Agarwal
Adam Fisch
153
2
0
09 Jun 2025
Data Swarms: Optimizable Generation of Synthetic Evaluation Data
Shangbin Feng
Yike Wang
Weijia Shi
Yulia Tsvetkov
303
0
0
31 May 2025
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qingchen Yu
Zifan Zheng
Ding Chen
Simin Niu
Bo Tang
Feiyu Xiong
Zhiyu Li
ELM, LRM
145
3
0
28 May 2025
No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
P. Mani
Peng Xu
Zachary Chase Lipton
Michael Oberst
244
2
0
26 May 2025
Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees
Sangwoo Park
Matteo Zecchin
Osvaldo Simeone
154
2
0
24 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Zhiwen Chen
Bo Li
Haifeng Xu
849
2
0
02 May 2025
Validating LLM-as-a-Judge Systems under Rating Indeterminacy
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM, ELM
1.1K
3
0
07 Mar 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
300
3
0
14 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
299
3
0
03 Feb 2025
Regression for the Mean: Auto-Evaluation and Inference with Few Labels through Post-hoc Regression
Benjamin Eyre
David Madras
325
5
0
19 Nov 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
International Conference on Learning Representations (ICLR), 2024
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM, ALM
340
20
0
17 Oct 2024
Language Model Preference Evaluation with Multiple Weak Evaluators
Zhengyu Hu
Jieyu Zhang
Zhihan Xiong
Alexander Ratner
Hui Xiong
Ranjay Krishna
339
10
0
14 Oct 2024
ChainBuddy: An AI Agent System for Generating LLM Pipelines
International Conference on Human Factors in Computing Systems (CHI), 2024
Jingyue Zhang
Ian Arawjo
LLMAG
171
0
0
20 Sep 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
344
25
0
27 Aug 2024
AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews
Keith Tyser
Ben Segev
Gaston Longhitano
Xin-Yu Zhang
Zachary Meeks
...
Nicholas Belsten
A. Shporer
Madeleine Udell
Dov Te’eni
Iddo Drori
130
42
0
19 Aug 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
206
16
0
06 Jun 2024
A Note on the Prediction-Powered Bootstrap
Tijana Zrnic
281
6
0
28 May 2024
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G. Parameswaran
Ian Arawjo
ALM
177
174
0
18 Apr 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
351
13
0
27 Feb 2024