AutoEval Done Right: Using Synthetic Data for Model Evaluation
arXiv:2403.07008
9 March 2024
Pierre Boyeau, Anastasios Nikolas Angelopoulos, N. Yosef, Jitendra Malik, Michael I. Jordan
SyDa

Papers citing "AutoEval Done Right: Using Synthetic Data for Model Evaluation"

13 papers shown

Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang, Z. Chen, Bo Li, Haifeng Xu
02 May 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
ALM, ELM
13 Mar 2025

Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou, Yuda Song, Andrea Zanette
ALM
14 Feb 2025

Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, Manuel Gomez Rodriguez
03 Feb 2025

Auto-Evaluation with Few Labels through Post-hoc Regression
Benjamin Eyre, David Madras
19 Nov 2024

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner, Vivian Y. Nastl, Moritz Hardt
ELM, ALM
17 Oct 2024

ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang, Ian Arawjo
LLMAG
20 Sep 2024

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky
27 Aug 2024

AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews
Keith Tyser, Ben Segev, Gaston Longhitano, Xin-Yu Zhang, Zachary Meeks, ..., Nicholas Belsten, A. Shporer, Madeleine Udell, Dov Te’eni, Iddo Drori
19 Aug 2024

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch, Joshua Maynez, R. A. Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen
06 Jun 2024

A Note on the Prediction-Powered Bootstrap
Tijana Zrnic
28 May 2024

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar, J.D. Zamfirescu-Pereira, Bjorn Hartmann, Aditya G. Parameswaran, Ian Arawjo
ALM
18 Apr 2024

Prediction-Powered Ranking of Large Language Models
Ivi Chatzi, Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez
ALM
27 Feb 2024