Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.12272
Cited By
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
18 April 2024
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G. Parameswaran
Ian Arawjo
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences"
15 / 15 papers shown
Title
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Benchmarking Multi-National Value Alignment for Large Language Models
Chengyi Ju
Weijie Shi
Chengzhong Liu
Jiaming Ji
Jipeng Zhang
...
Jia Zhu
Jiajie Xu
Yaodong Yang
Sirui Han
Yike Guo
68
0
0
17 Apr 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
56
2
0
30 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
24
0
0
16 Mar 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM
ELM
150
0
0
13 Mar 2025
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu
S. Petridis
Vivian Tsai
Alexander J. Fiannaca
Alex Olwal
Michael Terry
Carrie J. Cai
LRM
37
1
0
28 Jan 2025
Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management
Xiahua Wei
Naveen Kumar
Han Zhang
61
3
0
22 Jan 2025
The Interaction Layer: An Exploration for Co-Designing User-LLM Interactions in Parental Wellbeing Support Systems
Sruthi Viswanathan
Seray Ibrahim
Ravi Shankar
Reuben Binns
Max Van Kleek
Petr Slovák
57
1
0
02 Nov 2024
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Shreya Shankar
Tristan Chambers
Eugene Wu
Aditya G. Parameswaran
Eugene Wu
LLMAG
53
6
0
16 Oct 2024
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Rushang Karia
Daniel Bramblett
D. Dobhal
Siddharth Srivastava
ELM
LRM
30
0
0
11 Oct 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
71
24
0
05 Aug 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
41
9
0
02 Jun 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
29
0
02 Feb 2024
1