Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM
Outputs with Human Preferences

18 April 2024

J.D. Zamfirescu-Pereira

Aditya G. Parameswaran

Papers citing "Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences"

Title | Authors | Tags | Counts | Date
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks | Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang | ALM, ELM | 84 0 0 | 26 Apr 2025
Benchmarking Multi-National Value Alignment for Large Language Models | Chengyi Ju, Weijie Shi, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, ..., Jia Zhu, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo | | 68 0 0 | 17 Apr 2025
A Scalable Framework for Evaluating Health Language Models | Neil Mallinar, A. Heydari, Xin Liu, Anthony Z. Faranesh, Brent Winslow, ..., Mark Malhotra, Shwetak N. Patel, Javier L. Prieto, Daniel J. McDuff, Ahmed A. Metwally | LM&MA | 56 2 0 | 30 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems | Qianou Ma, Dora Zhao, Xinran Zhao, Chenglei Si, Chenyang Yang, Ryan Louie, Ehud Reiter, Diyi Yang, Tongshuang Wu | ALM | 50 0 0 | 24 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models | Shiran Dudy, Thulasi Tholeti, R. Ramachandranpillai, Muhammad Ali, Toby Jia-Jun Li, Ricardo Baeza-Yates | | 24 0 0 | 16 Mar 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels | Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova | ALM, ELM | 150 0 0 | 13 Mar 2025
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning | Michael Xieyang Liu, S. Petridis, Vivian Tsai, Alexander J. Fiannaca, Alex Olwal, Michael Terry, Carrie J. Cai | LRM | 37 1 0 | 28 Jan 2025
Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management | Xiahua Wei, Naveen Kumar, Han Zhang | | 61 3 0 | 22 Jan 2025
The Interaction Layer: An Exploration for Co-Designing User-LLM Interactions in Parental Wellbeing Support Systems | Sruthi Viswanathan, Seray Ibrahim, Ravi Shankar, Reuben Binns, Max Van Kleek, Petr Slovák | | 57 1 0 | 02 Nov 2024
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu | LLMAG | 53 6 0 | 16 Oct 2024
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | Rushang Karia, Daniel Bramblett, D. Dobhal, Siddharth Srivastava | ELM, LRM | 30 0 0 | 11 Oct 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Haolin Jin, Linghan Huang, Haipeng Cai, Jun Yan, Bo Li, Huaming Chen | | 71 24 0 | 05 Aug 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges | Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes | ELM, ALM | 45 55 0 | 18 Jun 2024
Inverse Constitutional AI: Compressing Preferences into Principles | Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier, Samuel Albanie, Robert Mullins | SyDa | 41 9 0 | 02 Jun 2024
LLM-based NLG Evaluation: Current Status and Challenges | Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan | ELM, LM&MA | 53 29 0 | 02 Feb 2024