Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in
Expert Knowledge Tasks

26 October 2024

Annalisa Szymanski

Heather A. Eicher-Miller

Meng-Long Jiang

Ronald A. Metoyer

Papers citing "Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks"

FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Chanyeol Choi, Jihoon Kwon, Jaeseon Ha, Hojun Choi, Chaewoon Kim, Yongjae Lee, Jy-yong Sohn, Alejandro Lopez-Lira
RALM | 54 0 0 | 22 Apr 2025

Arti-"fickle" Intelligence: Using LLMs as a Tool for Inference in the Political and Social Sciences
Lisa P. Argyle, Ethan C. Busby, Joshua R Gubler, Bryce Hepner, Alex Lyman, David Wingate
LLMAG | 43 0 0 | 04 Apr 2025

Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li, Yidi Miao, Xueying Ding, Ramayya Krishnan, R. Padman
37 0 0 | 28 Mar 2025

Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy, Thulasi Tholeti, R. Ramachandranpillai, Muhammad Ali, Toby Jia-Jun Li, Ricardo Baeza-Yates
21 0 0 | 16 Mar 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
ALM ELM | 114 0 0 | 13 Mar 2025

PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery
Faye Zhang, Jasmine Wan, Qianyu Cheng, Jinfeng Rao
31 0 0 | 01 Mar 2025

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang
ALM | 69 0 0 | 24 Feb 2025

Integrating Expert Knowledge into Logical Programs via LLMs
Franciszek Górski, Oskar Wysocki, Marco Valentino, André Freitas
27 0 0 | 17 Feb 2025

Supporting Co-Adaptive Machine Teaching through Human Concept Learning and Cognitive Theories
Simret Araya Gebreegziabher, Yukun Yang, Elena L. Glassman, T. Li
21 5 0 | 25 Sep 2024

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu
PILM ELM | 17 98 0 | 30 Jan 2024