Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge

28 January 2025

Papers citing "Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge"

Agree to Disagree? A Meta-Evaluation of LLM Misgendering. Arjun Subramonian, Vagrant Gautam, Preethi Seshadri, Dietrich Klakow, Kai-Wei Chang, Yizhou Sun. 23 Apr 2025.
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation. Gergely Flamich, David Vilar, Jan-Thorsten Peter, Markus Freitag. 31 Mar 2025.
TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes. Raj Sanjay Shah, Lei Xu, Qianchu Liu, Jon Burnsky, Drew Bertagnolli, Chaitanya P. Shivade. [LM&MA] 26 Mar 2025.