arXiv: 2305.01633
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
2 May 2023
Anya Belz
Craig Thomson
Ehud Reiter
Gavin Abercrombie
J. Alonso-Moral
Mohammad Arvan
Jackie C.K. Cheung
Mark Cieliebak
Elizabeth Clark
Kees van Deemter
Tanvi Dinkar
Ondrej Dusek
Steffen Eger
Qixiang Fang
Mingqi Gao
Albert Gatt
Dimitra Gkatzia
Javier González-Corbelle
Dirk Hovy
Manuela Hürlimann
Takumi Ito
John D. Kelleher
Filip Klubicka
Emiel Krahmer
Huiyuan Lai
Chris van der Lee
Yiru Li
Saad Mahamood
Margot Mieskes
Emiel van Miltenburg
Pablo Romero
Malvina Nissim
Natalie Parde
Ondřej Plátek
Verena Rieser
Jie Ruan
Joel R. Tetreault
Antonio Toral
Xiao-Yi Wan
Leo Wanner
Lewis J. Watson
Diyi Yang
Papers citing
"Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP"
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
24 Mar 2025
The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Tomas Horych
Christoph Mandl
Terry Ruas
André Greiner-Petter
Bela Gipp
Akiko Aizawa
Timo Spinde
17 Nov 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
28 Oct 2024
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei-Ye Zhao
Steffen Eger
24 Oct 2024