arXiv: 2305.01633
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
2 May 2023
Anya Belz
Craig Thomson
Ehud Reiter
Gavin Abercrombie
J. Alonso-Moral
Mohammad Arvan
Jackie C.K. Cheung
Mark Cieliebak
Elizabeth Clark
Kees van Deemter
Tanvi Dinkar
Ondrej Dusek
Steffen Eger
Qixiang Fang
Mingqi Gao
Albert Gatt
Dimitra Gkatzia
Javier González-Corbelle
Dirk Hovy
Manuela Hürlimann
Takumi Ito
John D. Kelleher
Filip Klubicka
Emiel Krahmer
Huiyuan Lai
Chris van der Lee
Yiru Li
Saad Mahamood
Margot Mieskes
Emiel van Miltenburg
Pablo Romero
Malvina Nissim
Natalie Parde
Ondřej Plátek
Verena Rieser
Jie Ruan
Joel R. Tetreault
Antonio Toral
Xiao-Yi Wan
Leo Wanner
Lewis J. Watson
Diyi Yang
Papers citing
"Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP"
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
24 Mar 2025
The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Tomas Horych
Christoph Mandl
Terry Ruas
André Greiner-Petter
Bela Gipp
Akiko Aizawa
Timo Spinde
17 Nov 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
28 Oct 2024
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei-Ye Zhao
Steffen Eger
24 Oct 2024