The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP
Anastasia Shimorina, Anya Belz
17 March 2021 · arXiv:2103.09710
Papers citing "The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP" (7 papers shown)
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, J. Alonso-Moral, ..., Antonio Toral, Xiao-Yi Wan, Leo Wanner, Lewis J. Watson, Diyi Yang
02 May 2023

Evaluating NLG systems: A brief introduction
Emiel van Miltenburg
29 Mar 2023

MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaïd Harchaoui
30 Dec 2022

Quantified Reproducibility Assessment of NLP Results
Anya Belz, Maja Popović, Simon Mille
12 Apr 2022

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei-ping Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
02 Feb 2021

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaïd Harchaoui
02 Feb 2021

With Little Power Comes Great Responsibility
Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
13 Oct 2020