ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

30 June 2021
Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith
DeLMO

Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"

20 / 220 papers shown
Synthetic Disinformation Attacks on Automated Fact Verification Systems
Y. Du, Antoine Bosselut, Christopher D. Manning
AAML, OffRL
18 Feb 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam
ELM, AI4CE
14 Feb 2022
A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications
Vijini Liyanage, Davide Buscaldi, A. Nazarenko
DeLMO
04 Feb 2022
Towards Coherent and Consistent Use of Entities in Narrative Generation
Pinelopi Papalampidi, Kris Cao, Tomáš Kočiský
HILM
03 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
16 Jan 2022
Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow
Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, Eric Horvitz
07 Jan 2022
Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir, Cédric Renggli, Nora Hollenstein, Ce Zhang
15 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
08 Dec 2021
Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception
Kristen Moore, Cody James Christopher, David Liebowitz, Surya Nepal, R. Selvey
21 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
17 Nov 2021
Unsupervised and Distributional Detection of Machine-Generated Text
Matthias Gallé, Jos Rozen, Germán Kruszewski, Hady ElSahar
DeLMO
04 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language Models
Xiang Lorraine Li, A. Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh
LRM
31 Oct 2021
Attacking Open-domain Question Answering by Injecting Misinformation
Liangming Pan, Wenhu Chen, Min-Yen Kan, W. Wang
HILM, AAML
15 Oct 2021
Leveraging Generative Models for Covert Messaging: Challenges and Tradeoffs for "Dead-Drop" Deployments
L. A. Bauer, James K. Howes IV, Sam A. Markelon, Vincent Bindschaedler, Thomas Shrimpton
13 Oct 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska, Nader Akoury, Mohit Iyyer
14 Sep 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
DeLMO
02 Jul 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei-ping Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
VLM
02 Feb 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaïd Harchaoui
02 Feb 2021
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld
17 Jan 2021
With Little Power Comes Great Responsibility
Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
13 Oct 2020