arXiv:2107.00061
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
30 June 2021
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"
50 / 220 papers shown
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
22
9
0
28 May 2024
Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection
Teodor-George Marchitan
Claudiu Creanga
Liviu P. Dinu
DeLMO
18
1
0
28 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
23
9
0
24 May 2024
Your Large Language Models Are Leaving Fingerprints
Hope McGovern
Rickard Stureborg
Yoshi Suhara
Dimitris Alikaniotis
DeLMO
41
11
0
22 May 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation
Cyril Chhun
Fabian M. Suchanek
Chloé Clavel
LRM
42
13
0
22 May 2024
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
Akhila Yerukola
Saujas Vaduguru
Daniel Fried
Maarten Sap
29
1
0
14 May 2024
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
Liam Dugan
Alyssa Hwang
Filip Trhlik
Josh Magnus Ludan
Andrew Zhu
Hainiu Xu
Daphne Ippolito
Christopher Callison-Burch
DeLMO
AAML
33
40
0
13 May 2024
How Non-native English Speakers Use, Assess, and Select AI-Generated Paraphrases with Information Aids
Yewon Kim
Thanh-Long V. Le
Donghwi Kim
Mina Lee
Sung-Ju Lee
16
3
0
13 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
44
7
0
09 May 2024
Explainability for Transparent Conversational Information-Seeking
Weronika Lajewska
Damiano Spina
Johanne Trippas
K. Balog
34
7
0
06 May 2024
Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts
Tolga Buz
Benjamin Frost
Nikola Genchev
Moritz Schneider
Lucie-Aimée Kaffee
Gerard de Melo
DeLMO
33
9
0
02 May 2024
Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types
Zijian Ding
44
6
0
28 Apr 2024
Text Quality-Based Pruning for Efficient Training of Language Models
Vasu Sharma
Karthik Padthe
Newsha Ardalani
Kushal Tirumala
Russell Howes
...
Po-Yao Huang
Shang-Wen Li
Armen Aghajanyan
Gargi Ghosh
Luke Zettlemoyer
44
5
0
26 Apr 2024
ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations
Tyler Loakman
Chenghua Lin
27
0
0
26 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
120
13
0
25 Apr 2024
Snake Story: Exploring Game Mechanics for Mixed-Initiative Co-creative Storytelling Games
Daijin Yang
Erica Kleinman
G. M. Troiano
Elina Tochilnikova
Casper Harteveld
21
2
0
11 Apr 2024
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar
Haeseung Seo
Eun-Ju Lee
Aiping Xiong
Dongwon Lee
HILM
29
11
0
04 Apr 2024
METAL: Towards Multilingual Meta-Evaluation
Rishav Hada
Varun Gumma
Mohamed Ahmed
Kalika Bali
Sunayana Sitaram
ELM
35
2
0
02 Apr 2024
MUGC: Machine Generated versus User Generated Content Detection
Yaqi Xie
Anjali Rawal
Yujing Cen
Dixuan Zhao
S. K. Narang
Shanu Sushmita
DeLMO
35
3
0
28 Mar 2024
EAGLE: A Domain Generalization Framework for AI-generated Text Detection
Amrita Bhattacharjee
Raha Moraffah
Joshua Garland
Huan Liu
DeLMO
29
5
0
23 Mar 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
35
2
0
21 Mar 2024
A Design Space for Intelligent and Interactive Writing Assistants
Mina Lee
Katy Ilonka Gero
John Joon Young Chung
S. Buckingham Shum
Vipul Raheja
...
Joonsuk Park
Roy Pea
Eugenia H. Rho
Shannon Zejiang Shen
Pao Siangliulue
29
82
0
21 Mar 2024
Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases
Tyler Loakman
Chen Tang
Chenghua Lin
38
4
0
20 Mar 2024
Emergence of Social Norms in Generative Agent Societies: Principles and Architecture
Siyue Ren
Zhiyao Cui
Ruiqi Song
Zhen Wang
Shuyue Hu
LLMAG
27
8
0
13 Mar 2024
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Weixin Liang
Zachary Izzo
Yaohui Zhang
Haley Lepp
Hancheng Cao
...
Haotian Ye
Sheng Liu
Zhi Huang
Daniel A. McFarland
James Y. Zou
DeLMO
71
79
0
11 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models
Vanshika Vats
Marzia Binta Nizam
Minghao Liu
Ziyuan Wang
Richard Ho
...
Celeste Shen
Rachel Shen
Nafisa Hussain
Kesav Ravichandran
James Davis
LM&MA
36
8
0
07 Mar 2024
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
Zijie Zeng
Shiqi Liu
Lele Sha
Zhuang Li
Kaixun Yang
Sannyuya Liu
Dragan Gašević
Guanliang Chen
DeLMO
42
0
0
06 Mar 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Tanmay Rajore
Nishanth Chandran
Sunayana Sitaram
Divya Gupta
Rahul Sharma
Kashish Mittal
Manohar Swaminathan
41
14
0
01 Mar 2024
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
Jimin Mun
Cathy Buerger
Jenny T Liang
Joshua Garland
Maarten Sap
26
10
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
56
17
0
28 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
77
92
0
16 Feb 2024
Can AI and humans genuinely communicate?
Constant Bonard
36
1
0
14 Feb 2024
Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale
Freddy Heppell
M. Bakir
Kalina Bontcheva
DeLMO
27
1
0
13 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
39
10
0
18 Jan 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Maarten Sap
28
54
0
12 Jan 2024
Advancing GUI for Generative AI: Charting the Design Space of Human-AI Interactions through Task Creativity and Complexity
Zijian Ding
LLMAG
28
4
0
04 Jan 2024
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel
Chi-Heng Yang
J. Hu
Hsinchun Chen
21
7
0
27 Dec 2023
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking
Karanpartap Singh
James Zou
WaLM
105
9
0
04 Dec 2023
I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text
Kaan Efe Keles
Ömer Kaan Gürbüz
Mucahid Kutlu
WaLM
16
1
0
29 Nov 2023
Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou
Isadora Krsek
Tarek Naous
Anubha Kabra
Sauvik Das
Alan Ritter
Wei-ping Xu
28
21
0
16 Nov 2023
AI-generated text boundary detection with RoFT
Laida Kushnareva
T. Gaintseva
German Magai
S. Barannikov
Dmitry Abulkhanov
Kristian Kuznetsov
Eduard Tulchinskii
Irina Piontkovskaya
Sergey I. Nikolenko
DeLMO
15
4
0
14 Nov 2023
Evaluation of GPT-4 for chest X-ray impression generation: A reader study on performance and perception
Sebastian Ziegelmayer
Alexander W. Marka
Nicolas Lenhart
Nadja Nehls
S. Reischl
Felix Harder
Andreas Sauter
Marcus R. Makowski
Markus Graf
J. Gawlitza
MedIm
LM&MA
17
12
0
12 Nov 2023
The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation
Tyler Loakman
Aaron Maladry
Chenghua Lin
18
7
0
09 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra
Eve Fleisig
Kyunghyun Cho
Adam Lopez
LRM
17
8
0
08 Nov 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Hanlin Zhang
Benjamin L. Edelman
Danilo Francati
Daniele Venturi
G. Ateniese
Boaz Barak
WaLM
138
54
0
07 Nov 2023
How well can machine-generated texts be identified and can language models be trained to avoid identification?
Sinclair Schneider
Florian Steuber
João A. G. Schneider
Gabi Dreo Rodosek
DeLMO
15
1
0
25 Oct 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto
Adaku Uchendu
Thai V. Le
Mattia Setzu
F. Giannotti
Dongwon Lee
DeLMO
23
6
0
25 Oct 2023
Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition
Alan Cowap
Yvette Graham
Jennifer Foster
DeLMO
30
0
0
24 Oct 2023
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model
Leonie Weissweiler
Valentin Hofmann
Anjali Kantharuban
Anna Cai
Ritam Dutt
...
Abhishek Vijayakumar
Haofei Yu
Hinrich Schütze
Kemal Oflazer
David R. Mortensen
23
10
0
23 Oct 2023
A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions
Junchao Wu
Shu Yang
Runzhe Zhan
Yulin Yuan
Derek F. Wong
Lidia S. Chao
DeLMO
24
23
0
23 Oct 2023