All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

30 June 2021

Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"


ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models Aparna Elangovan Ling Liu Lei Xu S. Bodapati Dan Roth ELM 22 9 0 28 May 2024
Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection Teodor-George Marchitan Claudiu Creanga Liviu P. Dinu DeLMO 18 1 0 28 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges Jonas Becker Jan Philip Wahle Bela Gipp Terry Ruas 23 9 0 24 May 2024
Your Large Language Models Are Leaving Fingerprints Hope McGovern Rickard Stureborg Yoshi Suhara Dimitris Alikaniotis DeLMO 41 11 0 22 May 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation Cyril Chhun Fabian M. Suchanek Chloé Clavel LRM 42 13 0 22 May 2024
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs Akhila Yerukola Saujas Vaduguru Daniel Fried Maarten Sap 29 1 0 14 May 2024
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors Liam Dugan Alyssa Hwang Filip Trhlik Josh Magnus Ludan Andrew Zhu Hainiu Xu Daphne Ippolito Christopher Callison-Burch DeLMO AAML 33 40 0 13 May 2024
How Non-native English Speakers Use, Assess, and Select AI-Generated Paraphrases with Information Aids Yewon Kim Thanh-Long V. Le Donghwi Kim Mina Lee Sung-Ju Lee 16 3 0 13 May 2024
Natural Language Processing RELIES on Linguistics Juri Opitz Shira Wein Nathan Schneider AI4CE 44 7 0 09 May 2024
Explainability for Transparent Conversational Information-Seeking Weronika Lajewska Damiano Spina Johanne Trippas K. Balog 34 7 0 06 May 2024
Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts Tolga Buz Benjamin Frost Nikola Genchev Moritz Schneider Lucie-Aimée Kaffee Gerard de Melo DeLMO 33 9 0 02 May 2024
Towards Intent-based User Interfaces: Charting the Design Space of Intent-AI Interactions Across Task Types Zijian Ding 44 6 0 28 Apr 2024
Text Quality-Based Pruning for Efficient Training of Language Models Vasu Sharma Karthik Padthe Newsha Ardalani Kushal Tirumala Russell Howes ... Po-Yao Huang Shang-Wen Li Armen Aghajanyan Gargi Ghosh Luke Zettlemoyer 44 5 0 26 Apr 2024
ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations Tyler Loakman Chenghua Lin 27 0 0 26 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Olivia Wiles Chuhan Zhang Isabela Albuquerque Ivana Kajić Su Wang ... Jordi Pont-Tuset Aida Nematzadeh Anant Nawalgaria EGVM 120 13 0 25 Apr 2024
Snake Story: Exploring Game Mechanics for Mixed-Initiative Co-creative Storytelling Games Daijin Yang Erica Kleinman G. M. Troiano Elina Tochilnikova Casper Harteveld 21 2 0 11 Apr 2024
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations Mahjabin Nahar Haeseung Seo Eun-Ju Lee Aiping Xiong Dongwon Lee HILM 29 11 0 04 Apr 2024
METAL: Towards Multilingual Meta-Evaluation Rishav Hada Varun Gumma Mohamed Ahmed Kalika Bali Sunayana Sitaram ELM 35 2 0 02 Apr 2024
MUGC: Machine Generated versus User Generated Content Detection Yaqi Xie Anjali Rawal Yujing Cen Dixuan Zhao S. K. Narang Shanu Sushmita DeLMO 35 3 0 28 Mar 2024
EAGLE: A Domain Generalization Framework for AI-generated Text Detection Amrita Bhattacharjee Raha Moraffah Joshua Garland Huan Liu DeLMO 29 5 0 23 Mar 2024
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain William James Bolton Rafael Poyiadzi Edward R. Morrell Gabriela van Bergen Gonzalez Bueno Lea Goetz 35 2 0 21 Mar 2024
A Design Space for Intelligent and Interactive Writing Assistants Mina Lee Katy Ilonka Gero John Joon Young Chung S. Buckingham Shum Vipul Raheja ... Joonsuk Park Roy Pea Eugenia H. Rho Shannon Zejiang Shen Pao Siangliulue 29 82 0 21 Mar 2024
Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases Tyler Loakman Chen Tang Chenghua Lin 38 4 0 20 Mar 2024
Emergence of Social Norms in Generative Agent Societies: Principles and Architecture Siyue Ren Zhiyao Cui Ruiqi Song Zhen Wang Shuyue Hu LLMAG 27 8 0 13 Mar 2024
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews Weixin Liang Zachary Izzo Yaohui Zhang Haley Lepp Hancheng Cao ... Haotian Ye Sheng Liu Zhi Huang Daniel A. McFarland James Y. Zou DeLMO 71 79 0 11 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models Vanshika Vats Marzia Binta Nizam Minghao Liu Ziyuan Wang Richard Ho ... Celeste Shen Rachel Shen Nafisa Hussain Kesav Ravichandran James Davis LM&MA 36 8 0 07 Mar 2024
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights Zijie Zeng Shiqi Liu Lele Sha Zhuang Li Kaixun Yang Sannyuya Liu Dragan Gašević Guanliang Chen DeLMO 42 0 0 06 Mar 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs Tanmay Rajore Nishanth Chandran Sunayana Sitaram Divya Gupta Rahul Sharma Kashish Mittal Manohar Swaminathan 41 14 0 01 Mar 2024
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate Jimin Mun Cathy Buerger Jenny T Liang Joshua Garland Maarten Sap 26 10 0 29 Feb 2024
On the Challenges and Opportunities in Generative AI Laura Manduchi Kushagra Pandey Robert Bamler Ryan Cotterell Sina Daubener ... F. Wenzel Frank Wood Stephan Mandt Vincent Fortuin 56 17 0 28 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases Guiming Hardy Chen Shunian Chen Ziche Liu Feng Jiang Benyou Wang 77 92 0 16 Feb 2024
Can AI and humans genuinely communicate? Constant Bonard 36 1 0 14 Feb 2024
Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale Freddy Heppell M. Bakir Kalina Bontcheva DeLMO 27 1 0 13 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? Marcio Fonseca Shay B. Cohen 39 10 0 18 Jan 2024
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty Kaitlyn Zhou Jena D. Hwang Xiang Ren Maarten Sap 28 54 0 12 Jan 2024
Advancing GUI for Generative AI: Charting the Design Space of Human-AI Interactions through Task Creativity and Complexity Zijian Ding LLMAG 28 4 0 04 Jan 2024
Large Language Models for Conducting Advanced Text Analytics Information Systems Research Benjamin Ampel Chi-Heng Yang J. Hu Hsinchun Chen 21 7 0 27 Dec 2023
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking Karanpartap Singh James Zou WaLM 105 9 0 04 Dec 2023
I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text Kaan Efe Keles Ömer Kaan Gürbüz Mucahid Kutlu WaLM 16 1 0 29 Nov 2023
Reducing Privacy Risks in Online Self-Disclosures with Language Models Yao Dou Isadora Krsek Tarek Naous Anubha Kabra Sauvik Das Alan Ritter Wei-ping Xu 28 21 0 16 Nov 2023
AI-generated text boundary detection with RoFT Laida Kushnareva T. Gaintseva German Magai S. Barannikov Dmitry Abulkhanov Kristian Kuznetsov Eduard Tulchinskii Irina Piontkovskaya Sergey I. Nikolenko DeLMO 15 4 0 14 Nov 2023
Evaluation of GPT-4 for chest X-ray impression generation: A reader study on performance and perception Sebastian Ziegelmayer Alexander W. Marka Nicolas Lenhart Nadja Nehls S. Reischl Felix Harder Andreas Sauter Marcus R. Makowski Markus Graf J. Gawlitza MedIm LM&MA 17 12 0 12 Nov 2023
The Iron(ic) Melting Pot: Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation Tyler Loakman Aaron Maladry Chenghua Lin 18 7 0 09 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models Naomi Saphra Eve Fleisig Kyunghyun Cho Adam Lopez LRM 17 8 0 08 Nov 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models Hanlin Zhang Benjamin L. Edelman Danilo Francati Daniele Venturi G. Ateniese Boaz Barak WaLM 138 54 0 07 Nov 2023
How well can machine-generated texts be identified and can language models be trained to avoid identification? Sinclair Schneider Florian Steuber João A. G. Schneider Gabi Dreo Rodosek DeLMO 15 1 0 25 Oct 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis Nafis Irtiza Tripto Adaku Uchendu Thai V. Le Mattia Setzu F. Giannotti Dongwon Lee DeLMO 23 6 0 25 Oct 2023
Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition Alan Cowap Yvette Graham Jennifer Foster DeLMO 30 0 0 24 Oct 2023
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model Leonie Weissweiler Valentin Hofmann Anjali Kantharuban Anna Cai Ritam Dutt ... Abhishek Vijayakumar Haofei Yu Hinrich Schütze Kemal Oflazer David R. Mortensen 23 10 0 23 Oct 2023
A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions Junchao Wu Shu Yang Runzhe Zhan Yulin Yuan Derek F. Wong Lidia S. Chao DeLMO 24 23 0 23 Oct 2023