ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

Barbara Plank
arXiv:2211.02570 · 4 November 2022

Papers citing "The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation"

50 / 59 papers shown
MetaHarm: Harmful YouTube Video Dataset Annotated by Domain Experts, GPT-4-Turbo, and Crowdworkers
Wonjeong Jo, Magdalena Wojcieszak
22 Apr 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
13 Mar 2025

Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Benedetta Muscato, Praveen Bushipaka, Gizem Gezici, Lucia Passaro, F. Giannotti, Tommaso Cucinotta
01 Mar 2025
AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon
10 Feb 2025

Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training
Mugdha Pandya, Nafise Sadat Moosavi, Diana Maynard
05 Dec 2024

Towards Fair Pay and Equal Work: Imposing View Time Limits in Crowdsourced Image Classification
Gordon Lim, Stefan Larson, Yu Huang, Kevin Leach
29 Nov 2024

Conformalized Credal Regions for Classification with Ambiguous Ground Truth
Michele Caprio, David Stutz, Shuo Li, Arnaud Doucet
07 Nov 2024
Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
Claire Wonjeong Jo, Miki Wesołowska, Magdalena Wojcieszak
06 Nov 2024

Reducing annotator bias by belief elicitation
Terne Sasha Thorn Jakobsen, Andreas Bjerre-Nielsen, Robert Böhm
21 Oct 2024

Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations
David Tschirschwitz, Volker Rodehorst
14 Sep 2024

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana, Eric T. Nalisnick, Antske Fokkens, Swabha Swayamdipta
26 Aug 2024
Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring
Matthew J. Lynch, Ryan Jacobs, Gabriella Bruno, Priyam V. Patki, Dane Morgan, Kevin G. Field
02 Aug 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal
27 Jun 2024

Conformal Prediction for Natural Language Processing: A Survey
Margarida M. Campos, António Farinhas, Chrysoula Zerva, Mário A. T. Figueiredo, André F. T. Martins
03 May 2024

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
D. Grabb, Max Lamparth, N. Vasan
02 Apr 2024
Position: Insights from Survey Methodology can Improve Training Data
Stephanie Eckman, Barbara Plank, Frauke Kreuter
02 Mar 2024

DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition
K. Enevoldsen, Fredrik Jørgensen, Morten H Baglini
28 Feb 2024

Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems
Enrico Liscio, Luciano Cavalcante Siebert, Catholijn M. Jonker, P. Murukannaiah
26 Feb 2024

Automatic Scoring of Cognition Drawings: Assessing the Quality of Machine-Based Scores Against a Gold Standard
Arne Bethmann, Marina Aoki, Charlotte Hunsicker, Claudia Weileder
28 Dec 2023

Quantifying Divergence for Human-AI Collaboration and Cognitive Trust
Muge Kural, Ali Gebesçe, T. Chubakov, Gözde Gül Sahin
14 Dec 2023
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
Liesbeth Allein, Maria Mihaela Truşcă, Marie-Francine Moens
27 Nov 2023

Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty
Katharina Hechinger, Christoph Koller, Xiao Xiang Zhu, Goran Kauermann
15 Nov 2023

PopBERT. Detecting populism and its host ideologies in the German Bundestag
Lukas Erhard, Sara Hanke, Uwe Remer, A. Falenska, R. Heiberger
22 Sep 2023

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, ..., Kavel Rao, Chandra Bhagavatula, Maarten Sap, J. Tasioulas, Yejin Choi
02 Sep 2023
How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning
Sandra Gilhuber, Rasmus Hvingelby, Mang Ling Ada Fok, Thomas Seidl
16 Aug 2023

Large Language Models and Knowledge Graphs: Opportunities and Challenges
Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, ..., Gerard de Melo, A. Bonifati, Edlira Vakaj, M. Dragoni, D. Graux
11 Aug 2023

Collective Human Opinions in Semantic Textual Similarity
Yuxia Wang, Shimin Tao, Ning Xie, Hao-Yu Yang, Timothy Baldwin, Karin Verspoor
08 Aug 2023

Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
28 Jul 2023
Evaluating AI systems under uncertain ground truth: a case study in dermatology
David Stutz, A. Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, ..., Yossi Matias, Pushmeet Kohli, Yun-hui Liu, Arnaud Doucet, Alan Karthikesalingam
05 Jul 2023

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
Matthias Orlikowski, Paul Röttger, Philipp Cimiano
20 Jun 2023

No Strong Feelings One Way or Another: Re-operationalizing Neutrality in Natural Language Inference
Animesh Nighojkar, Antonio Laverghetta, John Licato
16 Jun 2023

Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing
Lea Frermann, Jiatong Li, Shima Khanehzar, Gosia Mikołajczak
03 Jun 2023
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy, Jenny T Liang, Ronan Le Bras, Katharina Reinecke, Maarten Sap
02 Jun 2023

Being Right for Whose Right Reasons?
Terne Sasha Thorn Jakobsen, Laura Cabello, Anders Søgaard
01 Jun 2023

ActiveAED: A Human in the Loop Improves Annotation Error Detection
Leon Weber, Barbara Plank
31 May 2023

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, M. Cha, ..., Eun-Ju Lee, Yong Lim, Alice H. Oh, San-hee Park, Jung-Woo Ha
28 May 2023

Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa, Jifan Chen, Junyi Jessy Li, Greg Durrett
24 May 2023
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models
Natalie Shapira, Mosh Levy, S. Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz
24 May 2023

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl
24 May 2023

You Are What You Annotate: Towards Better Models through Annotator Representations
Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea
24 May 2023

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment
Sky CH-Wang, Arkadiy Saakyan, Aochong Li, Zhou Yu, Smaranda Muresan
23 May 2023

EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms
Naihao Deng, Yikai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea
23 May 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian, Xingdi Yuan, Hal Daumé, Su Lin Blodgett
15 May 2023

What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi, Johan Bos, T. Declerck, Jan Hajic, Daniel Hershcovich, ..., Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli
15 May 2023

Toxicity Inspector: A Framework to Evaluate Ground Truth in Toxicity Detection Through Feedback
Huriyyah Althunayan, Rahaf Bahlas, Manar Alharbi, Lena Alsuwailem, Abeer Aldayel, Rehab Alahmadi
11 May 2023

iLab at SemEval-2023 Task 11 Le-Wi-Di: Modelling Disagreement or Modelling Perspectives?
Nikolas Vitsakis, Amit Parekh, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas, Verena Rieser
10 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, ..., José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins
01 May 2023

SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)
Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio
28 Apr 2023

We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
27 Apr 2023

Understanding and Predicting Human Label Variation in Natural Language Inference through Explanation
Nan-Jiang Jiang, Chenhao Tan, M. Marneffe
24 Apr 2023