ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

Barbara Plank
arXiv:2211.02570 · 4 November 2022

Papers citing "The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation"

50 / 59 papers shown
MetaHarm: Harmful YouTube Video Dataset Annotated by Domain Experts, GPT-4-Turbo, and Crowdworkers
Wonjeong Jo, Magdalena Wojcieszak
22 Apr 2025

Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan, Solon Barocas, Kenneth Holstein, Hanna M. Wallach, Zhiwei Steven Wu, Alexandra Chouldechova
13 Mar 2025

Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Benedetta Muscato, Praveen Bushipaka, Gizem Gezici, Lucia Passaro, F. Giannotti, Tommaso Cucinotta
01 Mar 2025
AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon
10 Feb 2025

Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training
Mugdha Pandya, Nafise Sadat Moosavi, Diana Maynard
05 Dec 2024

Towards Fair Pay and Equal Work: Imposing View Time Limits in Crowdsourced Image Classification
Gordon Lim, Stefan Larson, Yu Huang, Kevin Leach
29 Nov 2024

Conformalized Credal Regions for Classification with Ambiguous Ground Truth
Michele Caprio, David Stutz, Shuo Li, Arnaud Doucet
07 Nov 2024
Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
Claire Wonjeong Jo, Miki Wesołowska, Magdalena Wojcieszak
06 Nov 2024

Reducing annotator bias by belief elicitation
Terne Sasha Thorn Jakobsen, Andreas Bjerre-Nielsen, Robert Böhm
21 Oct 2024

Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations
David Tschirschwitz, Volker Rodehorst
14 Sep 2024

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana, Eric T. Nalisnick, Antske Fokkens, Swabha Swayamdipta
26 Aug 2024
Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring
Matthew J. Lynch, Ryan Jacobs, Gabriella Bruno, Priyam V. Patki, Dane Morgan, Kevin G. Field
02 Aug 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal
27 Jun 2024

Conformal Prediction for Natural Language Processing: A Survey
Margarida M. Campos, António Farinhas, Chrysoula Zerva, Mário A. T. Figueiredo, André F. T. Martins
03 May 2024

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
D. Grabb, Max Lamparth, N. Vasan
02 Apr 2024
Position: Insights from Survey Methodology can Improve Training Data
Stephanie Eckman, Barbara Plank, Frauke Kreuter
02 Mar 2024

DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition
K. Enevoldsen, Fredrik Jørgensen, Morten H Baglini
28 Feb 2024

Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems
Enrico Liscio, Luciano Cavalcante Siebert, Catholijn M. Jonker, P. Murukannaiah
26 Feb 2024

Automatic Scoring of Cognition Drawings: Assessing the Quality of Machine-Based Scores Against a Gold Standard
Arne Bethmann, Marina Aoki, Charlotte Hunsicker, Claudia Weileder
28 Dec 2023

Quantifying Divergence for Human-AI Collaboration and Cognitive Trust
Muge Kural, Ali Gebesçe, T. Chubakov, Gözde Gül Sahin
14 Dec 2023
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
Liesbeth Allein, Maria Mihaela Truşcă, Marie-Francine Moens
27 Nov 2023

Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty
Katharina Hechinger, Christoph Koller, Xiao Xiang Zhu, Goran Kauermann
15 Nov 2023

PopBERT. Detecting populism and its host ideologies in the German Bundestag
Lukas Erhard, Sara Hanke, Uwe Remer, A. Falenska, R. Heiberger
22 Sep 2023

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, ..., Kavel Rao, Chandra Bhagavatula, Maarten Sap, J. Tasioulas, Yejin Choi
02 Sep 2023
How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning
Sandra Gilhuber, Rasmus Hvingelby, Mang Ling Ada Fok, Thomas Seidl
16 Aug 2023

Large Language Models and Knowledge Graphs: Opportunities and Challenges
Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, ..., Gerard de Melo, A. Bonifati, Edlira Vakaj, M. Dragoni, D. Graux
11 Aug 2023

Collective Human Opinions in Semantic Textual Similarity
Yuxia Wang, Shimin Tao, Ning Xie, Hao-Yu Yang, Timothy Baldwin, Karin Verspoor
08 Aug 2023

Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
28 Jul 2023
Evaluating AI systems under uncertain ground truth: a case study in dermatology
David Stutz, A. Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, ..., Yossi Matias, Pushmeet Kohli, Yun-hui Liu, Arnaud Doucet, Alan Karthikesalingam
05 Jul 2023

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
Matthias Orlikowski, Paul Röttger, Philipp Cimiano
20 Jun 2023

No Strong Feelings One Way or Another: Re-operationalizing Neutrality in Natural Language Inference
Animesh Nighojkar, Antonio Laverghetta, John Licato
16 Jun 2023

Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing
Lea Frermann, Jiatong Li, Shima Khanehzar, Gosia Mikołajczak
03 Jun 2023
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy, Jenny T Liang, Ronan Le Bras, Katharina Reinecke, Maarten Sap
02 Jun 2023

Being Right for Whose Right Reasons?
Terne Sasha Thorn Jakobsen, Laura Cabello, Anders Søgaard
01 Jun 2023

ActiveAED: A Human in the Loop Improves Annotation Error Detection
Leon Weber, Barbara Plank
31 May 2023

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, M. Cha, ..., Eun-Ju Lee, Yong Lim, Alice H. Oh, San-hee Park, Jung-Woo Ha
28 May 2023

Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa, Jifan Chen, Junyi Jessy Li, Greg Durrett
24 May 2023
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models
Natalie Shapira, Mosh Levy, S. Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz
24 May 2023

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl
24 May 2023

You Are What You Annotate: Towards Better Models through Annotator Representations
Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea
24 May 2023

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment
Sky CH-Wang, Arkadiy Saakyan, Aochong Li, Zhou Yu, Smaranda Muresan
23 May 2023

EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms
Naihao Deng, Yikai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea
23 May 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian, Xingdi Yuan, Hal Daumé, Su Lin Blodgett
15 May 2023

What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi, Johan Bos, T. Declerck, Jan Hajic, Daniel Hershcovich, ..., Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli
15 May 2023

Toxicity Inspector: A Framework to Evaluate Ground Truth in Toxicity Detection Through Feedback
Huriyyah Althunayan, Rahaf Bahlas, Manar Alharbi, Lena Alsuwailem, Abeer Aldayel, Rehab Alahmadi
11 May 2023

iLab at SemEval-2023 Task 11 Le-Wi-Di: Modelling Disagreement or Modelling Perspectives?
Nikolas Vitsakis, Amit Parekh, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas, Verena Rieser
10 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, ..., José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins
01 May 2023

SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)
Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio
28 Apr 2023

We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
27 Apr 2023

Understanding and Predicting Human Label Variation in Natural Language Inference through Explanation
Nan-Jiang Jiang, Chenhao Tan, M. Marneffe
24 Apr 2023