Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
21 August 2019
Mor Geva
Yoav Goldberg
Jonathan Berant

Papers citing "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets"

50 / 213 papers shown
Bias in, Bias out: Annotation Bias in Multilingual Large Language Models
Xia Cui
Ziyi Huang
Naeemeh Adel
100
0
0
18 Nov 2025
FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling
Zhengyu Wu
Yinlin Zhu
Xunkai Li
Ziang Qiu
Rong-Hua Li
Guoren Wang
Chenghu Zhou
FedML
184
1
0
09 Oct 2025
Are You Sure You're Positive? Consolidating Chain-of-Thought Agents with Uncertainty Quantification for Aspect-Category Sentiment Analysis
Filippos Ventirozos
Peter Appleby
Matthew Shardlow
LLMAG
96
1
0
24 Aug 2025
Exploring Explanations Improves the Robustness of In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ukyo Honda
Tatsushi Oka
LRM
295
0
0
03 Jun 2025
Recover Experimental Data with Selection Bias using Counterfactual Logic
Jingyang He
Shuai Wang
Ang Li
CML
198
0
0
31 May 2025
Social Bias in Popular Question-Answering Benchmarks
Angelie Kraft
Judith Simon
Sonja Schimmler
495
4
0
21 May 2025
Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models
Proceedings of the ACM on Human-Computer Interaction (PACMHCI), 2025
Lynnette Ng
Kokil Jaidka
Kaiyuan Tay
Hansin Ahuja
Niyati Chhaya
388
2
0
26 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
Hyunjung Shim
382
3
0
17 Mar 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
Zongxi Li
Jian Wang
Haoran Xie
S. J. Qin
485
6
0
03 Feb 2025
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
M. Finkelstein
Dan Deutsch
Parker Riley
Juraj Juraska
Geza Kovacs
Markus Freitag
359
2
0
23 Nov 2024
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language
Xinmeng Hou
273
1
0
17 Oct 2024
LLM-Human Pipeline for Cultural Context Grounding of Conversations
Rajkumar Pujari
Dan Goldwasser
314
2
0
17 Oct 2024
Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
International Conference on Web and Social Media (ICWSM), 2024
Tommaso Giorgi
Lorenzo Cima
T. Fagni
Marco Avvenuti
S. Cresci
646
37
0
10 Oct 2024
Rater Cohesion and Quality from a Vicarious Perspective
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Deepak Pandita
Tharindu Cyril Weerasooriya
Sujan Dutta
Sarah K. K. Luger
Tharindu Ranasinghe
Ashiqur R. KhudaBukhsh
Marcos Zampieri
Christopher M. Homan
242
5
0
15 Aug 2024
On Tables with Numbers, with Numbers
Konstantinos Kogkalidis
S. Chatzikyriakidis
LMTD
476
3
0
12 Aug 2024
PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization
Christopher Clarke
Yuzhao Heng
Lingjia Tang
Jason Mars
246
10
0
25 Jul 2024
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding
Ukyo Honda
Tatsushi Oka
Peinan Zhang
Masato Mita
292
1
0
17 Jun 2024
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
Salma Abdel Magid
Jui-Hsien Wang
Kushal Kafle
Hanspeter Pfister
316
2
0
17 Jun 2024
Are We Done with MMLU?
Aryo Pradipta Gema
Joshua Ong Jun Leang
Giwon Hong
Alessio Devoto
Alberto Carlo Maria Mancino
...
R. McHardy
Joshua Harris
Jean Kaddour
Emile van Krieken
Pasquale Minervini
ELM
490
119
0
06 Jun 2024
Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering
Neural Information Processing Systems (NeurIPS), 2024
Zhihua Wen
Zhiliang Tian
Z. Jian
Zhen Huang
Pei Ke
Yifu Gao
Shiyu Huang
Dongsheng Li
318
27
0
23 May 2024
The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Eve Fleisig
Su Lin Blodgett
Dan Klein
Zeerak Talat
312
39
0
09 May 2024
G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning
International Conference on Multimedia Retrieval (ICMR), 2024
Ruiting Dai
Yuqiao Tan
Lisi Mo
Shuang Liang
Guohao Huo
Jiayi Luo
Yao Cheng
ReLM RALM LRM
187
2
0
09 May 2024
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
AI4CE
331
11
0
18 Apr 2024
D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation
Aida Mostafazadeh Davani
Mark Díaz
Dylan K. Baker
Vinodkumar Prabhakaran
261
10
0
16 Apr 2024
Reducing Large Language Model Bias with Emphasis on 'Restricted Industries': Automated Dataset Augmentation and Prejudice Quantification
Devam Mondal
Carlo Lipizzi
231
0
0
20 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG ReLM LRM
756
241
0
14 Mar 2024
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
Deepanway Ghosal
Vernon Toh Yan Han
Chia Yew Ken
Soujanya Poria
ReLM LRM
364
23
0
06 Mar 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Tanmay Rajore
Nishanth Chandran
Sunayana Sitaram
Divya Gupta
Rahul Sharma
Kashish Mittal
Manohar Swaminathan
361
27
0
01 Mar 2024
Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?
Joris Baan
Raquel Fernández
Barbara Plank
Wilker Aziz
390
17
0
25 Feb 2024
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur
Abhilasha Ravichander
Rachel Rudinger
ELM
363
65
0
19 Feb 2024
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
Jiaheng Wei
Yuanshun Yao
Jean-François Ton
Hongyi Guo
Andrew Estornell
Yang Liu
HILM
379
41
0
16 Feb 2024
Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation
Andrew Smart
Ding Wang
Ellis Monk
Mark Díaz
Atoosa Kasirzadeh
Erin van Liemt
Sonja Schmer-Galunder
214
15
0
09 Feb 2024
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Leonardo Castro-Gonzalez
Yi-Ling Chung
Hannah Rose Kirk
John Francis
Angus R. Williams
Pica Johansson
Jonathan Bright
332
2
0
22 Jan 2024
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates
Aida Mostafazadeh Davani
Mark Díaz
Dylan K. Baker
Vinodkumar Prabhakaran
AAML
225
34
0
11 Dec 2023
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Christoph Kern
Stephanie Eckman
Jacob Beck
Rob Chew
Bolei Ma
Frauke Kreuter
413
17
0
23 Nov 2023
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu
Jialu Wang
Hao Cheng
Yang Liu
321
29
0
19 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
305
18
0
09 Nov 2023
Measuring Adversarial Datasets
Yuanchen Bai
Raoyi Huang
Vijay Viswanathan
Tzu-Sheng Kuo
Tongshuang Wu
265
1
0
06 Nov 2023
Defining a New NLP Playground
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
Pengfei Yu
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
415
7
0
31 Oct 2023
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mete Ismayilzada
Debjit Paul
Syrielle Montariol
Mor Geva
Antoine Bosselut
LRM
294
7
0
23 Oct 2023
Ecologically Valid Explanations for Label Variation in NLI
Nan-Jiang Jiang
Chenhao Tan
M. Marneffe
FAtt
280
14
0
20 Oct 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
302
37
0
20 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
425
66
0
11 Oct 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
374
49
0
08 Sep 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM KELM LRM
287
4
0
02 Aug 2023
Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan
Nico Daheim
Evgenia Ilia
Dennis Ulmer
Haau-Sing Li
Raquel Fernández
Barbara Plank
Rico Sennrich
Chrysoula Zerva
Wilker Aziz
UQLM
569
69
0
28 Jul 2023
Analyzing Dataset Annotation Quality Management in the Wild
Computational Linguistics (CL), 2023
Jan-Christoph Klie
Richard Eckart de Castilho
Iryna Gurevych
482
61
0
16 Jul 2023
Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?
Bo-Ru Lu
Nikita Haduong
Chia-Hsuan Lee
Zeqiu Wu
Hao Cheng
Paul Koester
J. Utke
Tao Yu
Noah A. Smith
Mari Ostendorf
SyDa
257
3
0
13 Jul 2023
A Survey on Out-of-Distribution Evaluation of Neural NLP Models
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Xinzhe Li
Ming Liu
Shang Gao
Wray Buntine
284
24
0
27 Jun 2023
Closing the Loop: Testing ChatGPT to Generate Model Explanations to Improve Human Labelling of Sponsored Content on Social Media
Thales Bertaglia
Stefan Huber
Catalina Goanta
Gerasimos Spanakis
Adriana Iamnitchi
249
15
0
08 Jun 2023