Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

21 August 2019

Papers citing "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets"

50 / 213 papers shown

Bias in, Bias out: Annotation Bias in Multilingual Large Language Models

Xia Cui

Ziyi Huang

Naeemeh Adel

100

18 Nov 2025

FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling

184

09 Oct 2025

Are You Sure You're Positive? Consolidating Chain-of-Thought Agents with Uncertainty Quantification for Aspect-Category Sentiment Analysis

24 Aug 2025

Exploring Explanations Improves the Robustness of In-Context Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Ukyo Honda

Tatsushi Oka

LRM

295

03 Jun 2025

Recover Experimental Data with Selection Bias using Counterfactual Logic

198

31 May 2025

Social Bias in Popular Question-Answering Benchmarks

Angelie Kraft

Judith Simon

Sonja Schimmler

495

21 May 2025

Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Proceedings of the ACM on Human-Computer Interaction (PACMHCI), 2025

388

26 Mar 2025

Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

382

17 Mar 2025

CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering

485

03 Feb 2025

From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set

359

23 Nov 2024

Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

Xinmeng Hou

273

17 Oct 2024

LLM-Human Pipeline for Cultural Context Grounding of Conversations

Rajkumar Pujari

Dan Goldwasser

314

17 Oct 2024

Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets

International Conference on Web and Social Media (ICWSM), 2024

646

10 Oct 2024

Rater Cohesion and Quality from a Vicarious Perspective

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Deepak Pandita

Tharindu Cyril Weerasooriya

Sujan Dutta

Sarah K. K. Luger

Tharindu Ranasinghe

Ashiqur R. KhudaBukhsh

Marcos Zampieri

Christopher M. Homan

242

15 Aug 2024

On Tables with Numbers, with Numbers

Konstantinos Kogkalidis

S. Chatzikyriakidis

LMTD

476

12 Aug 2024

PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization

246

25 Jul 2024

Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding

292

17 Jun 2024

They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

Salma Abdel Magid

Jui-Hsien Wang

Kushal Kafle

Hanspeter Pfister

316

17 Jun 2024

Are We Done with MMLU?

Alberto Carlo Maria Mancino

...

Joshua Harris

Pasquale Minervini

490

119

06 Jun 2024

Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

Neural Information Processing Systems (NeurIPS), 2024

Dongsheng Li

318

23 May 2024

The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

312

09 May 2024

G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

International Conference on Multimedia Retrieval (ICMR), 2024

187

09 May 2024

From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency

331

18 Apr 2024

D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation

Aida Mostafazadeh Davani

Mark Díaz

Dylan K. Baker

Vinodkumar Prabhakaran

261

16 Apr 2024

Reducing Large Language Model Bias with Emphasis on 'Restricted Industries': Automated Dataset Augmentation and Prejudice Quantification

Devam Mondal

Carlo Lipizzi

231

20 Mar 2024

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

756

241

14 Mar 2024

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

364

06 Mar 2024

TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs

361

01 Mar 2024

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

390

25 Feb 2024

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?

Nishant Balepur

Abhilasha Ravichander

Rachel Rudinger

ELM

363

19 Feb 2024

Measuring and Reducing LLM Hallucination without Gold-Standard Answers

379

16 Feb 2024

Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

Ding Wang

Sonja Schmer-Galunder

214

09 Feb 2024

Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

Leonardo Castro-Gonzalez

332

22 Jan 2024

Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

Aida Mostafazadeh Davani

Mark Díaz

Dylan K. Baker

Vinodkumar Prabhakaran

AAML

225

11 Dec 2023

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

413

23 Nov 2023

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Zhaowei Zhu

Jialu Wang

Hao Cheng

Yang Liu

321

19 Nov 2023

GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives

Vinodkumar Prabhakaran

Christopher Homan

Lora Aroyo

Aida Mostafazadeh Davani

305

09 Nov 2023

Measuring Adversarial Datasets

265

06 Nov 2023

Defining a New NLP Playground

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

Heng Ji

415

31 Oct 2023

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

294

23 Oct 2023

Ecologically Valid Explanations for Label Variation in NLI

280

20 Oct 2023

Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

Lucas Weber

Elia Bruni

Dieuwke Hupkes

302

20 Oct 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paul Röttger

425

11 Oct 2023

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Heng Ji

374

08 Sep 2023

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

287

02 Aug 2023

Uncertainty in Natural Language Generation: From Theory to Applications

Haau-Sing Li

569

28 Jul 2023

Analyzing Dataset Annotation Quality Management in the Wild

Computational Linguistics (CL), 2023

Jan-Christoph Klie

Richard Eckart de Castilho

Iryna Gurevych

482

16 Jul 2023

Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?

Bo-Ru Lu

Tao Yu

Mari Ostendorf

257

13 Jul 2023

A Survey on Out-of-Distribution Evaluation of Neural NLP Models

International Joint Conference on Artificial Intelligence (IJCAI), 2023

284

27 Jun 2023

Closing the Loop: Testing ChatGPT to Generate Model Explanations to Improve Human Labelling of Sponsored Content on Social Media

Gerasimos Spanakis

249

08 Jun 2023