Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension

Transactions of the Association for Computational Linguistics (TACL), 2020

2 February 2020

Papers citing "Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension"

Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering

Sai Shridhar Balamurali

Lu Cheng

165

10 Nov 2025

ARC-Encoder: learning compressed text representations for large language models

191

23 Oct 2025

GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation

Jeongsoo Lee

Daeyong Kwon

Kyohoon Jin

134

23 Aug 2025

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs

254

24 Jul 2025

Hatevolution: What Static Benchmarks Don't Tell Us

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Chiara Di Bonaventura

Barbara McGillivray

Yulan He

Albert Meroño-Peñuela

239

13 Jun 2025

Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

267

01 Jun 2025

CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation

Elahe Khatibi

Ziyu Wang

Amir M. Rahmani

253

17 Apr 2025

Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension

470

23 Feb 2025

Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation

International Conference on Machine Learning and Applications (ICMLA), 2024

Prashank Kadam

229

07 Nov 2024

Gamified crowd-sourcing of high-quality data for visual fine-tuning

342

05 Oct 2024

Data Contamination Report from the 2024 CONDA Shared Task

Iker García-Ferrero

...

Yu-Min Tseng

320

31 Jul 2024

A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese

Tin Van Huynh

Kiet Van Nguyen

Ngan Luu-Thuy Nguyen

384

25 Jun 2024

Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

Xu Guo

Yiqiang Chen

SyDa

219

07 Mar 2024

Desiderata for the Context Use of Question Answering Systems

Sagi Shaier

Lawrence E Hunter

Katharina von der Wense

380

31 Jan 2024

How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation

Yoo Yeon Sung

Ishani Mondal

Jordan L. Boyd-Graber

283

20 Jan 2024

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

IEEE Games Entertainment Media Conference (IEEE GEM), 2023

267

29 Nov 2023

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

292

25 Oct 2023

Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

Ruida Wang

Wangchunshu Zhou

Mrinmaya Sachan

276

20 Oct 2023

Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

Lucas Weber

Elia Bruni

Dieuwke Hupkes

302

20 Oct 2023

Pseudointelligence: A Unifying Framework for Language Model Evaluation

Shikhar Murty

Orr Paradise

Pratyusha Sharma

179

18 Oct 2023

No Offense Taken: Eliciting Offensiveness from Language Models

Anugya Srivastava

Rahul Ahuja

Rohith Mukku

276

02 Oct 2023

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

287

02 Aug 2023

Text Alignment Is An Efficient Unified Model for Massive NLP Tasks

Neural Information Processing Systems (NeurIPS), 2023

371

06 Jul 2023

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations

Neural Information Processing Systems (NeurIPS), 2023

Hongcheng Gao

Heng Ji

Zhiyuan Liu

Maosong Sun

676

135

07 Jun 2023

Entailment as Robust Self-Learner

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

242

26 May 2023

On Degrees of Freedom in Defining and Testing Natural Language Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Saku Sugawara

S. Tsugita

ELM

341

24 May 2023

Out-of-Distribution Generalization in Text Classification: Past, Present, and Future

Lingqiao Liu

Yue Zhang

345

23 May 2023

On the Limitations of Simulating Active Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Katerina Margatina

Nikolaos Aletras

292

21 May 2023

Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?

209

20 May 2023

Multilingual Event Extraction from Historical Newspaper Adverts

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Nadav Borenstein

N. Perez

Isabelle Augenstein

304

18 May 2023

A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors

Alexander Hoelzemann

Kristof Van Laerhoven

153

15 May 2023

Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

240

11 May 2023

Assessing Language Model Deployment with Risk Cards

Leon Derczynski

Hannah Rose Kirk

Vidhisha Balachandran

359

31 Mar 2023

ScatterShot: Interactive In-context Example Curation for Text Transformation

International Conference on Intelligent User Interfaces (IUI), 2023

Tongshuang Wu

Hua Shen

Daniel S. Weld

Jeffrey Heer

Marco Tulio Ribeiro

174

14 Feb 2023

Exploring the Benefits of Training Expert Language Models over Instruction Tuning

International Conference on Machine Learning (ICML), 2023

514

07 Feb 2023

Parallel Context Windows for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

421

21 Dec 2022

ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Dheeraj Mekala

Jason Wolfe

Subhro Roy

307

21 Dec 2022

Evaluating Human-Language Model Interaction

Esin Durmus

...

418

121

19 Dec 2022

Discovering Language Model Behaviors with Model-Written Evaluations

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Deep Ganguli

405

673

19 Dec 2022

Which Shortcut Solution Do Question Answering Models Prefer to Learn?

AAAI Conference on Artificial Intelligence (AAAI), 2022

Kazutoshi Shinoda

Saku Sugawara

Akiko Aizawa

266

29 Nov 2022

RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Alireza Mohammadshahi

Angela Fan

299

02 Nov 2022

IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Rifki Afina Putri

Alice Oh

253

25 Oct 2022

Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

435

137

25 Oct 2022

CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Luke Zettlemoyer

446

10 Oct 2022

State-of-the-art generalisation research in NLP: A taxonomy and review

Nature Machine Intelligence (Nat. Mach. Intell.), 2022

Verna Dankers

...

690

139

06 Oct 2022

Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

330

06 Oct 2022

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios

International Conference on Computational Linguistics (COLING), 2022

Mana Ashida

Saku Sugawara

243

16 Sep 2022

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

Hezekiah J. Branch

Jonathan Rodriguez Cefalu

Jeremy McHugh

Leyla Hujer

Aditya Bahl

Daniel del Castillo Iglesias

Ron Heichman

Ramesh Darwishi

ELM SILM AAML

244

05 Sep 2022

A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension

Xanh Ho

Johannes Mario Meissner

Saku Sugawara

Akiko Aizawa

OffRL

275

05 Sep 2022

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

Neural Information Processing Systems (NeurIPS), 2022

Gabriel Stanovsky

252

25 Jul 2022