Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension

Transactions of the Association for Computational Linguistics (TACL), 2020
2 February 2020
Max Bartolo
A. Roberts
Johannes Welbl
Sebastian Riedel
Pontus Stenetorp
    AAML

Papers citing "Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension"

50 / 91 papers shown
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
Sai Shridhar Balamurali
Lu Cheng
165
1
0
10 Nov 2025
ARC-Encoder: learning compressed text representations for large language models
Hippolyte Pilchen
Edouard Grave
P. Pérez
LLMAG, RALM, AI4CE
191
1
0
23 Oct 2025
GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation
Jeongsoo Lee
Daeyong Kwon
Kyohoon Jin
134
2
0
23 Aug 2025
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Xiaopeng Ke
Hexuan Deng
Xuebo Liu
Jun Rao
Zhenxi Song
Jun-chen Yu
Min Zhang
SyDa
254
2
0
24 Jul 2025
Hatevolution: What Static Benchmarks Don't Tell Us
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chiara Di Bonaventura
Barbara McGillivray
Yulan He
Albert Meroño-Peñuela
239
0
0
13 Jun 2025
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boheng Sheng
Jiacheng Yao
Meicong Zhang
Guoxiu He
RALM
267
7
0
01 Jun 2025
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation
Elahe Khatibi
Ziyu Wang
Amir M. Rahmani
253
4
0
17 Apr 2025
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension
Yulong Wu
Viktor Schlegel
Riza Batista-Navarro
AAML
470
2
0
23 Feb 2025
Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation
International Conference on Machine Learning and Applications (ICMLA), 2024
Prashank Kadam
229
4
0
07 Nov 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
342
1
0
05 Oct 2024
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz
Iker García-Ferrero
Alon Jacovi
Jonas Hanselle
Yanai Elazar
...
Yu-Min Tseng
Vishaal Udandarao
Zengzhi Wang
Ruijie Xu
Jinglin Yang
320
17
0
31 Jul 2024
A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese
Tin Van Huynh
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
384
2
0
25 Jun 2024
Generative AI for Synthetic Data Generation: Methods, Challenges and the Future
Xu Guo
Yiqiang Chen
SyDa
219
61
0
07 Mar 2024
Desiderata for the Context Use of Question Answering Systems
Sagi Shaier
Lawrence E Hunter
Katharina von der Wense
380
6
0
31 Jan 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
283
1
0
20 Jan 2024
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
M. Boubdir
Edward Kim
Beyza Ermis
Sara Hooker
Marzieh Fadaee
ELM
267
73
0
29 Nov 2023
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks
Aradhana Sinha
Ananth Balashankar
Ahmad Beirami
Thi Avrahami
Jilin Chen
Alex Beutel
AAML
292
6
0
25 Oct 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
276
39
0
20 Oct 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
302
37
0
20 Oct 2023
Pseudointelligence: A Unifying Framework for Language Model Evaluation
Shikhar Murty
Orr Paradise
Pratyusha Sharma
179
0
0
18 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
276
5
0
02 Oct 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM, KELM, LRM
287
4
0
02 Aug 2023
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Neural Information Processing Systems (NeurIPS), 2023
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
ALM
371
16
0
06 Jul 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Neural Information Processing Systems (NeurIPS), 2023
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
676
135
0
07 Jun 2023
Entailment as Robust Self-Learner
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiaxin Ge
Hongyin Luo
Yoon Kim
James R. Glass
242
3
0
26 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saku Sugawara
S. Tsugita
ELM
341
2
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Yongfeng Zhang
Jennifer Foster
Yue Zhang
OOD
345
3
0
23 May 2023
On the Limitations of Simulating Active Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Katerina Margatina
Nikolaos Aletras
292
14
0
21 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Neeraj Varshney
Mihir Parmar
Nisarg Patel
Divij Handa
Sayantan Sarkar
Man Luo
Chitta Baral
LRM
209
5
0
20 May 2023
Multilingual Event Extraction from Historical Newspaper Adverts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Nadav Borenstein
N. Perez
Isabelle Augenstein
304
6
0
18 May 2023
A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors
Alexander Hoelzemann
Kristof Van Laerhoven
153
12
0
15 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Lukáš Mikula
Michal Štefánik
Marek Petrovič
Petr Sojka
240
6
0
11 May 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
359
49
0
31 Mar 2023
ScatterShot: Interactive In-context Example Curation for Text Transformation
International Conference on Intelligent User Interfaces (IUI), 2023
Tongshuang Wu
Hua Shen
Daniel S. Weld
Jeffrey Heer
Marco Tulio Ribeiro
174
37
0
14 Feb 2023
Exploring the Benefits of Training Expert Language Models over Instruction Tuning
International Conference on Machine Learning (ICML), 2023
Joel Jang
Seungone Kim
Seonghyeon Ye
Doyoung Kim
Lajanugen Logeswaran
Moontae Lee
Kyungjae Lee
Minjoon Seo
LRM, ALM
514
95
0
07 Feb 2023
Parallel Context Windows for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nir Ratner
Yoav Levine
Yonatan Belinkov
Ori Ram
Inbal Magar
Omri Abend
Ehud D. Karpas
Amnon Shashua
Kevin Leyton-Brown
Y. Shoham
RALM
421
94
0
21 Dec 2022
ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Dheeraj Mekala
Jason Wolfe
Subhro Roy
307
9
0
21 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Abigail Z. Jacobs
LM&MA, ALM
418
121
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
405
673
0
19 Dec 2022
Which Shortcut Solution Do Question Answering Models Prefer to Learn?
AAAI Conference on Artificial Intelligence (AAAI), 2022
Kazutoshi Shinoda
Saku Sugawara
Akiko Aizawa
266
9
0
29 Nov 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alireza Mohammadshahi
Thomas Scialom
Majid Yazdani
Pouya Yanki
Angela Fan
James Henderson
Marzieh Saeidi
299
24
0
02 Nov 2022
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Rifki Afina Putri
Alice Oh
253
13
0
25 Oct 2022
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hung-Ting Chen
Michael J.Q. Zhang
Eunsol Choi
RALM, HILM
435
137
0
25 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
446
34
0
10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Robert Bamler
Zhijing Jin
690
139
0
06 Oct 2022
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
330
3
0
06 Oct 2022
Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
International Conference on Computational Linguistics (COLING), 2022
Mana Ashida
Saku Sugawara
243
6
0
16 Sep 2022
Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples
Hezekiah J. Branch
Jonathan Rodriguez Cefalu
Jeremy McHugh
Leyla Hujer
Aditya Bahl
Daniel del Castillo Iglesias
Ron Heichman
Ramesh Darwishi
ELM, SILM, AAML
244
75
0
05 Sep 2022
A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Xanh Ho
Johannes Mario Meissner
Saku Sugawara
Akiko Aizawa
OffRL
275
6
0
05 Sep 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2022
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
252
19
0
25 Jul 2022