2002.00293
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
Transactions of the Association for Computational Linguistics (TACL), 2020
2 February 2020
Max Bartolo
A. Roberts
Johannes Welbl
Sebastian Riedel
Pontus Stenetorp
AAML
Papers citing
"Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension"
50 / 91 papers shown
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
Sai Shridhar Balamurali
Lu Cheng
165
1
0
10 Nov 2025
ARC-Encoder: learning compressed text representations for large language models
Hippolyte Pilchen
Edouard Grave
P. Pérez
LLMAG
RALM
AI4CE
191
1
0
23 Oct 2025
GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation
Jeongsoo Lee
Daeyong Kwon
Kyohoon Jin
134
2
0
23 Aug 2025
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Xiaopeng Ke
Hexuan Deng
Xuebo Liu
Jun Rao
Zhenxi Song
Jun-chen Yu
Min Zhang
SyDa
254
2
0
24 Jul 2025
Hatevolution: What Static Benchmarks Don't Tell Us
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chiara Di Bonaventura
Barbara McGillivray
Yulan He
Albert Meroño-Peñuela
239
0
0
13 Jun 2025
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boheng Sheng
Jiacheng Yao
Meicong Zhang
Guoxiu He
RALM
267
7
0
01 Jun 2025
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation
Elahe Khatibi
Ziyu Wang
Amir M. Rahmani
253
4
0
17 Apr 2025
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension
Yulong Wu
Viktor Schlegel
Riza Batista-Navarro
AAML
470
2
0
23 Feb 2025
Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation
International Conference on Machine Learning and Applications (ICMLA), 2024
Prashank Kadam
229
4
0
07 Nov 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
342
1
0
05 Oct 2024
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz
Iker García-Ferrero
Alon Jacovi
Jonas Hanselle
Yanai Elazar
...
Yu-Min Tseng
Vishaal Udandarao
Zengzhi Wang
Ruijie Xu
Jinglin Yang
320
17
0
31 Jul 2024
A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese
Tin Van Huynh
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
384
2
0
25 Jun 2024
Generative AI for Synthetic Data Generation: Methods, Challenges and the Future
Xu Guo
Yiqiang Chen
SyDa
219
61
0
07 Mar 2024
Desiderata for the Context Use of Question Answering Systems
Sagi Shaier
Lawrence E Hunter
Katharina von der Wense
380
6
0
31 Jan 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
283
1
0
20 Jan 2024
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
M. Boubdir
Edward Kim
Beyza Ermis
Sara Hooker
Marzieh Fadaee
ELM
267
73
0
29 Nov 2023
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks
Aradhana Sinha
Ananth Balashankar
Ahmad Beirami
Thi Avrahami
Jilin Chen
Alex Beutel
AAML
292
6
0
25 Oct 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
276
39
0
20 Oct 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
302
37
0
20 Oct 2023
Pseudointelligence: A Unifying Framework for Language Model Evaluation
Shikhar Murty
Orr Paradise
Pratyusha Sharma
179
0
0
18 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
276
5
0
02 Oct 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
287
4
0
02 Aug 2023
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Neural Information Processing Systems (NeurIPS), 2023
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
ALM
371
16
0
06 Jul 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Neural Information Processing Systems (NeurIPS), 2023
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
676
135
0
07 Jun 2023
Entailment as Robust Self-Learner
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiaxin Ge
Hongyin Luo
Yoon Kim
James R. Glass
242
3
0
26 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saku Sugawara
S. Tsugita
ELM
341
2
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Yongfeng Zhang
Jennifer Foster
Yue Zhang
OOD
345
3
0
23 May 2023
On the Limitations of Simulating Active Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Katerina Margatina
Nikolaos Aletras
292
14
0
21 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Neeraj Varshney
Mihir Parmar
Nisarg Patel
Divij Handa
Sayantan Sarkar
Man Luo
Chitta Baral
LRM
209
5
0
20 May 2023
Multilingual Event Extraction from Historical Newspaper Adverts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Nadav Borenstein
N. Perez
Isabelle Augenstein
304
6
0
18 May 2023
A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors
Alexander Hoelzemann
Kristof Van Laerhoven
153
12
0
15 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Lukáš Mikula
Michal Štefánik
Marek Petrovič
Petr Sojka
240
6
0
11 May 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
359
49
0
31 Mar 2023
ScatterShot: Interactive In-context Example Curation for Text Transformation
International Conference on Intelligent User Interfaces (IUI), 2023
Tongshuang Wu
Hua Shen
Daniel S. Weld
Jeffrey Heer
Marco Tulio Ribeiro
174
37
0
14 Feb 2023
Exploring the Benefits of Training Expert Language Models over Instruction Tuning
International Conference on Machine Learning (ICML), 2023
Joel Jang
Seungone Kim
Seonghyeon Ye
Doyoung Kim
Lajanugen Logeswaran
Moontae Lee
Kyungjae Lee
Minjoon Seo
LRM
ALM
514
95
0
07 Feb 2023
Parallel Context Windows for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nir Ratner
Yoav Levine
Yonatan Belinkov
Ori Ram
Inbal Magar
Omri Abend
Ehud D. Karpas
Amnon Shashua
Kevin Leyton-Brown
Y. Shoham
RALM
421
94
0
21 Dec 2022
ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Dheeraj Mekala
Jason Wolfe
Subhro Roy
307
9
0
21 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Abigail Z. Jacobs
LM&MA
ALM
418
121
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
405
673
0
19 Dec 2022
Which Shortcut Solution Do Question Answering Models Prefer to Learn?
AAAI Conference on Artificial Intelligence (AAAI), 2022
Kazutoshi Shinoda
Saku Sugawara
Akiko Aizawa
266
9
0
29 Nov 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alireza Mohammadshahi
Thomas Scialom
Majid Yazdani
Pouya Yanki
Angela Fan
James Henderson
Marzieh Saeidi
299
24
0
02 Nov 2022
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Rifki Afina Putri
Alice Oh
253
13
0
25 Oct 2022
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hung-Ting Chen
Michael J.Q. Zhang
Eunsol Choi
RALM
HILM
435
137
0
25 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
446
34
0
10 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Robert Bamler
Zhijing Jin
690
139
0
06 Oct 2022
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
330
3
0
06 Oct 2022
Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
International Conference on Computational Linguistics (COLING), 2022
Mana Ashida
Saku Sugawara
243
6
0
16 Sep 2022
Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples
Hezekiah J. Branch
Jonathan Rodriguez Cefalu
Jeremy McHugh
Leyla Hujer
Aditya Bahl
Daniel del Castillo Iglesias
Ron Heichman
Ramesh Darwishi
ELM
SILM
AAML
244
75
0
05 Sep 2022
A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
Xanh Ho
Johannes Mario Meissner
Saku Sugawara
Akiko Aizawa
OffRL
275
6
0
05 Sep 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2022
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
252
19
0
25 Jul 2022