1905.10425
Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
24 May 2019
Nikita Nangia
Samuel R. Bowman
ELM
ALM
Papers citing "Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark" (42 papers)
When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min
Xiulin Yang
Shira Wein
LLMSV
278
2
0
17 Jun 2025
TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models
Tanvir Islam
AI4CE
326
0
0
25 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
588
165
0
10 Apr 2025
Neuro-Symbolic Contrastive Learning for Cross-domain Inference
International Conference on Logic Programming (ICLP), 2025
Mingyue Liu
Ryo Ueda
Zhen Wan
Katsumi Inoue
Chris G. Willcocks
NAI
397
0
0
13 Feb 2025
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu
Aaron Mueller
Candace Ross
Adina Williams
Tal Linzen
Chengxu Zhuang
Ryan Cotterell
Leshem Choshen
Alex Warstadt
Ethan Gotlieb Wilcox
471
36
0
06 Dec 2024
RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs
Ekaterina Taktasheva
Maxim Bazhukov
Kirill Koncha
Alena Fenogenova
Ekaterina Artemova
Vladislav Mikhailov
304
20
0
27 Jun 2024
What Makes Language Models Good-enough?
Daiki Asami
Saku Sugawara
214
1
0
06 Jun 2024
A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini
Andrey Petrov
Alex Fabrikant
Annie Louis
SyDa
251
16
0
19 Feb 2024
The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
181
1
0
01 Dec 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
International Conference on Machine Learning (ICML), 2023
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zinan Lin
341
7
0
29 Sep 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucas Bandarkar
Davis Liang
Benjamin Muller
Mikel Artetxe
Satya Narayan Shukla
Don Husa
Naman Goyal
Abhinandan Krishnan
Luke Zettlemoyer
Madian Khabsa
340
230
0
31 Aug 2023
Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
International Conference on Learning Representations (ICLR), 2023
Peiyan Zhang
Hao Liu
Chaozhuo Li
Xing Xie
Sunghun Kim
Haohan Wang
VLM
OOD
296
9
0
21 Aug 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
284
37
0
15 May 2023
Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Brihi Joshi
Ziyi Liu
Sahana Ramnath
Aaron Chan
Zhewei Tong
Shaoliang Nie
Qifan Wang
Yejin Choi
Xiang Ren
HAI
LRM
215
39
0
11 May 2023
A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Elena V. Epure
Romain Hennequin
123
6
0
13 Mar 2023
A Challenging Benchmark for Low-Resource Learning
Yudong Wang
Chang Ma
Qingxiu Dong
Lingpeng Kong
Jingjing Xu
145
9
0
07 Mar 2023
RuCoLA: Russian Corpus of Linguistic Acceptability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Vladislav Mikhailov
T. Shamardina
Max Ryabinin
A. Pestova
I. Smurov
Ekaterina Artemova
235
37
0
23 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
600
130
0
06 Oct 2022
HumanAL: Calibrating Human Matching Beyond a Single Task
Roee Shraga
HAI
144
6
0
06 May 2022
Testing the limits of natural language models for predicting human language judgments
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Tal Golan
Matthew Siegelman
N. Kriegeskorte
Christopher A. Baldassano
245
20
0
07 Apr 2022
NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
International Conference on Language Resources and Evaluation (LREC), 2022
Shamsuddeen Hassan Muhammad
David Ifeoluwa Adelani
Sebastian Ruder
Ibrahim Said Ahmad
Idris Abdulmumin
...
Chris C. Emezue
Saheed Abdul
Anuoluwapo Aremu
Alipio Jeorge
P. Brazdil
296
119
0
20 Jan 2022
The Defeat of the Winograd Schema Challenge
Artificial Intelligence (AIJ), 2022
Vid Kocijan
E. Davis
Thomas Lukasiewicz
G. Marcus
L. Morgenstern
292
48
0
07 Jan 2022
How not to Lie with a Benchmark: Rearranging NLP Leaderboards
Tatiana Shavrina
Valentin Malykh
ALM
ELM
646
14
0
02 Dec 2021
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Boxin Wang
Chejian Xu
Shuohang Wang
Zhe Gan
Yu Cheng
Jianfeng Gao
Ahmed Hassan Awadallah
Bo Li
VLM
ELM
AAML
271
271
0
04 Nov 2021
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
Subhabrata Mukherjee
Xiaodong Liu
Guoqing Zheng
Saghar Hosseini
Hao Cheng
Greg Yang
Christopher Meek
Ahmed Hassan Awadallah
Jianfeng Gao
ELM
156
12
0
04 Nov 2021
IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra
Alham Fikri Aji
Samuel Louvan
Fahrurrozi Rahman
Clara Vania
189
36
0
27 Oct 2021
Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Hai Hu
He Zhou
Zuoyu Tian
Yiwen Zhang
Yina Ma
Yanting Li
Yixin Nie
Kyle Richardson
155
12
0
07 Jun 2021
Comparing Test Sets with Item Response Theory
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Clara Vania
Phu Mon Htut
William Huang
Dhara Mungra
Richard Yuanzhe Pang
Jason Phang
Haokun Liu
Kyunghyun Cho
Sam Bowman
159
50
0
01 Jun 2021
KLUE: Korean Language Understanding Evaluation
Sungjoon Park
Jihyung Moon
Sungdong Kim
Won Ik Cho
Jiyoon Han
...
Seonghyun Kim
Lucy Park
Alice Oh
Jung-Woo Ha
Kyunghyun Cho
ELM
VLM
452
218
0
20 May 2021
Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks
Tatiana Iazykova
Denis Kapelyushnik
Olga Bystrova
Andrey Kutuzov
ELM
124
1
0
03 May 2021
Sensitivity as a Complexity Measure for Sequence Classification Tasks
Transactions of the Association for Computational Linguistics (TACL), 2021
Michael Hahn
Dan Jurafsky
Richard Futrell
308
22
0
21 Apr 2021
What Will it Take to Fix Benchmarking in Natural Language Understanding?
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Samuel R. Bowman
George E. Dahl
ELM
ALM
253
185
0
05 Apr 2021
OCNLI: Original Chinese Natural Language Inference
Hai Hu
Kyle Richardson
Liang Xu
Lu Li
Sandra Kübler
L. Moss
223
129
0
12 Oct 2020
What Can We Learn from Collective Human Opinions on Natural Language Inference Data?
Yixin Nie
Xiang Zhou
Joey Tianyi Zhou
352
158
0
07 Oct 2020
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Tal Linzen
460
204
0
03 May 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
International Conference on Machine Learning (ICML), 2020
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
619
1,065
0
24 Mar 2020
What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
Transactions of the Association for Computational Linguistics (TACL), 2019
Kyle Richardson
Ashish Sabharwal
199
47
0
31 Dec 2019
Learning to Learn Words from Visual Scenes
Dídac Surís
Dave Epstein
Heng Ji
Shih-Fu Chang
Carl Vondrick
VLM
CLIP
SSL
OffRL
141
4
0
25 Nov 2019
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2019
R. Thomas McCoy
Junghyun Min
Tal Linzen
360
156
0
07 Nov 2019
A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
International Conference on Language Resources and Evaluation (LREC), 2019
Damien Sileo
Tim Van de Cruys
Camille Pradel
Philippe Muller
ELM
122
3
0
19 Jul 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Neural Information Processing Systems (NeurIPS), 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
652
2,597
0
02 May 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
775
1,327
0
04 Feb 2019