
Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
arXiv: 1905.10425, 24 May 2019
Nikita Nangia, Samuel R. Bowman

Papers citing "Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark" (42 papers)
When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min, Xiulin Yang, Shira Wein
17 Jun 2025

TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models
Tanvir Islam
25 Apr 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Robert Bamler
10 Apr 2025

Neuro-Symbolic Contrastive Learning for Cross-domain Inference
International Conference on Logic Programming (ICLP), 2025
Mingyue Liu, Ryo Ueda, Zhen Wan, Katsumi Inoue, Chris G. Willcocks
13 Feb 2025

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Robert Bamler, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
06 Dec 2024

RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs
Ekaterina Taktasheva, Maxim Bazhukov, Kirill Koncha, Alena Fenogenova, Ekaterina Artemova, Vladislav Mikhailov
27 Jun 2024

What Makes Language Models Good-enough?
Daiki Asami, Saku Sugawara
06 Jun 2024

A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini, Andrey Petrov, Alex Fabrikant, Annie Louis
19 Feb 2024

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
01 Dec 2023

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
International Conference on Machine Learning (ICML), 2023
Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zinan Lin
29 Sep 2023

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Don Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
31 Aug 2023

Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
International Conference on Learning Representations (ICLR), 2023
Peiyan Zhang, Hao Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang
21 Aug 2023

What's the Meaning of Superhuman Performance in Today's NLU?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Simone Tedeschi, Johan Bos, T. Declerck, Jan Hajic, Daniel Hershcovich, ..., Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli
15 May 2023

Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Brihi Joshi, Ziyi Liu, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren
11 May 2023

A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Elena V. Epure, Romain Hennequin
13 Mar 2023

A Challenging Benchmark for Low-Resource Learning
Yudong Wang, Chang Ma, Qingxiu Dong, Lingpeng Kong, Jingjing Xu
07 Mar 2023

RuCoLA: Russian Corpus of Linguistic Acceptability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Vladislav Mikhailov, T. Shamardina, Max Ryabinin, A. Pestova, I. Smurov, Ekaterina Artemova
23 Oct 2022

State-of-the-art generalisation research in NLP: A taxonomy and review
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Robert Bamler, Zhijing Jin
06 Oct 2022

HumanAL: Calibrating Human Matching Beyond a Single Task
Roee Shraga
06 May 2022

Testing the limits of natural language models for predicting human language judgments
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Tal Golan, Matthew Siegelman, N. Kriegeskorte, Christopher A. Baldassano
07 Apr 2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
International Conference on Language Resources and Evaluation (LREC), 2022
Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, ..., Chris C. Emezue, Saheed Abdul, Anuoluwapo Aremu, Alipio Jeorge, P. Brazdil
20 Jan 2022

The Defeat of the Winograd Schema Challenge
Artificial Intelligence (AIJ), 2022
Vid Kocijan, E. Davis, Thomas Lukasiewicz, G. Marcus, L. Morgenstern
07 Jan 2022

How not to Lie with a Benchmark: Rearranging NLP Leaderboards
Tatiana Shavrina, Valentin Malykh
02 Dec 2021

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Yangqiu Song
04 Nov 2021

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao
04 Nov 2021

IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania
27 Oct 2021

Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Findings (Findings), 2021
Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson
07 Jun 2021

Comparing Test Sets with Item Response Theory
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Sam Bowman
01 Jun 2021

KLUE: Korean Language Understanding Evaluation
Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, ..., Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
20 May 2021

Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks
Tatiana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov
03 May 2021

Sensitivity as a Complexity Measure for Sequence Classification Tasks
Transactions of the Association for Computational Linguistics (TACL), 2021
Michael Hahn, Dan Jurafsky, Richard Futrell
21 Apr 2021

What Will it Take to Fix Benchmarking in Natural Language Understanding?
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Samuel R. Bowman, George E. Dahl
05 Apr 2021

OCNLI: Original Chinese Natural Language Inference
Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kübler, L. Moss
12 Oct 2020

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?
Yixin Nie, Xiang Zhou, Joey Tianyi Zhou
07 Oct 2020

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Tal Linzen
03 May 2020

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
International Conference on Machine Learning (ICML), 2020
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
24 Mar 2020

What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
Transactions of the Association for Computational Linguistics (TACL), 2019
Kyle Richardson, Ashish Sabharwal
31 Dec 2019

Learning to Learn Words from Visual Scenes
Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
25 Nov 2019

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2019
R. Thomas McCoy, Junghyun Min, Tal Linzen
07 Nov 2019

A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
International Conference on Language Resources and Evaluation (LREC), 2019
Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller
19 Jul 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Neural Information Processing Systems (NeurIPS), 2019
Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
02 May 2019

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy, Ellie Pavlick, Tal Linzen
04 Feb 2019