Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little
Koustuv Sinha, Robin Jia, Dieuwke Hupkes, J. Pineau, Adina Williams, Douwe Kiela
arXiv:2104.06644, 14 April 2021
Papers citing "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little" (showing 50 of 165):
- EigenNoise: A Contrastive Prior to Warm-Start Representations. H. Heidenreich, Jake Williams. 09 May 2022.
- Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence. Myeongjun Jang, Frank Mtumbuka, Thomas Lukasiewicz. 08 May 2022.
- To Know by the Company Words Keep and What Else Lies in the Vicinity. Jake Williams, H. Heidenreich. 30 Apr 2022.
- Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? Stephanie Brandl, Oliver Eberle, Jonas Pilot, Anders Søgaard. 25 Apr 2022.
- Probing for the Usage of Grammatical Number. Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell. 19 Apr 2022.
- On the Role of Pre-trained Language Models in Word Ordering: A Case Study with BART. Zebin Ou, Meishan Zhang, Yue Zhang. 15 Apr 2022.
- Self-Supervised Losses for One-Class Textual Anomaly Detection. Kimberly T. Mai, Toby O. Davies, Lewis D. Griffin. 12 Apr 2022.
- Are We Really Making Much Progress in Text Classification? A Comparative Review. Lukas Galke, Andor Diera, Bao Xin Lin, Bhakti Khera, Tim Meuser, Tushar Singhal, Fabian Karl, A. Scherp. 08 Apr 2022.
- Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality. Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross. 07 Apr 2022.
- An Exploratory Study on Code Attention in BERT. Rishab Sharma, Fuxiang Chen, Fatemeh H. Fard, David Lo. 05 Apr 2022.
- Transformer Language Models without Positional Encodings Still Learn Positional Information. Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy. 30 Mar 2022.
- Word Order Does Matter (And Shuffled Language Models Know It). Vinit Ravishankar, Mostafa Abdou, Artur Kulmizev, Anders Søgaard. 21 Mar 2022.
- How does the pre-training objective affect what large language models learn about linguistic properties? Ahmed Alajrami, Nikolaos Aletras. 20 Mar 2022.
- Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models. Ryokan Ri, Yoshimasa Tsuruoka. 19 Mar 2022.
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure. Yuan Chai, Yaobo Liang, Nan Duan. 16 Mar 2022.
- When classifying grammatical role, BERT doesn't care about word order... except when it matters. Isabel Papadimitriou, Richard Futrell, Kyle Mahowald. 11 Mar 2022.
- Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies. Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou. 24 Feb 2022.
- Should You Mask 15% in Masked Language Modeling? Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen. 16 Feb 2022.
- Impact of Pretraining Term Frequencies on Few-Shot Reasoning. Yasaman Razeghi, Robert L Logan IV, Matt Gardner, Sameer Singh. 15 Feb 2022.
- Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages. Kyle Mahowald, Evgeniia Diachek, E. Gibson, Evelina Fedorenko, Richard Futrell. 30 Jan 2022.
- How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets. Aarne Talman, Marianna Apidianaki, S. Chatzikyriakidis, Jörg Tiedemann. 12 Jan 2022.
- Toxicity Detection for Indic Multilingual Social Media Content. Manan A. Jhaveri, Devanshu Ramaiya, Harveen Singh Chadha. 03 Jan 2022.
- A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision. Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz. 27 Dec 2021.
- The King is Naked: on the Notion of Robustness for Natural Language Processing. Emanuele La Malfa, Marta Z. Kwiatkowska. 13 Dec 2021.
- Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity. Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston. 10 Dec 2021.
- Variation and generality in encoding of syntactic anomaly information in sentence embeddings. Qinxuan Wu, Allyson Ettinger. 12 Nov 2021.
- Schrödinger's Tree -- On Syntax and Neural Language Models. Artur Kulmizev, Joakim Nivre. 17 Oct 2021.
- Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models. Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech. 16 Oct 2021.
- Causal Transformers Perform Below Chance on Recursive Nested Constructions, Unlike Humans. Yair Lakretz, T. Desbordes, Dieuwke Hupkes, S. Dehaene. 14 Oct 2021.
- On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias. Ting-Rui Chiang. 11 Oct 2021.
- How BPE Affects Memorization in Transformers. Eugene Kharitonov, Marco Baroni, Dieuwke Hupkes. 06 Oct 2021.
- Compositional generalization in semantic parsing with pretrained transformers. A. Orhan. 30 Sep 2021.
- Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations. Arabella J. Sinclair, Jaap Jumelet, Willem H. Zuidema, Raquel Fernández. 30 Sep 2021.
- Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets. Changchang Zeng, Shaobo Li. 29 Sep 2021.
- Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations. Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova. 28 Sep 2021.
- Pragmatic competence of pre-trained language models through the lens of discourse connectives. Lalchand Pandia, Yan Cong, Allyson Ettinger. 27 Sep 2021.
- Training Dynamic based data filtering may not work for NLP datasets. Arka Talukdar, Monika Dagar, Prachi Gupta, Varun G. Menon. 19 Sep 2021.
- Numerical reasoning in machine reading comprehension tasks: are we there yet? Hadeel Al-Negheimish, Pranava Madhyastha, A. Russo. 16 Sep 2021.
- The Impact of Positional Encodings on Multilingual Compression. Vinit Ravishankar, Anders Søgaard. 11 Sep 2021.
- Does Pretraining for Summarization Require Knowledge Transfer? Kundan Krishna, Jeffrey P. Bigham, Zachary Chase Lipton. 10 Sep 2021.
- Studying word order through iterative shuffling. Nikolay Malkin, Sameera Lanka, Pranav Goel, Nebojsa Jojic. 10 Sep 2021.
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets. Cheng-Han Chiang, Hung-yi Lee. 08 Sep 2021.
- BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks. T. Lai, Heng Ji, ChengXiang Zhai. 06 Sep 2021.
- How Does Adversarial Fine-Tuning Benefit BERT? J. Ebrahimi, Hao Yang, Wei Zhang. 31 Aug 2021.
- Local Structure Matters Most: Perturbation Study in NLU. Louis Clouâtre, Prasanna Parthasarathi, Amal Zouaq, Sarath Chandar. 29 Jul 2021.
- Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. Colin Wei, Sang Michael Xie, Tengyu Ma. 17 Jun 2021.
- Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models. Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart M. Shieber, Tal Linzen, Yonatan Belinkov. 10 Jun 2021.
- Exploring Unsupervised Pretraining Objectives for Machine Translation. Christos Baziotis, Ivan Titov, Alexandra Birch, Barry Haddow. 10 Jun 2021.
- Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference. Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson. 07 Jun 2021.
- The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models. Ulme Wennberg, G. Henter. 03 Jun 2021.