ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

14 April 2021
Koustuv Sinha
Robin Jia
Dieuwke Hupkes
J. Pineau
Adina Williams
Douwe Kiela
arXiv: 2104.06644

Papers citing "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little"

50 / 165 papers shown
EigenNoise: A Contrastive Prior to Warm-Start Representations
H. Heidenreich
Jake Williams
11
1
0
09 May 2022
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence
Myeongjun Jang
Frank Mtumbuka
Thomas Lukasiewicz
23
9
0
08 May 2022
To Know by the Company Words Keep and What Else Lies in the Vicinity
Jake Williams
H. Heidenreich
16
0
0
30 Apr 2022
Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze?
Stephanie Brandl
Oliver Eberle
Jonas Pilot
Anders Søgaard
65
33
0
25 Apr 2022
Probing for the Usage of Grammatical Number
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Ryan Cotterell
25
55
0
19 Apr 2022
On the Role of Pre-trained Language Models in Word Ordering: A Case Study with BART
Zebin Ou
Meishan Zhang
Yue Zhang
19
2
0
15 Apr 2022
Self-Supervised Losses for One-Class Textual Anomaly Detection
Kimberly T. Mai
Toby O. Davies
Lewis D. Griffin
8
7
0
12 Apr 2022
Are We Really Making Much Progress in Text Classification? A Comparative Review
Lukas Galke
Andor Diera
Bao Xin Lin
Bhakti Khera
Tim Meuser
Tushar Singhal
Fabian Karl
A. Scherp
VLM
24
3
0
08 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
17
400
0
07 Apr 2022
An Exploratory Study on Code Attention in BERT
Rishab Sharma
Fuxiang Chen
Fatemeh H. Fard
David Lo
19
25
0
05 Apr 2022
Transformer Language Models without Positional Encodings Still Learn Positional Information
Adi Haviv
Ori Ram
Ofir Press
Peter Izsak
Omer Levy
20
112
0
30 Mar 2022
Word Order Does Matter (And Shuffled Language Models Know It)
Vinit Ravishankar
Mostafa Abdou
Artur Kulmizev
Anders Søgaard
17
44
0
21 Mar 2022
How does the pre-training objective affect what large language models learn about linguistic properties?
Ahmed Alajrami
Nikolaos Aletras
21
20
0
20 Mar 2022
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
Ryokan Ri
Yoshimasa Tsuruoka
21
25
0
19 Mar 2022
Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure
Yuan Chai
Yaobo Liang
Nan Duan
LRM
27
21
0
16 Mar 2022
When classifying grammatical role, BERT doesn't care about word order... except when it matters
Isabel Papadimitriou
Richard Futrell
Kyle Mahowald
MILM
22
29
0
11 Mar 2022
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies
Zhengxuan Wu
Alex Tamkin
Isabel Papadimitriou
21
9
0
24 Feb 2022
Should You Mask 15% in Masked Language Modeling?
Alexander Wettig
Tianyu Gao
Zexuan Zhong
Danqi Chen
CVBM
29
161
0
16 Feb 2022
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Yasaman Razeghi
Robert L Logan IV
Matt Gardner
Sameer Singh
ReLM
LRM
17
150
0
15 Feb 2022
Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages
Kyle Mahowald
Evgeniia Diachek
E. Gibson
Evelina Fedorenko
Richard Futrell
14
10
0
30 Jan 2022
How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets
Aarne Talman
Marianna Apidianaki
S. Chatzikyriakidis
Jörg Tiedemann
ELM
19
0
0
12 Jan 2022
Toxicity Detection for Indic Multilingual Social Media Content
Manan A. Jhaveri
Devanshu Ramaiya
Harveen Singh Chadha
23
3
0
03 Jan 2022
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision
Ajinkya Tejankar
Maziar Sanjabi
Bichen Wu
Saining Xie
Madian Khabsa
Hamed Pirsiavash
Hamed Firooz
VLM
21
17
0
27 Dec 2021
The King is Naked: on the Notion of Robustness for Natural Language Processing
Emanuele La Malfa
Marta Z. Kwiatkowska
20
28
0
13 Dec 2021
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity
Kurt Shuster
Jack Urbanek
Arthur Szlam
Jason Weston
HILM
13
24
0
10 Dec 2021
Variation and generality in encoding of syntactic anomaly information in sentence embeddings
Qinxuan Wu
Allyson Ettinger
23
2
0
12 Nov 2021
Schrödinger's Tree -- On Syntax and Neural Language Models
Artur Kulmizev
Joakim Nivre
30
6
0
17 Oct 2021
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye
Madian Khabsa
M. Lewis
Sinong Wang
Xiang Ren
Aaron Jaech
29
5
0
16 Oct 2021
Causal Transformers Perform Below Chance on Recursive Nested Constructions, Unlike Humans
Yair Lakretz
T. Desbordes
Dieuwke Hupkes
S. Dehaene
230
11
0
14 Oct 2021
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias
Ting-Rui Chiang
24
3
0
11 Oct 2021
How BPE Affects Memorization in Transformers
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
161
32
0
06 Oct 2021
Compositional generalization in semantic parsing with pretrained transformers
A. Orhan
20
6
0
30 Sep 2021
Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations
Arabella J. Sinclair
Jaap Jumelet
Willem H. Zuidema
Raquel Fernández
56
38
0
30 Sep 2021
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
Changchang Zeng
Shaobo Li
16
6
0
29 Sep 2021
Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations
Ekaterina Taktasheva
Vladislav Mikhailov
Ekaterina Artemova
8
13
0
28 Sep 2021
Pragmatic competence of pre-trained language models through the lens of discourse connectives
Lalchand Pandia
Yan Cong
Allyson Ettinger
9
25
0
27 Sep 2021
Training Dynamic based data filtering may not work for NLP datasets
Arka Talukdar
Monika Dagar
Prachi Gupta
Varun G. Menon
NoLa
35
3
0
19 Sep 2021
Numerical reasoning in machine reading comprehension tasks: are we there yet?
Hadeel Al-Negheimish
Pranava Madhyastha
A. Russo
AIMat
ReLM
14
13
0
16 Sep 2021
The Impact of Positional Encodings on Multilingual Compression
Vinit Ravishankar
Anders Søgaard
17
5
0
11 Sep 2021
Does Pretraining for Summarization Require Knowledge Transfer?
Kundan Krishna
Jeffrey P. Bigham
Zachary Chase Lipton
14
35
0
10 Sep 2021
Studying word order through iterative shuffling
Nikolay Malkin
Sameera Lanka
Pranav Goel
Nebojsa Jojic
21
14
0
10 Sep 2021
On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
Cheng-Han Chiang
Hung-yi Lee
SyDa
24
24
0
08 Sep 2021
BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks
T. Lai
Heng Ji
ChengXiang Zhai
19
31
0
06 Sep 2021
How Does Adversarial Fine-Tuning Benefit BERT?
J. Ebrahimi
Hao Yang
Wei Zhang
AAML
18
4
0
31 Aug 2021
Local Structure Matters Most: Perturbation Study in NLU
Louis Clouâtre
Prasanna Parthasarathi
Amal Zouaq
Sarath Chandar
22
13
0
29 Jul 2021
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Colin Wei
Sang Michael Xie
Tengyu Ma
22
96
0
17 Jun 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson
Aaron Mueller
Sebastian Gehrmann
Stuart M. Shieber
Tal Linzen
Yonatan Belinkov
27
101
0
10 Jun 2021
Exploring Unsupervised Pretraining Objectives for Machine Translation
Christos Baziotis
Ivan Titov
Alexandra Birch
Barry Haddow
AAML
AI4CE
18
7
0
10 Jun 2021
Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Hai Hu
He Zhou
Zuoyu Tian
Yiwen Zhang
Yina Ma
Yanting Li
Yixin Nie
Kyle Richardson
19
11
0
07 Jun 2021
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Ulme Wennberg
G. Henter
MILM
19
21
0
03 Jun 2021