PMI-Masking: Principled masking of correlated spans
Yoav Levine, Barak Lenz, Opher Lieber, Omri Abend, Kevin Leyton-Brown, Moshe Tennenholtz, Y. Shoham
arXiv: 2010.01825 · 5 October 2020
Papers citing "PMI-Masking: Principled masking of correlated spans" (19 of 19 shown):

Prediction-powered estimators for finite population statistics in highly imbalanced textual data: Public hate crime estimation
Hannes Waldetoft, Jakob Torgander, Måns Magnusson · 05 May 2025

DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov · 13 May 2024

How Useful is Continued Pre-Training for Generative Unsupervised Domain Adaptation?
Rheeya Uppaal, Yixuan Li, Junjie Hu · 31 Jan 2024

Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
Xiao Wang, Wei Zhou, Qi Zhang, Jie Zhou, Songyang Gao, Junzhe Wang, Menghan Zhang, Xiang Gao, Yunwen Chen, Tao Gui · 22 May 2023

What is the best recipe for character-level encoder-only modelling?
Kris Cao · 09 May 2023

Salient Span Masking for Temporal Understanding
Jeremy R. Cole, Aditi Chaudhary, Bhuwan Dhingra, Partha P. Talukdar · 22 Mar 2023

Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
Ahmed Elnaggar, Hazem Essam, Wafaa Salah-Eldin, Walid Moustafa, Mohamed Elkerdawy, Charlotte Rochereau, B. Rost · 16 Jan 2023

Uniform Masking Prevails in Vision-Language Pretraining
Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi · 10 Dec 2022 · VLM

Language Model Pre-training on True Negatives
Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita · 01 Dec 2022

UniMASK: Unified Inference in Sequential Decision Problems
Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, ..., Stephanie Milani, Katja Hofmann, Matthew J. Hausknecht, Anca Dragan, Sam Devlin · 20 Nov 2022 · OffRL

InforMask: Unsupervised Informative Masking for Language Model Pretraining
Nafis Sadeq, Canwen Xu, Julian McAuley · 21 Oct 2022

Pre-training Language Models with Deterministic Factual Knowledge
Shaobo Li, Xiaoguang Li, Lifeng Shang, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu · 20 Oct 2022 · KELM

Learning Better Masking for Better Language Model Pre-training
Dongjie Yang, Zhuosheng Zhang, Hai Zhao · 23 Aug 2022

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers
Luis Espinosa-Anke, A. Shvets, Alireza Mohammadshahi, James Henderson, Leo Wanner · 23 May 2022

Unsupervised Slot Schema Induction for Task-oriented Dialog
Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, H. Soltau · 09 May 2022

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Colin Wei, Sang Michael Xie, Tengyu Ma · 17 Jun 2021

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua · 09 May 2021

Studying Strategically: Learning to Mask for Closed-book QA
Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa · 31 Dec 2020 · OffRL
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 20 Apr 2018 · ELM