How to Train BERT with an Academic Budget
Peter Izsak, Moshe Berchansky, Omer Levy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
arXiv: 2104.07705, 15 April 2021

Papers citing "How to Train BERT with an Academic Budget" (21 of 71 papers shown)

Word-Level Representation From Bytes For Language Modeling
Chul Lee, Qipeng Guo, Xipeng Qiu
23 Nov 2022

Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Saghar Irandoust, Thibaut Durand, Yunduz Rakhmangulova, Wenjie Zi, Hossein Hajimirsadeghi
Tags: ViT
09 Nov 2022

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz
09 Nov 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Tags: AI4CE
25 Oct 2022

Effective Pre-Training Objectives for Transformer-based Autoencoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Luca Di Liello, Matteo Gabburo, Alessandro Moschitti
24 Oct 2022

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina, Nikos Voskarides, Roi Blanco
21 Oct 2022

Incorporating Context into Subword Vocabularies
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Shaked Yehezkel, Yuval Pinter
13 Oct 2022

Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He, Taro Toyoizumi
Tags: LRM
13 Oct 2022

Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang
Tags: GNN
08 Sep 2022

Transformers with Learnable Activation Functions
Findings, 2022
Haishuo Fang, Ji-Ung Lee, N. Moosavi, Iryna Gurevych
Tags: AI4CE
30 Aug 2022

What Dense Graph Do You Need for Self-Attention?
International Conference on Machine Learning (ICML), 2022
Yuxing Wang, Chu-Tak Lee, Qipeng Guo, Zhangyue Yin, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu
Tags: GNN
27 May 2022

Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
23 May 2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Neural Information Processing Systems (NeurIPS), 2022
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora
20 May 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
13 Apr 2022

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Journal of Scientific Computing (J. Sci. Comput.), 2022
Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna
02 Mar 2022

Should You Mask 15% in Masked Language Modeling?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen
Tags: CVBM
16 Feb 2022

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
ACM Computing Surveys (CSUR), 2021
Bonan Min, Hayley L. Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
Tags: LM&MA, VLM, AI4CE
01 Nov 2021

Pre-train or Annotate? Domain Adaptation with a Constrained Budget
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Fan Bai, Alan Ritter, Wei Xu
10 Sep 2021

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Itay Itzhak, Omer Levy
25 Aug 2021

Curriculum learning for language modeling
Daniel Fernando Campos
04 Aug 2021

Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
Pattern Recognition Letters (PR), 2021
David Peer, Sebastian Stabinger, Stefan Engl, A. Rodríguez-Sánchez
31 May 2021