How to Train BERT with an Academic Budget
Peter Izsak, Moshe Berchansky, Omer Levy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
arXiv: 2104.07705, 15 April 2021

Papers citing "How to Train BERT with an Academic Budget" (21 of 71 papers shown)

Word-Level Representation From Bytes For Language Modeling
Chul Lee, Qipeng Guo, Xipeng Qiu
23 Nov 2022

Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Saghar Irandoust, Thibaut Durand, Yunduz Rakhmangulova, Wenjie Zi, Hossein Hajimirsadeghi
Tags: ViT
09 Nov 2022

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz
09 Nov 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Tags: AI4CE
25 Oct 2022

Effective Pre-Training Objectives for Transformer-based Autoencoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Luca Di Liello, Matteo Gabburo, Alessandro Moschitti
24 Oct 2022

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Laura Aina, Nikos Voskarides, Roi Blanco
21 Oct 2022

Incorporating Context into Subword Vocabularies
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Shaked Yehezkel, Yuval Pinter
13 Oct 2022

Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He, Taro Toyoizumi
Tags: LRM
13 Oct 2022

Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang
Tags: GNN
08 Sep 2022

Transformers with Learnable Activation Functions
Findings, 2022
Haishuo Fang, Ji-Ung Lee, N. Moosavi, Iryna Gurevych
Tags: AI4CE
30 Aug 2022

What Dense Graph Do You Need for Self-Attention?
International Conference on Machine Learning (ICML), 2022
Yuxing Wang, Chu-Tak Lee, Qipeng Guo, Zhangyue Yin, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu
Tags: GNN
27 May 2022

Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
23 May 2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Neural Information Processing Systems (NeurIPS), 2022
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora
20 May 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
13 Apr 2022

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Journal of Scientific Computing (J. Sci. Comput.), 2022
Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna
02 Mar 2022

Should You Mask 15% in Masked Language Modeling?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen
Tags: CVBM
16 Feb 2022

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
ACM Computing Surveys (CSUR), 2021
Bonan Min, Hayley L. Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
Tags: LM&MA, VLM, AI4CE
01 Nov 2021

Pre-train or Annotate? Domain Adaptation with a Constrained Budget
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Fan Bai, Alan Ritter, Wei Xu
10 Sep 2021

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Itay Itzhak, Omer Levy
25 Aug 2021

Curriculum learning for language modeling
Daniel Fernando Campos
04 Aug 2021

Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
Pattern Recognition Letters (PR), 2021
David Peer, Sebastian Stabinger, Stefan Engl, A. Rodríguez-Sánchez
31 May 2021