Cited By: arXiv 2011.13635

Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup
27 November 2020
Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
Papers citing "Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup" (24 papers shown)
Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning. Changlin Li, Jiawei Zhang, Shuhao Liu, Sihao Lin, Z. Shi, Zhihui Li, Xiaojun Chang. 26 Nov 2025.
Deep Progressive Training: scaling up depth capacity of zero/one-layer models. Zhiqi Bu. 07 Nov 2025.
Progressive Depth Up-scaling via Optimal Transport. Mingzi Cao, Xi Wang, Nikolaos Aletras. 11 Aug 2025.
Curriculum-Guided Layer Scaling for Language Model Pretraining. Karanpartap Singh, Neil Band, Ehsan Adeli. 13 Jun 2025.
LESA: Learnable LLM Layer Scaling-Up. Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, L. Qin, Zhongfu Chen, Hai Zhao. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 20 Feb 2025.
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher. Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen. International Conference on Learning Representations (ICLR), 2024. 05 Oct 2024.
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization. Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar. 19 Sep 2024.
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning. Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang. 06 Sep 2024.
Federating to Grow Transformers with Constrained Resources without Model Sharing. Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu. 19 Jun 2024.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training. Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Songlin Yang, Reynold Cheng, Wenhan Luo, Jie Fu. 24 May 2024.
A Multi-Level Framework for Accelerating Training Transformer Models. Longwei Zou, Han Zhang, Yangdong Deng. 07 Apr 2024.
Preparing Lessons for Progressive Training on Language Models. Yu Pan, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, Qun Liu. AAAI Conference on Artificial Intelligence (AAAI), 2024. 17 Jan 2024.
Self-Influence Guided Data Reweighting for Language Model Pre-training. Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha P. Talukdar. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 02 Nov 2023.
Reusing Pretrained Models by Multi-linear Operators for Efficient Training. Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu. 16 Oct 2023.
LEMON: Lossless model expansion. Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Yanghua Peng, Tian Ding, Hongxia Yang. International Conference on Learning Representations (ICLR), 2023. 12 Oct 2023.
Masked Structural Growth for 2x Faster Language Model Pre-training. Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang. International Conference on Learning Representations (ICLR), 2023. 04 May 2023.
Learning to Grow Pretrained Models for Efficient Transformer Training. Peihao Wang, Yikang Shen, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zinan Lin, Yoon Kim. International Conference on Learning Representations (ICLR), 2023. 02 Mar 2023.
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle. Vanessa Mehlin, Sigurd Schacht, Carsten Lanquillon. 05 Feb 2023.
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. Aran Komatsuzaki, J. Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, N. Houlsby. International Conference on Learning Representations (ICLR), 2022. 09 Dec 2022.
Automated Progressive Learning for Efficient Training of Vision Transformers. Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang. Computer Vision and Pattern Recognition (CVPR), 2022. 28 Mar 2022.
A Survey on Green Deep Learning. Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li. 08 Nov 2021.
bert2BERT: Towards Reusable Pretrained Language Models. Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu. 14 Oct 2021.
Training ELECTRA Augmented with Multi-word Selection. Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han. Findings of ACL, 2021. 31 May 2021.
How to Train BERT with an Academic Budget. Peter Izsak, Moshe Berchansky, Omer Levy. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. 15 Apr 2021.