Cited By: arXiv 2011.13635

Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup
27 November 2020
Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
Papers citing "Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup" (24 papers shown)
Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning. Changlin Li, Jiawei Zhang, Shuhao Liu, Sihao Lin, Z. Shi, Zhihui Li, Xiaojun Chang. 26 Nov 2025.
Deep Progressive Training: scaling up depth capacity of zero/one-layer models. Zhiqi Bu. 07 Nov 2025.
Progressive Depth Up-scaling via Optimal Transport. Mingzi Cao, Xi Wang, Nikolaos Aletras. 11 Aug 2025.
Curriculum-Guided Layer Scaling for Language Model Pretraining. Karanpartap Singh, Neil Band, Ehsan Adeli. 13 Jun 2025.
LESA: Learnable LLM Layer Scaling-Up. Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, L. Qin, Zhongfu Chen, Hai Zhao. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 20 Feb 2025.
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher. Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen. International Conference on Learning Representations (ICLR), 2024. 05 Oct 2024.
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization. Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar. 19 Sep 2024.
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning. Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang. 06 Sep 2024.
Federating to Grow Transformers with Constrained Resources without Model Sharing. Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu. 19 Jun 2024.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training. Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Songlin Yang, Reynold Cheng, Wenhan Luo, Jie Fu. 24 May 2024.
A Multi-Level Framework for Accelerating Training Transformer Models. Longwei Zou, Han Zhang, Yangdong Deng. 07 Apr 2024.
Preparing Lessons for Progressive Training on Language Models. Yu Pan, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, Qun Liu. AAAI Conference on Artificial Intelligence (AAAI), 2024. 17 Jan 2024.
Self-Influence Guided Data Reweighting for Language Model Pre-training. Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha P. Talukdar. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 02 Nov 2023.
Reusing Pretrained Models by Multi-linear Operators for Efficient Training. Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu. 16 Oct 2023.
LEMON: Lossless model expansion. Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Yanghua Peng, Tian Ding, Hongxia Yang. International Conference on Learning Representations (ICLR), 2023. 12 Oct 2023.
Masked Structural Growth for 2x Faster Language Model Pre-training. Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang. International Conference on Learning Representations (ICLR), 2023. 04 May 2023.
Learning to Grow Pretrained Models for Efficient Transformer Training. Peihao Wang, Yikang Shen, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zinan Lin, Yoon Kim. International Conference on Learning Representations (ICLR), 2023. 02 Mar 2023.
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle. Vanessa Mehlin, Sigurd Schacht, Carsten Lanquillon. 05 Feb 2023.
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. Aran Komatsuzaki, J. Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, N. Houlsby. International Conference on Learning Representations (ICLR), 2022. 09 Dec 2022.
Automated Progressive Learning for Efficient Training of Vision Transformers. Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang. Computer Vision and Pattern Recognition (CVPR), 2022. 28 Mar 2022.
A Survey on Green Deep Learning. Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li. 08 Nov 2021.
bert2BERT: Towards Reusable Pretrained Language Models. Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu. 14 Oct 2021.
Training ELECTRA Augmented with Multi-word Selection. Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han. Findings of ACL, 2021. 31 May 2021.
How to Train BERT with an Academic Budget. Peter Izsak, Moshe Berchansky, Omer Levy. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. 15 Apr 2021.