ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

31 December 2020
Xiaohan Chen
Yu Cheng
Shuohang Wang
Zhe Gan
Zhangyang Wang
Jingjing Liu
Links: arXiv (abs) · PDF · HTML · GitHub (18★)

Papers citing "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets"

50 / 64 papers shown

PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher (24 May 2025)

Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training
Lexington Whalen, Zhenbang Du, Haoran You, Chaojian Li, Sixu Li, Yingyan (13 Apr 2025)

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Yuezhou Hu, Jun-Jie Zhu, Jianfei Chen (13 Sep 2024)

Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks
Wenying Duan, Tianxiang Fang, Hong Rao, Xiaoxi He (12 Jun 2024)

Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
Naibin Gu, Peng Fu, Xiyu Liu, Bowen Shen, Zheng Lin, Weiping Wang (06 Jun 2024)

The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence
Adithya Vasudev (31 May 2024)

Early Transformers: A study on Efficient Training of Transformer Models through Early-Bird Lottery Tickets
Shravan Cheekati (02 May 2024)

Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min (12 Apr 2024)

Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min (12 Apr 2024)

Accelerating Transformer Pre-training with 2:4 Sparsity
Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu (02 Apr 2024)

CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu (12 Mar 2024)

A Survey of Lottery Ticket Hypothesis
Bohan Liu, Zijie Zhang, Peixiong He, Zhensen Wang, Yang Xiao, Ruimeng Ye, Yang Zhou, Wei-Shinn Ku, Bo Hui (07 Mar 2024)

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder (02 Mar 2024)

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurélie C. Lozano, Payel Das, Georgios Kollias (28 Feb 2024)

Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David Hay, Lior Wolf (23 Jan 2024)

Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
Ting-Yun Chang, Jesse Thomason, Robin Jia (15 Nov 2023)

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu (19 Oct 2023)

Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang (02 Oct 2023)

The Snowflake Hypothesis: Training Deep GNN with One Node One Receptive field
Kun Wang, Guohao Li, Shilong Wang, Guibin Zhang, Kaidi Wang, Yang You, Xiaojiang Peng, Yuxuan Liang, Yang Wang (19 Aug 2023)

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels
Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting (08 Jul 2023)

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang (18 Jun 2023)

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang (06 Jun 2023)

PruMUX: Augmenting Data Multiplexing with Model Compression
Yushan Su, Vishvak Murahari, Karthik Narasimhan, Keqin Li (24 May 2023)

Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang (04 May 2023)

Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova, H. Dai, Dale Schuurmans (07 Mar 2023)

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu, Zhangyang Wang (06 Feb 2023)

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen (02 Feb 2023)

Curriculum-Guided Abstractive Summarization
Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian (02 Feb 2023)

Curriculum-guided Abstractive Summarization for Mental Health Online Posts
Sajad Sotudeh, Nazli Goharian, Hanieh Deilamsalehy, Franck Dernoncourt (02 Feb 2023)

On the Effectiveness of Parameter-Efficient Fine-Tuning
Z. Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier (28 Nov 2022)

Efficient Adversarial Training with Robust Early-Bird Tickets
Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang (14 Nov 2022)

Robust Lottery Tickets for Pre-trained Language Models
Rui Zheng, Rong Bao, Yuhao Zhou, Di Liang, Sirui Wang, Wei Wu, Tao Gui, Qi Zhang, Xuanjing Huang (06 Nov 2022)

Intriguing Properties of Compression on Multilingual Models
Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer (04 Nov 2022)

BEBERT: Efficient and Robust Binary Ensemble BERT
Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang (28 Oct 2022)

Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training
Mathias Parger, Alexander Ertl, Paul Eibensteiner, J. H. Mueller, Martin Winter, M. Steinberger (25 Oct 2022)

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson, B. Kailkhura, Davis W. Blalock (13 Oct 2022)

The Lottery Ticket Hypothesis for Self-attention in Convolutional Neural Network
Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang, Liang Lin (16 Jul 2022)

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training
Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang, Zhangyang Wang (09 Jun 2022)

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na, Sanket Vaibhav Mehta, Emma Strubell (25 May 2022)

Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
Nakyeong Yang, Yunah Jang, Hwanhee Lee, Seohyeong Jung, Kyomin Jung (09 May 2022)

Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Tri Dao, Beidi Chen, N. Sohoni, Arjun D Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré (01 Apr 2022)

Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia, Zexuan Zhong, Danqi Chen (01 Apr 2022)

A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami (29 Mar 2022)

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy (05 Feb 2022)

On the Compression of Natural Language Models
S. Damadi (13 Dec 2021)

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery
Cameron R. Wolfe, Anastasios Kyrillidis (07 Dec 2021)

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang (07 Nov 2021)

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng (30 Oct 2021)

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
Yonggan Fu, Qixuan Yu, Yang Zhang, Shan-Hung Wu, Ouyang Xu, David D. Cox, Yingyan Lin (26 Oct 2021)

When to Prune? A Policy towards Early Structural Pruning
Maying Shen, Pavlo Molchanov, Hongxu Yin, J. Álvarez (22 Oct 2021)