bert2BERT: Towards Reusable Pretrained Language Models

14 October 2021 · arXiv:2110.07143
Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu · VLM
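
bert2BERT grows a small pretrained Transformer into a larger one by expanding its weight matrices so that the bigger model computes (approximately) the same function as the small one, then continues pre-training from that initialization instead of from scratch. As a rough illustration of the underlying idea, here is a minimal Net2Net-style width-expansion sketch for a single hidden layer; the function name and the random unit-splitting scheme are illustrative assumptions, not the paper's exact FPI/AKI operators.

```python
import numpy as np

def widen_linear(W_in, W_out, new_width, rng):
    """Function-preserving width expansion of one hidden layer
    (Net2Net-style sketch; illustrative, not bert2BERT's exact operators).

    W_in:  (hidden, d_in)   produces the hidden activations
    W_out: (d_out, hidden)  consumes the hidden activations
    """
    hidden = W_in.shape[0]
    # Each new unit copies a randomly chosen existing unit.
    mapping = np.concatenate([np.arange(hidden),
                              rng.integers(0, hidden, new_width - hidden)])
    W_in_new = W_in[mapping]  # copy incoming rows for the duplicated units
    # Split outgoing weights by replication count so duplicated units
    # contribute the same total as the original unit did.
    counts = np.bincount(mapping, minlength=hidden)[mapping]
    W_out_new = W_out[:, mapping] / counts
    return W_in_new, W_out_new

# The widened network computes the same function (up to float error).
rng = np.random.default_rng(0)
W_in, W_out = rng.normal(size=(4, 8)), rng.normal(size=(3, 4))
W_in2, W_out2 = widen_linear(W_in, W_out, 6, rng)
x = rng.normal(size=8)
relu = lambda v: np.maximum(v, 0.0)
assert np.allclose(W_out @ relu(W_in @ x), W_out2 @ relu(W_in2 @ x))
```

bert2BERT applies this function-preserving spirit across all Transformer sublayers (embeddings, attention, FFN); its AKI variant additionally mixes in weights from adjacent layers to break the symmetry that exact duplication introduces.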

Papers citing "bert2BERT: Towards Reusable Pretrained Language Models"

45 papers

A multilevel approach to accelerate the training of Transformers
Guillaume Lauga, Maël Chaumette, Edgar Desainte-Maréville, Étienne Lasalle, Arthur Lebeurrier · AI4CE · 24 Apr 2025

Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano, Sho Takase, Sosuke Kobayashi, Shun Kiyono, Jun Suzuki · 01 Apr 2025

Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu · 25 Mar 2025

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, J. E. Lenssen, Liwei Wang, F. Tombari, Bernt Schiele · 30 Oct 2024

Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen · 05 Oct 2024

On the Inductive Bias of Stacking Towards Improving Reasoning
Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar · LRM, AI4CE · 27 Sep 2024

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar · LRM, AI4CE · 19 Sep 2024

Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models
Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han · AI4CE · 02 Sep 2024

A Mean Field Ansatz for Zero-Shot Weight Transfer
Xingyuan Chen, Wenwei Kuang, Lei Deng, Wei Han, Bo Bai, Goncalo dos Reis · 16 Aug 2024

Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao, Fandong Meng, Jie Zhou · 17 Jul 2024

52B to 1T: Lessons Learned via Tele-FLM Series
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, ..., Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang · ALM, LRM · 03 Jul 2024

Federating to Grow Transformers with Constrained Resources without Model Sharing
Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu · 19 Jun 2024

Towards Lifelong Learning of Large Language Models: A Survey
Junhao Zheng, Shengjie Qiu, Chengming Shi, Qianli Ma · KELM, CLL · 10 Jun 2024

Landscape-Aware Growing: The Power of a Little LAG
Stefani Karp, Nikunj Saunshi, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar · 04 Jun 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu · 24 May 2024

Text-to-Model: Text-Conditioned Neural Network Diffusion for Train-Once-for-All Personalization
Zexi Li, Lingzhi Gao, Chao Wu · AI4CE, DiffM · 23 May 2024

A Multi-Level Framework for Accelerating Training Transformer Models
Longwei Zou, Han Zhang, Yangdong Deng · AI4CE · 07 Apr 2024

A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin · 31 Mar 2024

Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures
Akash Guna R.T, Arnav Chavan, Deepak Gupta · MDE · 19 Feb 2024

Efficient Stagewise Pretraining via Progressive Subnetworks
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar · 08 Feb 2024

Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu · MQ · 03 Jan 2024

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu · 27 Dec 2023

Initializing Models with Larger Ones
Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu · 30 Nov 2023

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He · 17 Oct 2023

Reusing Pretrained Models by Multi-linear Operators for Efficient Training
Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu · 16 Oct 2023

LEMON: Lossless model expansion
Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang · 12 Oct 2023

Automatic Personalized Impression Generation for PET Reports Using Large Language Models
Xin Tie, Muheon Shin, Ali Pirasteh, Nevein Ibrahim, Zachary Huemann, ..., K. M. Kelly, John W. Garrett, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw · LM&MA · 18 Sep 2023

FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, ..., Li Du, Bowen Qin, Zheng-Wei Zhang, Aixin Sun, Yequan Wang · 07 Sep 2023

Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo, Kaitlin Maile · AI4CE · 11 Aug 2023

BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer
Z. Li, Shitou Zhang, Hai Zhao, Yifei Yang, Dongjie Yang · LM&MA · 01 Jul 2023

Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models
Saleh Soltan, Andrew Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza · 14 Jun 2023

Recyclable Tuning for Continual Pre-training
Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou · CLL · 15 May 2023

Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng-Wei Zhang, Jing Li, Yequan Wang · OffRL, AI4CE, LRM · 04 May 2023

Learning to Grow Pretrained Models for Efficient Transformer Training
Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim · 02 Mar 2023

GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost
Qingcheng Zeng, Lucas Garay, Peilin Zhou, Dading Chong, Yining Hua, Jiageng Wu, Yi-Cheng Pan, Han Zhou, Rob Voigt, Jie Yang · VLM · 13 Nov 2022

FPT: Improving Prompt Tuning Efficiency via Progressive Training
Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu · VLM, LRM · 13 Nov 2022

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples
Hezekiah J. Branch, Jonathan Rodriguez Cefalu, Jeremy McHugh, Leyla Hujer, Aditya Bahl, Daniel del Castillo Iglesias, Ron Heichman, Ramesh Darwishi · ELM, SILM, AAML · 05 Sep 2022

ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · 12 Mar 2022

Knowledge Inheritance for Pre-trained Language Models
Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, ..., Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · VLM · 28 May 2021

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu · FedML · 24 Apr 2021

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu · 17 Feb 2021

On the Transformer Growth for Progressive BERT Training
Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, C. L. P. Chen, Jiawei Han · VLM · 23 Oct 2020

Energy-efficient and Robust Cumulative Training with Net2Net Transformation
Aosong Feng, Priyadarshini Panda · 02 Mar 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018