ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.13240
  4. Cited By
Token Dropping for Efficient BERT Pretraining

Token Dropping for Efficient BERT Pretraining

24 March 2022
Le Hou
Richard Yuanzhe Pang
Tianyi Zhou
Yuexin Wu
Xinying Song
Xiaodan Song
Denny Zhou
ArXivPDFHTML

Papers citing "Token Dropping for Efficient BERT Pretraining"

7 / 7 papers shown
Title
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
Yeongbin Seo
Dongha Lee
Jinyoung Yeo
CLL
KELM
69
1
0
24 Jul 2024
A Multi-Level Framework for Accelerating Training Transformer Models
A Multi-Level Framework for Accelerating Training Transformer Models
Longwei Zou
Han Zhang
Yangdong Deng
AI4CE
24
1
0
07 Apr 2024
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong
Liang Ding
Juhua Liu
Xuebo Liu
Min Zhang
Bo Du
Dacheng Tao
VLM
27
9
0
24 May 2023
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Haw-Shiuan Chang
Ruei-Yao Sun
Kathryn Ricci
Andrew McCallum
27
14
0
10 Oct 2022
Carbon Emissions and Large Neural Network Training
Carbon Emissions and Large Neural Network Training
David A. Patterson
Joseph E. Gonzalez
Quoc V. Le
Chen Liang
Lluís-Miquel Munguía
D. Rothchild
David R. So
Maud Texier
J. Dean
AI4CE
239
626
0
21 Apr 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1