Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2208.08124
Cited By
Boosting Distributed Training Performance of the Unpadded BERT Model
17 August 2022
Jinle Zeng
Min Li
Zhihua Wu
Jiaqi Liu
Yuang Liu
Dianhai Yu
Yanjun Ma
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Boosting Distributed Training Performance of the Unpadded BERT Model"
9 / 9 papers shown
Title
WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training
Z. Wang
Anna Cai
Xinfeng Xie
Zaifeng Pan
Yue Guan
...
Shikai Li
Jianyu Huang
Chris Cai
Yuchen Hao
Yufei Ding
39
2
0
23 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
61
1
0
07 Mar 2025
One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings
Andrea Gurioli
Federico Pennino
João Monteiro
Maurizio Gabbrielli
46
0
0
04 Mar 2025
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
88
72
0
18 Dec 2024
PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training
Haoran Xu
Ziqian Liu
Rong Fu
Zhongling Su
Zerui Wang
Zheng Cai
Zhilin Pei
Xingcheng Zhang
Mamba
27
0
0
07 Aug 2024
SoD
2
^2
2
: Statically Optimizing Dynamic Deep Neural Network
Wei Niu
Gagan Agrawal
Bin Ren
25
4
0
29 Feb 2024
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob P. Portes
Alex Trott
Sam Havens
Daniel King
Abhinav Venigalla
Moin Nadeem
Nikhil Sardana
D. Khudia
Jonathan Frankle
13
16
0
29 Dec 2023
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Yujia Zhai
Chengquan Jiang
Leyuan Wang
Xiaoying Jia
Shang Zhang
Zizhong Chen
Xin Liu
Yibo Zhu
62
48
0
06 Oct 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019
1