
Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian, Qifan Xu, Boxiang Wang, Yang You
[MoE] · 30 May 2021 · arXiv:2105.14450

Papers citing "Maximizing Parallelism in Distributed Training for Huge Neural Networks" (28 papers)

TrainVerify: Equivalence-Based Verification for Distributed LLM Training
Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang
[LRM] · 19 Jun 2025

You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models
Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan
16 Apr 2025

A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li
21 Jan 2025

NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism
Xin Ai, Hao Yuan, Zeyu Ling, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu
[GNN] · 29 Dec 2024

Comprehensive Performance Modeling and System Design Insights for Foundation Models
Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Smeet Chheda, S. Farrell, Brian Austin, Samuel Williams, N. Wright, W. Bhimji
30 Sep 2024

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun, Zihan Yang, Changyue Liao, Yingtao Li, Leilei Gan, Zeke Wang
02 Sep 2024

Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu, Shaolong Li, Longbo Huang
21 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs
Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha
16 Apr 2024

Optimizing Malware Detection in IoT Networks: Leveraging Resource-Aware Distributed Computing for Enhanced Security
Sreenitha Kasarapu, Sanket Shukla, Sai Manoj P D
12 Apr 2024

Enhancing IoT Malware Detection through Adaptive Model Parallelism and Resource Optimization
Sreenitha Kasarapu, Sanket Shukla, Sai Manoj P D
12 Apr 2024

ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
Ding Tang, Lijuan Jiang, Jiecheng Zhou, Minxi Jin, Hengjie Li, Xingcheng Zhang, Zhiling Pei, Jidong Zhai
06 Feb 2024

Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuan-fu Zhang, Zibin Zheng
[ALM] · 05 Jan 2024

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
01 Dec 2023

Rethinking Memory and Communication Cost for Efficient Large Language Model Training
Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, ..., Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou
09 Oct 2023

DistSim: A performance model of large-scale hybrid distributed DNN training
Guandong Lu, Run Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, ..., Yanming Miao, Zhifang Cai, Li-Wei Li, Jingwen Leng, Minyi Guo
14 Jun 2023

A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, A. Bhatele
22 May 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
[VLM] · 07 Apr 2023

OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan, Juncheng Liu, Jinhui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao
11 Mar 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Siddharth Singh, Olatunji Ruwase, A. A. Awan, Samyam Rajbhandari, Yuxiong He, A. Bhatele
[MoE] · 11 Mar 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu, Shenggui Li, Jiarui Fang, Yan Shao, Boyuan Yao, Yang You
[OffRL] · 06 Feb 2023

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023

TAPS: Topology-Aware Intra-Operator Parallelism Strategy Searching Algorithm for Deep Neural Networks
Peng Liang, Hao Zheng, Teng Su, Linbo Qiao, Dongsheng Li
11 Jan 2023

Dive into Big Model Training
Qinghua Liu, Yuxiang Jiang
[MoMe] [AI4CE] [LRM] · 25 Jul 2022

Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai, Shengwei Li, Xudong Tang, Ke-shi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li
10 Jun 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
28 Apr 2022

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You
[GNN] · 28 Oct 2021

PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You
[VLM] · 12 Aug 2021