Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

25 November 2022
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
Tags: GNN, MoE

Papers citing "Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism"

32 papers shown

Galvatron: An Automatic Distributed System for Efficient Foundation Model Training (30 Apr 2025)
Xinyi Liu, Y. Wang, Shenhan Zhu, Fangcheng Fu, Qingshuo Liu, Guangming Lin, Bin Cui [GNN]

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation (22 Apr 2025)
Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, ..., Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang [OffRL, AI4TS]

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training (12 Apr 2025)
Mingyu Liang, Hiwot Tadese Kassa, Wenyin Fu, Brian Coutinho, Louis Feng, Christina Delimitrou

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving (10 Apr 2025)
Shihong Gao, X. Zhang, Yanyan Shen, Lei Chen

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization (24 Mar 2025)
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko [VLM, MoE]

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach (12 Mar 2025)
Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

iServe: An Intent-based Serving System for LLMs (08 Jan 2025)
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar [VLM]

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism (02 Dec 2024)
Y. Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training (20 Nov 2024)
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn

Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models (01 Nov 2024)
Runsheng Benson Guo, Utkarsh Anand, Arthur Chen, Khuzaima Daudjee

Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization (17 Oct 2024)
Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Y. Wang, Hailin Zhang, Xiaonan Nie, Bin Cui [MoMe]

EinDecomp: Decomposition of Declaratively-Specified Machine Learning and Numerical Computations for Parallel Execution (03 Oct 2024)
Daniel Bourgeois, Zhimin Ding, Dimitrije Jankov, Jiehui Li, Mahmoud Sleem, Yuxin Tang, Jiawen Yao, Xinyu Yao, Chris Jermaine

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey (29 Jul 2024)
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving (19 Jun 2024)
Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity (22 Apr 2024)
Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies (13 Apr 2024)
Benjue Weng [LM&MA]

Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances (21 Mar 2024)
Jiangfei Duan, Ziang Song, Xupeng Miao, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, Zhihao Jia

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding (17 Jan 2024)
Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, Yingtong Xiong, ..., Qi Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun

Training and Serving System of Foundation Models: A Comprehensive Survey (05 Jan 2024)
Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuan-fu Zhang, Zibin Zheng [ALM]

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (23 Dec 2023)
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference (23 Dec 2023)
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

SpotServe: Serving Generative Large Language Models on Preemptible Instances (27 Nov 2023)
Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia [VLM]

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (19 Sep 2023)
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, S. Song

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming (31 Jul 2023)
Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li

Improving Automatic Parallel Training via Balanced Memory Workload Optimization (05 Jul 2023)
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning (17 May 2023)
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement (08 Apr 2023)
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang-Ming Cao, Bin Cui [MoE]

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent (06 Mar 2023)
Xiaonan Nie, Yi Liu, Fangcheng Fu, J. Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui [MoE]

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches (03 Mar 2023)
Siyu Wang, Zongyan Cao, Chang Si, Lansong Diao, Jiamang Wang, W. Lin

Quantized Distributed Training of Large Models with Convergence Guarantees (05 Feb 2023)
I. Markov, Adrian Vladu, Qi Guo, Dan Alistarh [MQ]

Zero-Shot Text-to-Image Generation (24 Feb 2021)
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever [VLM]

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei