ResearchTrend.AI

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
arXiv 2104.07857. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021. Published 16 April 2021.
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He

Papers citing "ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"

50 / 235 papers shown
• Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading. International Middleware Conference (Middleware), 2024. Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae. 26 Oct 2024.
• Markov Chain of Thought for Efficient Mathematical Reasoning. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. Wen Yang, Kai Fan, Minpeng Liao. 23 Oct 2024.
• FlowTracer: A Tool for Uncovering Network Path Usage Imbalance in AI Training Clusters. Hasibul Jamil, Abdul Alim, L. Schares, P. Maniotis, L. Schour, Ali Sydney, Abdullah Kayi, T. Kosar, Bengi Karacali. 22 Oct 2024.
• Understanding and Alleviating Memory Consumption in RLHF for LLMs. Jin Zhou, Hanmei Yang, Steven Tang, Mingcan Xiang, Hui Guan, Tongping Liu. 21 Oct 2024.
• Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation. International Conference on Learning Representations (ICLR), 2024. Shaonan Wu, Shuai Lu, Yeyun Gong, Nan Duan, Ping Wei. 21 Oct 2024.
• TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness. Ankita Dutta, Nabendu Chaki, Rajat K. De. 18 Oct 2024.
• Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2024. Chunlin Tian, Li Li, Kahou Tam, Yebo Wu, Chengzhong Xu. 12 Oct 2024.
• Language Imbalance Driven Rewarding for Multilingual Self-improving. International Conference on Learning Representations (ICLR), 2024. Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, J.N. Zhang. 11 Oct 2024.
• Learning Evolving Tools for Large Language Models. International Conference on Learning Representations (ICLR), 2024. Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang. 09 Oct 2024.
• A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models. IEEE Circuits and Systems Magazine (IEEE CSM), 2024. Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen. 08 Oct 2024.
• EinDecomp: Decomposition of Declaratively-Specified Machine Learning and Numerical Computations for Parallel Execution. Proceedings of the VLDB Endowment (PVLDB), 2024. Daniel Bourgeois, Zhimin Ding, Dimitrije Jankov, Jiehui Li, Mahmoud Sleem, Yuxin Tang, Jiawen Yao, Xinyu Yao, Chris Jermaine. 03 Oct 2024.
• PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training. Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, G. R. Ganger, Yida Wang. 23 Sep 2024.
• Achieving Peak Performance for Large Language Models: A Systematic Review. IEEE Access, 2024. Z. R. K. Rostam, Sándor Szénási, Gábor Kertész. 07 Sep 2024.
• LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs. Mo Sun, Zihan Yang, Changyue Liao, Yingtao Li, Leilei Gan, Zeke Wang. 02 Sep 2024.
• Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Xin Zheng, Jie Lou, Boxi Cao, Xueru Wen, Yuqiu Ji, Hongyu Lin, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun. 29 Aug 2024.
• Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters. AAAI Conference on Artificial Intelligence (AAAI), 2024. WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai. 22 Aug 2024.
• CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs. European Conference on Computer Vision (ECCV), 2024. Yassine Ouali, Adrian Bulat, Brais Martínez, Georgios Tzimiropoulos. 19 Aug 2024.
• AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference. International Conference on Computer Aided Design (ICCAD), 2024. Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li. 19 Aug 2024.
• Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling. Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Chencan Wu, Fei Xu, Yong Li, Wei Lin, Fangming Liu. 16 Aug 2024.
• MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024. Weilin Cai, Le Qin, Jiayi Huang. 08 Aug 2024.
• Making Long-Context Language Models Better Multi-Hop Reasoners. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Yanyang Li, Shuo Liang, Michael R. Lyu, Liwei Wang. 06 Aug 2024.
• Efficient Training of Large Language Models on Distributed Infrastructures: A Survey. Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu. 29 Jul 2024.
• MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training. Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, A. Anandkumar. 22 Jul 2024.
• Mobile Edge Intelligence for Large Language Models: A Contemporary Survey. Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang. 09 Jul 2024.
• Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism. Xinyu Lian, S. A. Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang. 27 Jun 2024.
• A Survey on Mixture of Experts in Large Language Models. Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang. 26 Jun 2024.
• Adam-mini: Use Fewer Learning Rates To Gain More. Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo. 24 Jun 2024.
• Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation. Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu. 24 Jun 2024.
• FastPersist: Accelerating Model Checkpointing in Deep Learning. Guanhua Wang, Olatunji Ruwase, Bing Xie, Yuxiong He. 19 Jun 2024.
• Step-level Value Preference Optimization for Mathematical Reasoning. Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan. 16 Jun 2024.
• Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers. Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae. 15 Jun 2024.
• DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models. Avinash Maurya, Robert Underwood, M. Rafique, Franck Cappello, Bogdan Nicolae. 15 Jun 2024.
• Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors. AAAI Conference on Artificial Intelligence (AAAI), 2024. Siyuan Chen, Zelong Guan, Yudong Liu, Phillip B. Gibbons. 14 Jun 2024.
• Optimizing Large Model Training through Overlapped Activation Recomputation. Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, ..., Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen. 13 Jun 2024.
• ProTrain: Efficient LLM Training via Memory-Aware Techniques. Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu. 12 Jun 2024.
• FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion. Li-Wen Chang, Yiyuan Ma, Qi Hou, Chengquan Jiang, Ningxin Zheng, ..., Zuquan Song, Ziheng Jiang, Yanghua Peng, Xuanzhe Liu, Xin Liu. 11 Jun 2024.
• Wings: Learning Multimodal LLMs without Text-only Forgetting. Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye. 05 Jun 2024.
• A Study of Optimizations for Fine-tuning Large Language Models. Arjun Singh, Nikhil Pandey, Anup Shirgaonkar, Pavan Manoj, Vijay Aski. 04 Jun 2024.
• Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs. Davide Paglieri, Saurabh Dash, Tim Rocktaschel, Jack Parker-Holder. 31 May 2024.
• MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models. Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyukzae Lee, Jaewoong Sim. 29 May 2024.
• 2BP: 2-Stage Backpropagation. Christopher Rae, Joseph K. L. Lee, James Richings. 28 May 2024.
• TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload. Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido-Botran, Christopher M. Jermaine, Yu-Shuen Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois. 25 May 2024.
• SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures. Symposium on Operating Systems Principles (SOSP), 2024. Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis. 22 May 2024.
• AlphaMath Almost Zero: Process Supervision without Process. Neural Information Processing Systems (NeurIPS), 2024. Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan. 06 May 2024.
• DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines. Conference on Machine Learning and Systems (MLSys), 2024. Ye Tian, Zhen Jia, Ziyue Luo, Yida Wang, Chuan Wu. 02 May 2024.
• Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier. A. Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, ..., Jong Youl Choi, Mohamed Wahib, Dan Lu, Dali Wang, Feiyi Wang. 17 Apr 2024.
• Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model. Hengyuan Zhang, Yanru Wu, Dawei Li, Zacc Yang, Rui Zhao, Yong Jiang, Fei Tan. 16 Apr 2024.
• I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey. Noah Lewis, J. L. Bez, Suren Byna. 16 Apr 2024.
• FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models. Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Zhengran Zeng, Wei Ye, Yongfeng Zhang, Yue Zhang, Shikun Zhang. 09 Apr 2024.
• Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts. Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang. 07 Apr 2024.