2104.07857
Cited By
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
16 April 2021
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
Papers citing
"ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"
50 / 235 papers shown
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Qi Luo
Hengxu Yu
Xiao Li
264
16
0
03 Apr 2024
Exploring the Mystery of Influential Data for Mathematical Reasoning
Xinzhe Ni
Yeyun Gong
Zhibin Gou
Haoran Pan
Yujiu Yang
Nan Duan
Weizhu Chen
227
12
0
01 Apr 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
USENIX Annual Technical Conference (USENIX ATC), 2024
Bin Gao
Zhuomin He
Puru Sharma
Qingxuan Kang
Djordje Jevdjic
Junbo Deng
Xingkun Yang
Zhou Yu
Pengfei Zuo
344
108
0
23 Mar 2024
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
241
22
0
19 Mar 2024
VisualCritic: Making LMMs Perceive Visual Quality Like Humans
Zhipeng Huang
Zhizheng Zhang
Yiting Lu
Zheng-Jun Zha
Zhibo Chen
Baining Guo
MLLM
243
15
0
19 Mar 2024
ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment
Xiaofeng Wu
Jia Rao
Wei Chen
208
5
0
15 Mar 2024
Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks
Louis Fournier
Edouard Oyallon
218
0
0
13 Mar 2024
ORPO: Monolithic Preference Optimization without Reference Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jiwoo Hong
Noah Lee
James Thorne
OSLM
701
441
0
12 Mar 2024
Characterization of Large Language Model Development in the Datacenter
Symposium on Networked Systems Design and Implementation (NSDI), 2024
Qi Hu
Zhisheng Ye
Zerui Wang
Guoteng Wang
Mengdie Zhang
...
Dahua Lin
Xiaolin Wang
Yingwei Luo
Yonggang Wen
Tianwei Zhang
190
104
0
12 Mar 2024
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Hongsun Jang
Jaeyong Song
Jaewon Jung
Jaeyoung Park
Youngsok Kim
Jinho Lee
162
28
0
11 Mar 2024
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Changyue Liao
Mo Sun
Zihan Yang
Kaiqi Chen
Binhang Yuan
Leilei Gan
Zeke Wang
148
2
0
11 Mar 2024
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
Yiming Huang
Xiao Liu
Yeyun Gong
Zhibin Gou
Haoran Pan
Nan Duan
Weizhu Chen
AIMat
LRM
362
63
0
04 Mar 2024
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
F. Strati
Sara Mcallister
Amar Phanishayee
Jakub Tarnawski
Ana Klimovic
182
50
0
04 Mar 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Wei Ye
Yongfeng Zhang
Xing Xie
Yue Zhang
Shikun Zhang
246
42
0
23 Feb 2024
SciAgent: Tool-augmented Language Models for Scientific Reasoning
Yubo Ma
Zhibin Gou
Junheng Hao
Ruochen Xu
Shuohang Wang
...
Yujiu Yang
Yixin Cao
Aixin Sun
Hany Awadalla
Weizhu Chen
RALM
LRM
LLMAG
372
47
0
18 Feb 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
International Conference on Learning Representations (ICLR), 2024
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
448
43
0
10 Feb 2024
ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
Ding Tang
Lijuan Jiang
Jiecheng Zhou
Minxi Jin
Hengjie Li
Xingcheng Zhang
Zhiling Pei
Jidong Zhai
406
3
0
06 Feb 2024
LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning
Rongsheng Wang
Hao Chen
Ruizhe Zhou
Han Ma
Yaofei Duan
Yanlan Kang
Songhua Yang
Baoyu Fan
Tao Tan
DeLMO
155
24
0
02 Feb 2024
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
144
22
0
30 Jan 2024
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yongkang Liu
Yiqun Zhang
Qian Li
Tong Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
294
14
0
26 Jan 2024
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
Leyang Xue
Yao Fu
Zhan Lu
Luo Mai
Mahesh K. Marina
MoE
332
4
0
25 Jan 2024
LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction
Zhigang Wang
Hangyu Yang
Ning Wang
Chuanfei Xu
Jie Nie
Zhiqiang Wei
Yu Gu
Ge Yu
188
0
0
21 Jan 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Yang Liu
301
10
0
17 Jan 2024
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Minpeng Liao
Wei Luo
Chengxi Li
Jing Wu
Kai Fan
LRM
303
70
0
16 Jan 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
199
30
0
16 Jan 2024
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Weizhou Shen
Chenliang Li
Hongzhan Chen
Ming Yan
Xiaojun Quan
Hehong Chen
Ji Zhang
Fei Huang
LLMAG
374
90
0
14 Jan 2024
Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation
European Conference on Information Retrieval (ECIR), 2024
Eugene Yang
Dawn J Lawrie
J. Mayfield
Douglas W. Oard
Scott Miller
FedML
VLM
222
18
0
09 Jan 2024
Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou
Yanyu Chen
Zicong Hong
Wuhui Chen
Yue Yu
Tao Zhang
Hui Wang
Chuan-fu Zhang
Zibin Zheng
ALM
223
14
0
05 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
463
123
0
04 Jan 2024
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Neural Information Processing Systems (NeurIPS), 2023
Alexander Borzunov
Max Ryabinin
Artem Chumachenko
Dmitry Baranchuk
Tim Dettmers
Younes Belkada
Pavel Samygin
Colin Raffel
MoE
ALM
194
73
0
13 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
274
194
0
12 Dec 2023
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Micro (MICRO), 2023
Jehyeon Bang
Yujeong Choi
Myeongwoo Kim
Yongdeok Kim
Minsoo Rhu
199
29
0
27 Nov 2023
HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)
Qiange Wang
Yao Chen
Weng-Fai Wong
Bingsheng He
GNN
132
27
0
25 Nov 2023
NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments
Proceedings of the VLDB Endowment (PVLDB), 2023
Xin Ai
Qiange Wang
Chunyu Cao
Yanfeng Zhang
Chaoyi Chen
Hao Yuan
Yu Gu
Ge Yu
GNN
202
14
0
22 Nov 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
641
21
0
20 Nov 2023
Zero redundancy distributed learning with differential privacy
Zhiqi Bu
Justin Chiu
Ruixuan Liu
Sheng Zha
George Karypis
243
9
0
20 Nov 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim
Shaizeen Aga
Ada Li
Suchita Pati
Mahzabeen Islam
270
8
0
08 Nov 2023
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang
Xiang Liu
Zeyu Li
Xinglin Pan
Peijie Dong
...
Rui Guo
Xin Wang
Qiong Luo
Shaoshuai Shi
Xiaowen Chu
211
12
0
07 Nov 2023
G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations
Haoyang Zhang
Yirui Eric Zhou
Yu Xue
Yiqi Liu
Jian Huang
106
34
0
13 Oct 2023
Rethinking Memory and Communication Cost for Efficient Large Language Model Training
Chan Wu
Hanxiao Zhang
Lin Ju
Jinjing Huang
Youshao Xiao
...
Siyuan Li
Fanzhuang Meng
Lei Liang
Xiaolu Zhang
Jun Zhou
227
7
0
09 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
International Conference on Learning Representations (ICLR), 2023
Zhibin Gou
Zhihong Shao
Yeyun Gong
Haoran Pan
Yujiu Yang
Shiyu Huang
Nan Duan
Weizhu Chen
LRM
AI4CE
LLMAG
417
258
0
29 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
375
178
0
25 Sep 2023
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
Symposium on Operating Systems Principles (SOSP), 2023
Insu Jang
Zhenning Yang
Zhen Zhang
Xin Jin
Mosharaf Chowdhury
MoE
AI4CE
OODD
258
78
0
15 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
244
44
0
12 Sep 2023
Memory Efficient Optimizers with 4-bit States
Neural Information Processing Systems (NeurIPS), 2023
Bingrui Li
Jianfei Chen
Jun Zhu
MQ
331
57
0
04 Sep 2023
Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Proceedings of the VLDB Endowment (PVLDB), 2023
Kabir Nagrecha
Arun Kumar
335
8
0
03 Sep 2023
Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Ziming Liu
Shenggan Cheng
Hao Zhou
Yang You
166
53
0
30 Aug 2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
International Symposium on Computer Architecture (ISCA), 2023
Ranggi Hwang
Jianyu Wei
Shijie Cao
Changho Hwang
Xiaohu Tang
Ting Cao
Mao Yang
MoE
343
87
0
23 Aug 2023
VeriGen: A Large Language Model for Verilog Code Generation
Shailja Thakur
Baleegh Ahmad
Hammond Pearce
Benjamin Tan
Brendan Dolan-Gavitt
Ramesh Karri
S. Garg
391
278
0
28 Jul 2023
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
Proceedings of the VLDB Endowment (PVLDB), 2023
Jeongmin Brian Park
Vikram Sharma Mailthody
Zaid Qureshi
Wen-mei W. Hwu
GNN
271
29
0
28 Jun 2023