ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021

16 April 2021

Yuxiong He

Papers citing "ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

Qi Luo

Hengxu Yu

Xiao Li

264

03 Apr 2024

Exploring the Mystery of Influential Data for Mathematical Reasoning

Yujiu Yang

227

01 Apr 2024

Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention

USENIX Annual Technical Conference (USENIX ATC), 2024

344

108

23 Mar 2024

Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Baolin Li

Yankai Jiang

V. Gadepally

Devesh Tiwari

241

19 Mar 2024

VisualCritic: Making LMMs Perceive Visual Quality Like Humans

243

19 Mar 2024

ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment

Xiaofeng Wu

Jia Rao

Wei Chen

208

15 Mar 2024

Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks

Louis Fournier

Edouard Oyallon

218

13 Mar 2024

ORPO: Monolithic Preference Optimization without Reference Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

701

441

12 Mar 2024

Characterization of Large Language Model Development in the Datacenter

Symposium on Networked Systems Design and Implementation (NSDI), 2024

...

Dahua Lin

Xiaolin Wang

Yingwei Luo

Yonggang Wen

Tianwei Zhang

190

104

12 Mar 2024

Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System

International Symposium on High-Performance Computer Architecture (HPCA), 2024

162

11 Mar 2024

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Changyue Liao

Mo Sun

Zihan Yang

Kaiqi Chen

Binhang Yuan

Leilei Gan

Zeke Wang

148

11 Mar 2024

Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

362

04 Mar 2024

DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

182

04 Mar 2024

KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Xing Xie

246

23 Feb 2024

SciAgent: Tool-augmented Language Models for Scientific Reasoning

...

Yujiu Yang

372

18 Feb 2024

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

International Conference on Learning Representations (ICLR), 2024

448

10 Feb 2024

ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology

406

06 Feb 2024

LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

155

02 Feb 2024

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

Suchita Pati

Shaizeen Aga

Mahzabeen Islam

Nuwan Jayasena

Matthew D. Sinclair

144

30 Jan 2024

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Shi Feng

294

26 Jan 2024

MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache

332

25 Jan 2024

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

188

21 Jan 2024

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding

...

Xin Jin

301

17 Jan 2024

MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

303

16 Jan 2024

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

Cong Guo

Rui Zhang

Jiale Xu

Jingwen Leng

Zihan Liu

...

199

16 Jan 2024

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Ming Yan

Ji Zhang

Fei Huang

LLMAG

374

14 Jan 2024

Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

European Conference on Information Retrieval (ECIR), 2024

222

09 Jan 2024

Training and Serving System of Foundation Models: A Comprehensive Survey

223

05 Jan 2024

Understanding LLMs: A Comprehensive Overview from Training to Inference

...

Tuo Zhang

Tianming Liu

463

123

04 Jan 2024

Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Neural Information Processing Systems (NeurIPS), 2023

Tim Dettmers

194

13 Dec 2023

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Keivan Alizadeh-Vahid

Minsik Cho

274

194

12 Dec 2023

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

Micro (MICRO), 2023

199

27 Nov 2023

HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)

132

25 Nov 2023

NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Proceedings of the VLDB Endowment (PVLDB), 2023

202

22 Nov 2023

Applications of Large Scale Foundation Models for Autonomous Driving

Yu Huang

Yue Chen

Zhu Li

ELM AI4CE LRM ALM LM&Ro

641

20 Nov 2023

Zero redundancy distributed learning with differential privacy

Zhiqi Bu

Justin Chiu

Ruixuan Liu

Sheng Zha

George Karypis

243

20 Nov 2023

Just-in-time Quantization with Processing-In-Memory for Efficient ML Training

M. Ibrahim

Shaizeen Aga

Ada Li

Suchita Pati

Mahzabeen Islam

270

08 Nov 2023

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

...

211

07 Nov 2023

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

106

13 Oct 2023

Rethinking Memory and Communication Cost for Efficient Large Language Model Training

...

Lei Liang

227

09 Oct 2023

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

International Conference on Learning Representations (ICLR), 2023

Zhihong Shao

Yujiu Yang

417

258

29 Sep 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Yuxiong He

375

178

25 Sep 2023

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

Symposium on Operating Systems Principles (SOSP), 2023

Xin Jin

258

15 Sep 2023

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

244

12 Sep 2023

Memory Efficient Optimizers with 4-bit States

Neural Information Processing Systems (NeurIPS), 2023

Bingrui Li

Jianfei Chen

Jun Zhu

331

04 Sep 2023

Saturn: An Optimized Data System for Large Model Deep Learning Workloads

Proceedings of the VLDB Endowment (PVLDB), 2023

Kabir Nagrecha

Arun Kumar

335

03 Sep 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023

Ziming Liu

Shenggan Cheng

Hao Zhou

Yang You

166

30 Aug 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

International Symposium on Computer Architecture (ISCA), 2023

343

23 Aug 2023

VeriGen: A Large Language Model for Verilog Code Generation

391

278

28 Jul 2023

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

Proceedings of the VLDB Endowment (PVLDB), 2023

Jeongmin Brian Park

Vikram Sharma Mailthody

Zaid Qureshi

Wen-mei W. Hwu

GNN

271

28 Jun 2023