Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.05645
Cited By
v1
v2
v3
v4
v5 (latest)
Training Large Neural Networks with Constant Memory using a New Execution Algorithm
13 February 2020
B. Pudipeddi
Maral Mesmakhosroshahi
Jinwen Xi
S. Bharadwaj
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Training Large Neural Networks with Constant Memory using a New Execution Algorithm"
17 / 17 papers shown
Title
SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Jinwoo Park
Seunggeun Cho
Dongsu Han
71
0
0
16 May 2025
GPU Memory Usage Optimization for Backward Propagation in Deep Network Training
Ding-Yong Hong
Tzu-Hsien Tsai
Ning Wang
Pangfeng Liu
Jan-Jan Wu
112
0
0
18 Feb 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
152
0
0
10 Jan 2025
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
Hongsun Jang
Jaeyong Song
Jaewon Jung
Jaeyoung Park
Youngsok Kim
Jinho Lee
43
16
0
11 Mar 2024
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan
Ben Bucknall
Herbie Bradley
David M. Krueger
63
6
0
22 Dec 2023
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
Yijia Zhang
Yibo Han
Shijie Cao
Guohao Dai
Youshan Miao
Ting Cao
Fan Yang
Ningyi Xu
59
4
0
31 May 2023
Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
GNN
VLM
MoE
74
7
0
06 Jan 2023
Elixir: Train a Large Language Model on a Small GPU Cluster
Haichen Huang
Jiarui Fang
Hongxin Liu
Shenggui Li
Yang You
VLM
79
7
0
10 Dec 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
Alexander Borzunov
Dmitry Baranchuk
Tim Dettmers
Max Ryabinin
Younes Belkada
Artem Chumachenko
Pavel Samygin
Colin Raffel
VLM
116
67
0
02 Sep 2022
Instilling Type Knowledge in Language Models via Multi-Task QA
Shuyang Li
Mukund Sridhar
Chandan Prakash
Jin Cao
Wael Hamza
Julian McAuley
KELM
79
7
0
28 Apr 2022
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang
Chenyu Wang
Yufan Zhang
Yuliang Liu
Xingcheng Zhang
Linbo Qiao
Zhiquan Lai
Dongsheng Li
73
6
0
30 Mar 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
81
11
0
21 Feb 2022
Benchmark Assessment for DeepSpeed Optimization Library
G. Liang
I. Alsmadi
59
3
0
12 Feb 2022
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
MoE
AI4CE
73
5
0
16 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
150
305
0
06 Oct 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
VLM
105
25
0
12 Aug 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
289
434
0
18 Jan 2021
1