Training Large Neural Networks with Constant Memory using a New Execution Algorithm
B. Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, S. Bharadwaj
arXiv:2002.05645 · 13 February 2020 (latest revision: v5)

Papers citing "Training Large Neural Networks with Constant Memory using a New Execution Algorithm"

34 citing papers

10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training
Sabiha Afroz, Redwan Ibne Seraj Khan, Hadeel Albahar, Jingoo Han, A. R. Butt
18 Nov 2025

Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
Topics: MoE
03 Jul 2025

SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Jinwoo Park, Seunggeun Cho, Dongsu Han
16 May 2025

GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2025
Leming Shen, Qiang Yang, Xinyu Huang, Zijing Ma, Yuanqing Zheng
02 Mar 2025

GPU Memory Usage Optimization for Backward Propagation in Deep Network Training
Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu
18 Feb 2025

Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma, Kenton W. Murray, Kevin Duh
Topics: AI4CE
10 Jan 2025

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu
29 Jul 2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
04 Jun 2024

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics
Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, ..., Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan
08 Apr 2024

Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Hongsun Jang, Jaeyong Song, Jaewon Jung, Jaeyoung Park, Youngsok Kim, Jinho Lee
11 Mar 2024

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Leilei Gan, Zeke Wang
11 Mar 2024

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yongkang Liu, Yiqun Zhang, Qian Li, Tong Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze
26 Jan 2024

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction
Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu
21 Jan 2024

Fast Inference of Mixture-of-Experts Language Models with Offloading
Artyom Eliseev, Denis Mazur
Topics: MoE
28 Dec 2023

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan, Ben Bucknall, Herbie Bradley, David M. Krueger
22 Dec 2023

Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Neural Information Processing Systems (NeurIPS), 2023
Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel
Topics: MoE, ALM
13 Dec 2023

Full Parameter Fine-tuning for Large Language Models with Limited Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kai Lv, Yuqing Yang, Tengxiao Liu, Qi-jie Gao, Qipeng Guo, Xipeng Qiu
16 Jun 2023

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
European Conference on Artificial Intelligence (ECAI), 2023
Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu
31 May 2023

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
International Conference on Machine Learning (ICML), 2023
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
Topics: MoE
27 Jan 2023

Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
Topics: GNN, VLM, MoE
06 Jan 2023

Elixir: Train a Large Language Model on a Small GPU Cluster
Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You
Topics: VLM
10 Dec 2022

Petals: Collaborative Inference and Fine-tuning of Large Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel
Topics: VLM
02 Sep 2022

Instilling Type Knowledge in Language Models via Multi-Task QA
Shuyang Li, Mukund Sridhar, Chandan Prakash, Jin Cao, Wael Hamza, Julian McAuley
Topics: KELM
28 Apr 2022

DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang, Chenyu Wang, Yufan Zhang, Yuliang Liu, Xingcheng Zhang, Linbo Qiao, Zhiquan Lai, Dongsheng Li
30 Mar 2022

Survey on Large Scale Neural Network Training
Julia Gusak, Daria Cherniuk, Alena Shilova, A. Katrutsa, Daniel Bershatsky, ..., Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
21 Feb 2022

Benchmark Assessment for DeepSpeed Optimization Library
G. Liang, I. Alsmadi
12 Feb 2022

Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha, Arun Kumar
Topics: MoE, AI4CE
16 Oct 2021

8-bit Optimizers via Block-wise Quantization
Tim Dettmers, M. Lewis, Sam Shleifer, Luke Zettlemoyer
Topics: MQ
06 Oct 2021

PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021
Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You
Topics: VLM
12 Aug 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
USENIX Annual Technical Conference (USENIX ATC), 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
Topics: MoE
18 Jan 2021

HetSeq: Distributed GPU Training on Heterogeneous Infrastructure
AAAI Conference on Artificial Intelligence (AAAI), 2020
Yifan Ding, Nicholas Botzer, Tim Weninger
Topics: VLM, MoE
25 Sep 2020

Current Limitations of Language Models: What You Need is Retrieval
Aran Komatsuzaki
Topics: LRM
15 Sep 2020

Matching Guided Distillation
Kaiyu Yue, Jiangfan Deng, Feng Zhou
23 Aug 2020

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2019
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
Topics: ALM, AI4CE
04 Oct 2019