Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
arXiv:1910.02653 (v3, latest) · 7 October 2019
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
Links: arXiv abstract · PDF · HTML · GitHub (131★)

Papers citing "Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization"

Showing 50 of 97 citing papers (page 1 of 2), most recent first.

SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training
Zheng Li, Yang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang · 20 Apr 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko · 24 Mar 2025 · Tags: VLM, MoE

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue · 04 Mar 2025

BladeDISC++: Memory Optimizations Based On Symbolic Shape
Xiulong Yuan, Xu Yan, Wenting Shen, Xiafei Qiu, Ang Wang, Jie Zhang, You Li, Wei Lin · 22 Dec 2024

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn · 20 Nov 2024

Stochastic Communication Avoidance for Recommendation Systems
Lutfi Eren Erdogan, Vijay Anand Raghava Kanakagiri, Kurt Keutzer, Zhen Dong · 03 Nov 2024

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun, Zihan Yang, Changyue Liao, Yingtao Li, Leilei Gan, Zeke Wang · 02 Sep 2024

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Xianyan Jia, Fei Xu, Yong Li, Wei Lin, Fangming Liu · 16 Aug 2024

Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices
Shengyuan Ye, Liekang Zeng, Xiaowen Chu, Guoliang Xing, Xu Chen · 15 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun · 29 Jul 2024

ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu · 12 Jun 2024 · Tags: VLM

TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido-Botran, Christopher M. Jermaine, Yu-Shuen Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois · 25 May 2024

AI and Memory Wall
A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer · 21 Mar 2024

On the Compressibility of Quantized Large Language Models
Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue · 03 Mar 2024 · Tags: MQ

SoD²: Statically Optimizing Dynamic Deep Neural Network
Wei Niu, Gagan Agrawal, Bin Ren · 29 Feb 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Vineeth Kada, Ruohan Gao, ..., April Yang, Yingcheng Wang, Mengdi Wu, Colin Unger, Zhihao Jia · 29 Feb 2024 · Tags: MoE

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Hao Zhou, Bin Jia, Ziming Liu, Yang You · 19 Jan 2024 · Tags: MQ

Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuan-fu Zhang, Zibin Zheng · 05 Jan 2024 · Tags: ALM

Unicron: Economizing Self-Healing LLM Training at Scale
Tao He, Xue Li, Zhibin Wang, Kun Qian, Jingbo Xu, Wenyuan Yu, Jingren Zhou · 30 Dec 2023

Stateful Large Language Model Serving with Pensieve
Lingfan Yu, Jinyang Li · 09 Dec 2023 · Tags: RALM, KELM, LLMAG

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, ..., Rui Guo, Xin Wang, Qiong Luo, Shaoshuai Shi, Xiaowen Chu · 07 Nov 2023

Coop: Memory is not a Commodity
Jianhao Zhang, Shihan Ma, Peihong Liu, Jinhui Yuan · 01 Nov 2023

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer · 11 Oct 2023 · Tags: MQ

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang · 05 Oct 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Haotong Zhang, Ion Stoica · 12 Sep 2023 · Tags: VLM

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang, Lin Zhang, Shaoshuai Shi, Xiaowen Chu, Yue Liu · 07 Aug 2023 · Tags: AI4CE

TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo · 19 Jul 2023

Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont · 03 Jul 2023

SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
A. Ardakani, Altan Haan, Shangyin Tan, Doru-Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen · 29 May 2023

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li, Zhiquan Lai, Yanqi Hao, Weijie Liu, Ke-shi Ge, Xiaoge Deng, Dongsheng Li, KaiCheng Lu · 25 May 2023

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, ..., Zhimeng Jiang, Kaixiong Zhou, Vipin Chaudhary, Shuai Xu, Helen Zhou · 24 May 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Tengjiao Wang · 17 May 2023

Moccasin: Efficient Tensor Rematerialization for Neural Networks
Burak Bartan, Haoming Li, Harris Teague, Chris Lott, B. Dilkina · 27 Apr 2023

An Evaluation of Memory Optimization Methods for Training Neural Networks
Xiaoxuan Liu, Siddharth Jha, Alvin Cheung · 26 Mar 2023

RAF: Holistic Compilation for Deep Learning Model Training
Cody Hao Yu, Haozheng Fan, Guangtai Huang, Zhen Jia, Yizhi Liu, ..., Yuan Zhou, Haichen Shen, Junru Shao, Mu Li, Yida Wang · 08 Mar 2023

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang · 16 Feb 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu, Shenggui Li, Jiarui Fang, Yan Shao, Boyuan Yao, Yang You · 06 Feb 2023 · Tags: OffRL

XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments
Manuela Schuler, Richard Membarth, P. Slusallek · 19 Dec 2022

On-device Training: A First Overview on Existing Systems
Shuai Zhu, Thiemo Voigt, Jeonggil Ko, Fatemeh Rahimian · 01 Dec 2022

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
D. Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Bambhaniya, T. Krishna, Alexandros Daglis · 30 Nov 2022 · Tags: VLM

FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource Constrained Devices using Divide and Collaborative Training
Quan Nguyen, Hieu H. Pham, Kok-Seng Wong, Phi Le Nguyen, Truong Thao Nguyen, Minh N. Do · 20 Nov 2022 · Tags: FedML

A Comprehensive Survey on Distributed Training of Graph Neural Networks
Haiyang Lin, Yurui Lai, Xiaochun Ye, Shirui Pan, Wenguang Chen, Yuan Xie · 10 Nov 2022 · Tags: GNN

OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty · 24 Oct 2022

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction
Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko · 19 Oct 2022

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko · 18 Oct 2022

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson, B. Kailkhura, Davis W. Blalock · 13 Oct 2022

Differentially Private Bias-Term Fine-tuning of Foundation Models
Zhiqi Bu, Yu Wang, Sheng Zha, George Karypis · 30 Sep 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training
Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Yingdian Cao, Quan Zhang, Yunxin Liu, Fan Yang, Minyi Guo · 22 Sep 2022 · Tags: AI4CE

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU
Jian-He Liao, Mingzhen Li, Qingxiao Sun, Jiwei Hao, F. Yu, ..., Ye Tao, Zicheng Zhang, Hailong Yang, Zhongzhi Luan, D. Qian · 06 Sep 2022

Dive into Big Model Training
Qinghua Liu, Yuxiang Jiang · 25 Jul 2022 · Tags: MoMe, AI4CE, LRM