Reducing Activation Recomputation in Large Transformer Models

10 May 2022
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro
AI4CE

Papers citing "Reducing Activation Recomputation in Large Transformer Models"

50 / 164 papers shown

Demystifying the Communication Characteristics for Distributed Transformer Models
Quentin G. Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, A. Shafi, Hari Subramoni, D. Panda
AI4CE
19 Aug 2024

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu
MLLM, AI4CE
07 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development
Borui Wan, Mingji Han, Yiyao Sheng, Zhichao Lai, Mofan Zhang, ..., Haibin Lin, Xin Liu, Chuan Wu
29 Jul 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, A. Anandkumar
22 Jul 2024

Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference
Joyjit Kundu, Wenzhe Guo, Ali BanaGozar, Udari De Alwis, Sourav Sengupta, Puneet Gupta, Arindam Mallik
19 Jul 2024

TorchGT: A Holistic System for Large-scale Graph Transformer Training
Mengdie Zhang, Jie Sun, Qi Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang
GNN
19 Jul 2024

LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, ..., Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
08 Jul 2024

$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, ..., Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E
01 Jul 2024

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You
30 Jun 2024

Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training
Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang
OffRL
27 Jun 2024

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, ..., Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu
LRM
26 Jun 2024

Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu
24 Jun 2024

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu
24 Jun 2024

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae
15 Jun 2024

Optimizing Large Model Training through Overlapped Activation Recomputation
Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, ..., Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen
13 Jun 2024

ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu
VLM
12 Jun 2024

An Empirical Study of Mamba-based Language Models
R. Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, V. Korthikanti, ..., Vartika Singh, Jared Casper, Jan Kautz, M. Shoeybi, Bryan Catanzaro
12 Jun 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, ..., Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu
11 Jun 2024

Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
05 Jun 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou
VLM
04 Jun 2024

Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen
27 May 2024

Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte, Fatemeh Tahmasbi, Ming Li, Thomas Markovich
24 May 2024

Pipeline Parallelism with Controllable Memory
Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin
24 May 2024

SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis
AI4CE, GNN
22 May 2024

360Zhinao Technical Report
360Zhinao Team
22 May 2024

Rethinking Overlooked Aspects in Vision-Language Models
Yuan Liu, Le Tian, Xiao Zhou, Jie Zhou
VLM
20 May 2024

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan
LLMAG
18 May 2024

USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang, Shangchun Zhao
13 May 2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, ..., Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
AI4TS
07 May 2024

Collage: Light-Weight Low-Precision Strategy for LLM Training
Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith R. Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Jun Huan
MQ
06 May 2024

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki
CLL
27 Apr 2024

Tele-FLM Technical Report
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, ..., Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang
25 Apr 2024

PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?
Mitodru Niyogi, Arnab Bhattacharya
LRM, ReLM
22 Apr 2024

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
16 Apr 2024

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
Bingya Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
RALM
15 Apr 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda
MoE
08 Apr 2024

Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong
03 Apr 2024

AI and Memory Wall
A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer
21 Mar 2024

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
GNN
14 Mar 2024

Scattered Mixture-of-Experts Implementation
Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville
MoE
13 Mar 2024

Characterization of Large Language Model Development in the Datacenter
Qi Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Mengdie Zhang, ..., Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
12 Mar 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu, Wen Liu, Bo Zhang, Bing-Li Wang, Kai Dong, ..., Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan
VLM
08 Mar 2024

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies
Felix Brakel, Uraz Odyurt, A. Varbanescu
GNN
06 Mar 2024

On the Compressibility of Quantized Large Language Models
Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue
MQ
03 Mar 2024

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, S. Kwon, Dongsuk Jeon, Dongsoo Lee
27 Feb 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, ..., Zhe Li, X. Jia, Jia-jun Ye, Xin Jin, Xin Liu
LRM
23 Feb 2024

World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei A. Zaharia, Pieter Abbeel
VGen
13 Feb 2024

ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
Ding Tang, Lijuan Jiang, Jiecheng Zhou, Minxi Jin, Hengjie Li, Xingcheng Zhang, Zhiling Pei, Jidong Zhai
06 Feb 2024

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou
01 Feb 2024