2205.05198
Cited By
Reducing Activation Recomputation in Large Transformer Models
10 May 2022
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
M. Shoeybi
Bryan Catanzaro
AI4CE
Papers citing
"Reducing Activation Recomputation in Large Transformer Models"
50 / 164 papers shown
Title
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Y. K. Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
48
651
0
25 Jan 2024
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
Sami Alabed
Daniel Belov
Bart Chrzaszcz
Juliana Franco
Dominik Grewe
...
Michael Schaarschmidt
Timur Sitdikov
Agnieszka Swietlik
Dimitrios Vytiniotis
Joel Wee
26
3
0
20 Jan 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Peng Sun
39
8
0
17 Jan 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Damai Dai
Chengqi Deng
Chenggang Zhao
R. X. Xu
Huazuo Gao
...
Panpan Huang
Fuli Luo
Chong Ruan
Zhifang Sui
W. Liang
MoE
34
243
0
11 Jan 2024
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
56
3
0
08 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
304
0
05 Jan 2024
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
Mincong Huang
Chao Wang
Chi Ma
Yineng Zhang
Peng Zhang
Lei Yu
20
1
0
04 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
14
64
0
04 Jan 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
224
65
0
31 Dec 2023
Unicron: Economizing Self-Healing LLM Training at Scale
Tao He
Xue Li
Zhibin Wang
Kun Qian
Jingbo Xu
Wenyuan Yu
Jingren Zhou
14
14
0
30 Dec 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob P. Portes
Alex Trott
Sam Havens
Daniel King
Abhinav Venigalla
Moin Nadeem
Nikhil Sardana
D. Khudia
Jonathan Frankle
13
16
0
29 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
43
34
0
23 Dec 2023
Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities
Alaa Saleh
Roberto Morabito
Sasu Tarkoma
Susanna Pirttikangas
Lauri Lovén
58
3
0
22 Dec 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
26
7
0
14 Dec 2023
Stateful Large Language Model Serving with Pensieve
Lingfan Yu
Jinyang Li
RALM
KELM
LLMAG
29
12
0
09 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
21
31
0
08 Dec 2023
ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU
Zhengmao Ye
Dengchun Li
Jingqi Tian
Tingfeng Lan
Jie Zuo
...
Hui Lu
Yexi Jiang
Jian Sha
Ke Zhang
Mingjie Tang
91
7
0
05 Dec 2023
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Qiang Li
Xiaoyan Yang
Haowen Wang
Qin Wang
Lei Liu
...
Wangshu Zhang
Teng Xu
Jinjie Gu
Jing Zheng
Guannan Zhang
LM&MA
ELM
AI4MH
19
14
0
02 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
27
21
0
01 Dec 2023
Zero Bubble Pipeline Parallelism
Penghui Qi
Xinyi Wan
Guangxing Huang
Min Lin
16
23
0
30 Nov 2023
Zero-shot Conversational Summarization Evaluations with small Large Language Models
R. Manuvinakurike
Saurav Sahay
Sangeeta Manepalli
L. Nachman
ELM
LM&MA
22
0
0
29 Nov 2023
Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search
Zhiqi Lin
Youshan Miao
Guanbin Xu
Cheng Li
Olli Saarikivi
Saeed Maleki
Fan Yang
12
6
0
26 Nov 2023
Striped Attention: Faster Ring Attention for Causal Transformers
William Brandon
Aniruddha Nrusimha
Kevin Qian
Zack Ankner
Tian Jin
Zhiye Song
Jonathan Ragan-Kelley
16
36
0
15 Nov 2023
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
Johannes Hagemann
Samuel Weinbach
Konstantin Dobler
Maximilian Schall
Gerard de Melo
LRM
29
6
0
09 Nov 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim
Shaizeen Aga
Ada Li
Suchita Pati
Mahzabeen Islam
21
3
0
08 Nov 2023
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang
Xiang Liu
Zeyu Li
Xinglin Pan
Peijie Dong
...
Rui Guo
Xin Wang
Qiong Luo
S. Shi
Xiaowen Chu
36
7
0
07 Nov 2023
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng
Shiyi Cao
Dacheng Li
Coleman Hooper
Nicholas Lee
...
Banghua Zhu
Lianmin Zheng
Kurt Keutzer
Joseph E. Gonzalez
Ion Stoica
MoE
26
87
0
06 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang
Isaac Lyngaas
A. Tsaris
Peng Chen
Sajal Dash
Mayanka Chandra Shekar
Tao Luo
Hong-Jun Yoon
M. Wahib
John P. Gounley
27
4
0
04 Nov 2023
Coop: Memory is not a Commodity
Jianhao Zhang
Shihan Ma
Peihong Liu
Jinhui Yuan
27
4
0
01 Nov 2023
Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision
Jiaxin Zhang
Zhuohang Li
Kamalika Das
Kumar Sricharan
23
2
0
31 Oct 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
27
92
0
30 Oct 2023
TRANSOM: An Efficient Fault-Tolerant System for Training LLMs
Baodong Wu
Lei Xia
Qingping Li
Kangyu Li
Xu Chen
Yongqiang Guo
Tieyao Xiang
Yuheng Chen
Shigang Li
22
11
0
16 Oct 2023
BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models
Haoxiang Luo
Jian Luo
Athanasios V. Vasilakos
26
9
0
10 Oct 2023
Rethinking Memory and Communication Cost for Efficient Large Language Model Training
Chan Wu
Hanxiao Zhang
Lin Ju
Jinjing Huang
Youshao Xiao
...
Siyuan Li
Fanzhuang Meng
Lei Liang
Xiaolu Zhang
Jun Zhou
13
4
0
09 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
38
12
0
08 Oct 2023
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu
Hang Yan
Shuo Zhang
Chen An
Xipeng Qiu
Dahua Lin
23
83
0
08 Oct 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li
Rulin Shao
Anze Xie
Eric P. Xing
Xuezhe Ma
Ion Stoica
Joseph E. Gonzalez
Hao Zhang
32
17
0
05 Oct 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu
Matei A. Zaharia
Pieter Abbeel
36
216
0
03 Oct 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
25
100
0
25 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
LI DU
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
55
21
0
07 Sep 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
30
4
0
27 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
S. Shi
X. Chu
Bo-wen Li
AI4CE
13
91
0
07 Aug 2023
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
Runzhou Han
Mai Zheng
S. Byna
Houjun Tang
Bin Dong
...
Yong Chen
Dongkyun Kim
Joseph Hassoun
D. Thorsley
Matthew Wolf
18
2
0
02 Aug 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
514
0
12 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
35
151
0
05 Jul 2023
FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Robert Sim
Anastasios Kyrillidis
Dimitrios Dimitriadis
FedML
MoE
19
6
0
14 Jun 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
39
11
0
30 May 2023
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Xin Chen
Hengheng Zhang
Xiaotao Gu
Kaifeng Bi
Lingxi Xie
Qi Tian
MoE
14
4
0
22 Apr 2023
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Yanli Zhao
Andrew Gu
R. Varma
Liangchen Luo
Chien-chin Huang
...
Bernard Nguyen
Geeta Chauhan
Y. Hao
Ajit Mathews
Shen Li
FedML
MoE
27
303
0
21 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
51
780
0
30 Mar 2023