arXiv: 2410.19313
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
25 October 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yao Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
ArXiv (abs)
PDF
HTML
HuggingFace (19 upvotes)
Papers citing "COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training"
50 / 67 papers shown
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Yu Zhang
Hui-Ling Zhen
Mingxuan Yuan
Bei Yu
MQ
4
0
0
08 Nov 2025
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
Wenjun Wang
Shuo Cai
C. Xie
Mingfa Feng
Y. Zhang
Zhen Li
Kejing Yang
Ming Li
Jiannong Cao
Yuan Xie
MQ
56
0
0
26 Sep 2025
QKV Projections Require a Fraction of Their Memory
Malik Khalf
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
172
0
0
03 Jun 2025
Oscillation-Reduced MXFP4 Training for Vision Transformers
Yuxiang Chen
Haocheng Xi
Jun Zhu
Jianfei Chen
MQ
215
9
0
28 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
261
5
0
24 Feb 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
MQ
149
8
0
05 Feb 2025
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia Wei
Jun-Jie Zhu
Jianfei Chen
MQ
VLM
383
46
0
17 Nov 2024
CompAct: Compressed Activations for Memory-Efficient LLM Training
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
190
5
0
20 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
380
66
0
03 Oct 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
374
262
0
11 Jul 2024
Nemotron-4 340B Technical Report
Nvidia
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
217
99
0
17 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM
MLLM
347
686
0
31 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
135
20
0
23 May 2024
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi
Yuxiang Chen
Kang Zhao
Kaijun Zheng
Jianfei Chen
Jun Zhu
MQ
174
27
0
19 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
269
312
0
06 Mar 2024
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Zhiyang Xu
Chao Feng
Rulin Shao
Trevor Ashby
Ying Shen
Dingnan Jin
Yu Cheng
Qifan Wang
Lifu Huang
MLLM
VLM
138
58
0
18 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
373
518
0
01 Feb 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini
Rodney Michael Kinney
Akshita Bhagia
Dustin Schwenk
David Atkinson
...
Hanna Hajishirzi
Iz Beltagy
Dirk Groeneveld
Jesse Dodge
Kyle Lo
275
359
0
31 Jan 2024
VILA: On Pre-training for Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM
VLM
451
605
0
12 Dec 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
629
1,406
0
27 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
283
882
0
21 Nov 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng Zhang
Shuguang Liu
Joe Chau
Han Hu
Jun Zhou
MQ
212
61
0
27 Oct 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
International Conference on Learning Representations (ICLR), 2023
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
AIMat
LRM
292
488
0
11 Sep 2023
Memory Efficient Optimizers with 4-bit States
Neural Information Processing Systems (NeurIPS), 2023
Bingrui Li
Jianfei Chen
Jun Zhu
MQ
204
53
0
04 Sep 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
299
736
0
30 Jul 2023
Training Transformers with 4-bit Integers
Neural Information Processing Systems (NeurIPS), 2023
Haocheng Xi
Changhao Li
Jianfei Chen
Jun Zhu
MQ
222
67
0
21 Jun 2023
Evaluating Object Hallucination in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
525
1,121
0
17 May 2023
Stable and low-precision training for large-scale vision-language models
Neural Information Processing Systems (NeurIPS), 2023
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
209
64
0
25 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
2.0K
16,566
0
27 Feb 2023
Symbolic Discovery of Optimization Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
435
476
0
13 Feb 2023
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
Mohammad Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
488
171
0
12 Sep 2022
On-Device Training Under 256KB Memory
Neural Information Processing Systems (NeurIPS), 2022
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Chuang Gan
Song Han
MQ
251
248
0
30 Jun 2022
GACT: Activation Compressed Training for Generic Network Architectures
International Conference on Machine Learning (ICML), 2022
Xiaoxuan Liu
Lianmin Zheng
Yi Xu
Yukuo Cen
Weize Chen
...
Zhiyuan Liu
Jie Tang
Joey Gonzalez
Michael W. Mahoney
Alvin Cheung
VLM
GNN
MQ
189
37
0
22 Jun 2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Neural Information Processing Systems (NeurIPS), 2022
Jue Wang
Binhang Yuan
Luka Rimanic
Yongjun He
Tri Dao
Beidi Chen
Christopher Ré
Ce Zhang
AI4CE
292
18
0
02 Jun 2022
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Swaroop Mishra
Arindam Mitra
Neeraj Varshney
Bhavdeep Singh Sachdeva
Peter Clark
Chitta Baral
Ashwin Kalyan
AIMat
ReLM
ELM
LRM
144
119
0
12 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
605
2,459
0
29 Mar 2022
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
International Conference on Machine Learning (ICML), 2022
Georgii Sergeevich Novikov
Daniel Bershatsky
Julia Gusak
Alex Shonenkov
Denis Dimitrov
Ivan Oseledets
MQ
123
18
0
01 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
Mohammad Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
287
796
0
28 Jan 2022
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
Zechun Liu
Kwang-Ting Cheng
Dong Huang
Eric P. Xing
Zhiqiang Shen
MQ
180
133
0
29 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
748
6,246
0
27 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
300
355
0
06 Oct 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
International Conference on Machine Learning (ICML), 2021
Jianfei Chen
Lianmin Zheng
Z. Yao
Yi Xu
Ion Stoica
Michael W. Mahoney
Joseph E. Gonzalez
MQ
149
86
0
29 Apr 2021
Are NLP Models really able to Solve Simple Math Word Problems?
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Arkil Patel
S. Bhattamishra
Navin Goyal
ReLM
LRM
242
1,007
0
12 Mar 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Conference on Machine Learning and Systems (MLSys), 2021
Steve Dai
Rangharajan Venkatesan
Haoxing Ren
B. Zimmer
W. Dally
Brucek Khailany
MQ
159
88
0
08 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
681
2,447
0
31 Dec 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Neural Information Processing Systems (NeurIPS), 2020
Jianfei Chen
Yujie Gai
Z. Yao
Michael W. Mahoney
Joseph E. Gonzalez
MQ
101
68
0
27 Oct 2020
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
Han Cai
Chuang Gan
Ligeng Zhu
Song Han
150
60
0
22 Jul 2020
GLU Variants Improve Transformer
Noam M. Shazeer
435
1,342
0
12 Feb 2020
Towards Unified INT8 Training for Convolutional Neural Network
Computer Vision and Pattern Recognition (CVPR), 2019
Feng Zhu
Yazhe Niu
F. Yu
Xianglong Liu
Yanfei Wang
Zhelong Li
Xiuqi Yang
Junjie Yan
MQ
181
166
0
29 Dec 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of machine learning research (JMLR), 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
1.3K
22,815
0
23 Oct 2019