COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
25 October 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
    MQ

Papers citing "COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training"

17 / 67 papers shown
Title
Root Mean Square Layer Normalization
Neural Information Processing Systems (NeurIPS), 2019
Biao Zhang
Rico Sennrich
536
1,073
0
16 Oct 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
676
2,284
0
17 Sep 2019
A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar
Dheevatsa Mudigere
Naveen Mellempudi
Dipankar Das
K. Banerjee
...
Sudarshan Srinivasan
Abhisek Kundu
M. Smelyanskiy
Bharat Kaul
Pradeep Dubey
MQ
234
399
0
29 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
461
3,242
0
19 May 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
489
1,581
0
18 Apr 2019
Training Deep Neural Networks with 8-bit Floating Point Numbers
Naigang Wang
Jungwook Choi
D. Brand
Chia-Yu Chen
K. Gopalakrishnan
MQ
170
536
0
19 Dec 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
335
1,157
0
11 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM RALM LRM
706
3,471
0
14 Mar 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
552
1,024
0
22 Feb 2018
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
348
2,043
0
10 Oct 2017
Crowdsourcing Multiple Choice Science Questions
Johannes Welbl
Nelson F. Liu
Matt Gardner
AI4Ed
212
654
0
19 Jul 2017
Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks
Shuchang Zhou
Yuzhi Wang
He Wen
Qinyao He
Yuheng Zou
MQ
153
113
0
22 Jun 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
880
3,671
0
02 Dec 2016
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
899
3,340
0
26 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
573
11,493
0
21 Jul 2016
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
431
5,864
0
27 Jun 2016
Adam: A Method for Stochastic Optimization
International Conference on Learning Representations (ICLR), 2014
Diederik P. Kingma
Jimmy Ba
ODL
4.2K
158,837
0
22 Dec 2014