ResearchTrend.AI (© 2025, all rights reserved)

Training Deep Neural Networks with 8-bit Floating Point Numbers
arXiv:1812.08011, 19 December 2018
Naigang Wang, Jungwook Choi, D. Brand, Chia-Yu Chen, K. Gopalakrishnan
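For context, the cited paper trains networks with an 8-bit floating-point format using 1 sign, 5 exponent, and 2 mantissa bits. The sketch below is a hypothetical NumPy illustration of rounding float32 tensors to such an E5M2-style grid; the function name and the saturating clamp are assumptions for illustration, not the paper's actual implementation (which also relies on chunk-based accumulation and stochastic rounding).

```python
import numpy as np

def quantize_fp8_e5m2(x: np.ndarray) -> np.ndarray:
    """Round float32 values to the nearest representable value of a
    1-5-2 (sign/exponent/mantissa) 8-bit float. Illustration only."""
    x = np.asarray(x, dtype=np.float32)
    mantissa_bits = 2
    max_normal = 57344.0      # largest finite 1-5-2 value: 1.75 * 2^15
    min_normal = 2.0 ** -14   # smallest normal (exponent bias 15)

    sign = np.sign(x)
    mag = np.abs(x)
    # Saturate out-of-range magnitudes instead of producing inf.
    mag = np.minimum(mag, max_normal)
    # Per-element exponent; clamp below so tiny values land on the
    # subnormal grid (multiples of 2^-16).
    e = np.floor(np.log2(np.maximum(mag, np.finfo(np.float32).tiny)))
    e = np.maximum(e, np.log2(min_normal))
    # Quantization step at this exponent: 2^(e - mantissa_bits).
    scale = 2.0 ** (e - mantissa_bits)
    out = sign * np.round(mag / scale) * scale
    out[np.abs(x) == 0] = 0.0  # keep exact zeros exact
    return out
```

With only 2 mantissa bits there are just four representable values per binade, so e.g. 1.1 rounds down to 1.0 and anything above 57344 saturates to the largest finite value.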

Papers citing "Training Deep Neural Networks with 8-bit Floating Point Numbers"

50 / 212 papers shown
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe, Anastasios Kyrillidis. 04 Mar 2024
Effect of Weight Quantization on Learning Models by Typical Case Analysis
Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi. 30 Jan 2024
One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training
Lianbo Ma, Yuee Zhou, Jianlun Ma, Guo-Ding Yu, Qing Li. 30 Jan 2024
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators. International Conference on Learning Representations (ICLR), 2024
Yaniv Blumenfeld, Itay Hubara, Daniel Soudry. 25 Jan 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun, Defang Chen, Jiawei Chen, Yan Feng, Chun-Yen Chen, Can Wang. 11 Jan 2024
FP8-BERT: Post-Training Quantization for Transformer
Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu. 10 Dec 2023
Low-Precision Mixed-Computation Models for Inference on Edge
Seyedarmin Azizi, M. Nazemi, M. Kamal, Massoud Pedram. 03 Dec 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, Mahzabeen Islam. 08 Nov 2023
ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout
Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu. 30 Oct 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, ..., Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Jun Zhou. 27 Oct 2023
Efficient Post-training Quantization with FP8 Formats. Conference on Machine Learning and Systems (MLSys), 2023
Haihao Shen, Naveen Mellempudi, Xin He, Q. Gao, Chang-Bao Wang, Mengni Wang. 26 Sep 2023
Memory Efficient Mixed-Precision Optimizers
Basile Lewandowski, Atli Kosson. 21 Sep 2023
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs
Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, ..., Rongfei Zeng, Kaiyong Zhao, Shaoshuai Shi, Bingsheng He, Xiaowen Chu. 03 Sep 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory. Computer Vision and Pattern Recognition (CVPR), 2023
Haiwen Diao, Bo Wan, Yanzhe Zhang, Xuecong Jia, Huchuan Lu, Long Chen. 28 Aug 2023
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance. IEEE International Conference on Computer Vision (ICCV), 2023
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig. 25 Aug 2023
Tango: rethinking quantization for graph neural network training on GPUs. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu. 02 Aug 2023
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2023
James O'Neill, Sourav Dutta. 12 Jul 2023
Training Transformers with 4-bit Integers. Neural Information Processing Systems (NeurIPS), 2023
Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu. 21 Jun 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li, Chunlin Tian, Kahou Tam, Ruirui Ma, Li Li. 17 Jun 2023
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics. North American Chapter of the Association for Computational Linguistics (NAACL), 2023
A. Ardakani, Altan Haan, Shangyin Tan, Doru-Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen. 29 May 2023
Multiplication-Free Transformer Training via Piecewise Affine Operations. Neural Information Processing Systems (NeurIPS), 2023
Atli Kosson, Martin Jaggi. 26 May 2023
Standalone 16-bit Neural Network Training: Missing Study for Hardware-Limited Deep Learning Practitioners
Juyoung Yun, Byungkon Kang, Francois Rameau, Zhoulai Fu. 18 May 2023
Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
Y. Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang. 15 May 2023
Stable and low-precision training for large-scale vision-language models. Neural Information Processing Systems (NeurIPS), 2023
Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt. 25 Apr 2023
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong, Zheng Liu, Xiangshan Chen. 21 Apr 2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Yazhe Niu, Jian Ren, Zhengang Li. 18 Apr 2023
AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
Abhisek Kundu, Naveen Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey. 14 Apr 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen, Andrey Kuzmin, Suparna S. Nair, Yuwei Ren, E. Mahurin, ..., Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph B. Soriaga, Tijmen Blankevoort. 31 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training. International Conference on Machine Learning (ICML), 2023
Charlie Blake, Douglas Orr, Carlo Luschi. 20 Mar 2023
Gated Compression Layers for Efficient Always-On Models
Haiguang Li, T. Thormundsson, I. Poupyrev, N. Gillian. 15 Mar 2023
Ultra-low Precision Multiplication-free Training for Deep Neural Networks
Yu Xie, Rui Zhang, Xishan Zhang, Yifan Hao, Zidong Du, Xingui Hu, Ling Li, Qi Guo. 28 Feb 2023
With Shared Microexponents, A Little Shifting Goes a Long Way. International Symposium on Computer Architecture (ISCA), 2023
Bita Darvish Rouhani, Ritchie Zhao, V. Elango, Rasoul Shafipour, Mathew Hall, ..., Eric S. Chung, Zhaoxia Deng, S. Naghshineh, Jongsoo Park, Maxim Naumov. 16 Feb 2023
Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Yingchun Wang, Jingcai Guo, Song Guo, Weizhan Zhang. 09 Feb 2023
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications. International Symposium on Circuits and Systems (ISCAS), 2023
Qiong Li, Chao Fang, Zhongfeng Wang. 03 Feb 2023
A Survey on Efficient Training of Transformers. International Joint Conference on Artificial Intelligence (IJCAI), 2023
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen. 02 Feb 2023
Training with Mixed-Precision Floating-Point Assignments
Wonyeol Lee, Rahul Sharma, A. Aiken. 31 Jan 2023
Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig. 31 Jan 2023
On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
Lu Xia, M. Hochstenbach, Stefano Massei. 23 Jan 2023
RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration. Future Generation Computer Systems (FGCS), 2023
Yvan Tortorella, L. Bertaccini, Luca Benini, D. Rossi, Francesco Conti. 10 Jan 2023
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning. IEEE International Conference on Computer Vision (ICCV), 2022
Huimin Wu, Chenyang Lei, Xiao Sun, Pengju Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu. 19 Dec 2022
Numerical Stability of DeepGOPlus Inference. PLoS ONE, 2022
Inés Gonzalez Pepe, Yohan Chatelain, Gregory Kiar, Tristan Glatard. 13 Dec 2022
Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
Simla Burcu Harma, Canberk Sonmez, Nicholas Sperry, Babak Falsafi, Martin Jaggi, Yunho Oh. 19 Nov 2022
AskewSGD: An Annealed interval-constrained Optimisation method to train Quantized Neural Networks. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Louis Leconte, S. Schechtman, Eric Moulines. 07 Nov 2022
LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training. International Conference on Computer Design (ICCD), 2022
Seock-Hwan Noh, Junsang Park, Dahoon Park, Jahyun Koo, Jeik Choi, Jaeha Kung. 04 Nov 2022
Emergent Quantized Communication. AAAI Conference on Artificial Intelligence (AAAI), 2022
Boaz Carmeli, Ron Meir, Yonatan Belinkov. 04 Nov 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty. 24 Oct 2022
FP8 Formats for Deep Learning
Paulius Micikevicius, Dusan Stosic, N. Burgess, Marius Cornea, Pradeep Dubey, ..., Naveen Mellempudi, S. Oberman, Mohammad Shoeybi, Michael Siu, Hao Wu. 12 Sep 2022
Training a T5 Using Lab-sized Resources
Manuel R. Ciosici, Leon Derczynski. 25 Aug 2022
FP8 Quantization: The Power of the Exponent. Neural Information Processing Systems (NeurIPS), 2022
Andrey Kuzmin, M. V. Baalen, Yuwei Ren, Markus Nagel, Jorn W. T. Peters, Tijmen Blankevoort. 19 Aug 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer. 15 Aug 2022