Efficient Post-training Quantization with FP8 Formats

26 September 2023

Papers citing "Efficient Post-training Quantization with FP8 Formats"

6 / 6 papers shown

Title
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models Lucas Maisonnave Cyril Moineau Olivier Bichler Fabrice Rastello MQ 69 1 0 30 Apr 2025
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance Jaskirat Singh Bram Adams Ahmed E. Hassan VLM 29 0 0 01 Nov 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs Jordan Dotzel Yuzong Chen Bahaa Kotb Sushma Prasad Gang Wu Sheng R. Li Mohamed S. Abdelfattah Zhiru Zhang 21 7 0 06 May 2024
FP8 Formats for Deep Learning Paulius Micikevicius Dusan Stosic N. Burgess Marius Cornea Pradeep Dubey ... Naveen Mellempudi S. Oberman M. Shoeybi Michael Siu Hao Wu BDL VLM MQ 67 119 0 12 Sep 2022
I-BERT: Integer-only BERT Quantization Sehoon Kim A. Gholami Z. Yao Michael W. Mahoney Kurt Keutzer MQ 86 332 0 05 Jan 2021
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights Aojun Zhou Anbang Yao Yiwen Guo Lin Xu Yurong Chen MQ 300 1,046 0 10 Feb 2017