Efficient Post-training Quantization with FP8 Formats

Conference on Machine Learning and Systems (MLSys), 2023
26 September 2023
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
ArXiv (abs) · PDF · HTML · HuggingFace (11 upvotes) · GitHub (2414★)

Papers citing "Efficient Post-training Quantization with FP8 Formats"

14 / 14 papers shown
BitSnap: Checkpoint Sparsification and Quantization in LLM Training
Yanxin Peng
Qingping Li
Baodong Wu
Shigang Li
Guohao Dai
Shengen Yan
Yu Wang
MQ
233
0
0
15 Nov 2025
Reliable Evaluation Protocol for Low-Precision Retrieval
Kisu Yang
Yoonna Jang
Hwanseok Jang
Kenneth Choi
Isabelle Augenstein
Heuiseok Lim
88
0
0
05 Aug 2025
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu
Zeyu Zhang
Zhexin Li
Xuehai Bai
Yizeng Han
...
Jiahao He
Yuanyu He
F. Wang
Gholamreza Haffari
Bohan Zhuang
VGen, MQ
442
8
0
05 Jun 2025
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models
Lucas Maisonnave
Cyril Moineau
Olivier Bichler
Fabrice Rastello
MQ
357
1
0
30 Apr 2025
Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques
Naamán Huerga-Pérez
Rubén Álvarez
Rubén Ferrero-Guillén
Alberto Martínez-Gutiérrez
Javier Díez-González
MQ
100
0
0
30 Apr 2025
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
Ruichen Chen
Keith G. Mills
Di Niu
MQ
278
2
0
19 Mar 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li
Tiancheng Shen
Yao Zhou
Baisong Yang
Zhongying Liu
Masheng Yang
Guohao Li
Jianlong Wu
Yujie Zhong
Ming-Hsuan Yang
170
3
0
24 Feb 2025
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
338
1
0
01 Nov 2024
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Hao Feng
Boyuan Zhang
Fanjiang Ye
Min Si
Ching-Hsiang Chu
...
Summer Deng
Yuchen Hao
Pavan Balaji
Tong Geng
Dingwen Tao
AI4CE
145
5
0
05 Jul 2024
Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point
Bokun Wang
Axel Berg
D. A. E. Acar
Chuteng Zhou
MQ, FedML
333
1
0
02 Jul 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
International Conference on Machine Learning (ICML), 2024
Jordan Dotzel
Yuzong Chen
Bahaa Kotb
Sushma Prasad
Gang Wu
Sheng Li
Mohamed S. Abdelfattah
Zhiru Zhang
259
16
0
06 May 2024
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning
International Conference on Machine Vision (ICMV), 2023
Utku Mert Topcuoglu
Erdem Akagündüz
206
2
0
02 Dec 2023
Efficient LLM Inference on CPUs
Haihao Shen
Hanwen Chang
Bo Dong
Yu Luo
Hengyu Meng
MQ
193
30
0
01 Nov 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
International Conference on Learning Representations (ICLR), 2023
Josh Alman
Zhao Song
307
44
0
06 Oct 2023