Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2402.14866
Cited By
v1
v2 (latest)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
21 February 2024
Ziyi Guan
Hantao Huang
Yupeng Su
Hong Huang
Ngai Wong
Hao Yu
MQ
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models"
14 / 14 papers shown
Block Rotation is All You Need for MXFP4 Quantization
Yuantian Shao
Peisong Wang
Yuanteng Chen
Chang Xu
Zhihui Wei
Jian Cheng
408
3
0
06 Nov 2025
Mixed-Precision Quantization for Language Models: Techniques and Prospects
M. Rakka
Marios Fournarakis
Olga Krestinskaya
Jinane Bazzi
K. Salama
Fadi J. Kurdahi
A. Eltawil
M. Fouda
MQ
234
0
0
19 Oct 2025
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Bingxin Xu
Zhen Dong
Oussama Elachqar
Yuzhang Shang
MQ
192
1
0
11 Sep 2025
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Qingcheng Zhu
Yangyang Ren
L. Yang
Mingbao Lin
Yanjing Li
...
Haodong Zhu
Yuguang Yang
Juan Zhang
Runqi Wang
Baochang Zhang
MQ
161
0
0
24 Jul 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
314
2
0
05 May 2025
Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
Changhai Zhou
Yuhua Zhou
Qian Qiao
Weizhong Zhang
Cheng Jin
Cheng Jin
Weizhong Zhang
Cheng Jin
MQ
433
2
0
02 May 2025
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Modeling Decisions for Artificial Intelligence (MDAI), 2025
Deyu Cao
Samin Aref
MQ
478
3
0
14 Apr 2025
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai
Yuma Ichikawa
MQ
417
6
0
13 Apr 2025
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
Yongyi Yang
Jianyang Gao
Wei Hu
MQ
502
2
0
29 Mar 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
621
6
0
18 Feb 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Ji-Rong Wen
MQ
269
0
0
22 Jan 2025
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
632
46
0
06 Oct 2024
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su
Ziyi Guan
Xiaoqun Liu
Tianlai Jin
Dongkuan Wu
Zhengfei Chen
G. Chesi
Ngai Wong
Hao Yu
179
2
0
20 Aug 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Qinshuo Liu
Xianglong Liu
Luca Benini
Michele Magno
Shiming Zhang
Xiaojuan Qi
MQ
359
35
0
23 May 2024
1