APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models

21 February 2024
Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
MQ

Papers citing "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models"

14 papers shown

Block Rotation is All You Need for MXFP4 Quantization
Yuantian Shao, Peisong Wang, Yuanteng Chen, Chang Xu, Zhihui Wei, Jian Cheng
06 Nov 2025

Mixed-Precision Quantization for Language Models: Techniques and Prospects
M. Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, K. Salama, Fadi J. Kurdahi, A. Eltawil, M. Fouda
MQ
19 Oct 2025

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Bingxin Xu, Zhen Dong, Oussama Elachqar, Yuzhang Shang
MQ
11 Sep 2025

Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Qingcheng Zhu, Yangyang Ren, L. Yang, Mingbao Lin, Yanjing Li, ..., Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang
MQ
24 Jul 2025

Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
05 May 2025

Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
MQ
02 May 2025

Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Modeling Decisions for Artificial Intelligence (MDAI), 2025
Deyu Cao, Samin Aref
MQ
14 Apr 2025

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
MQ
13 Apr 2025

RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
Yongyi Yang, Jianyang Gao, Wei Hu
MQ
29 Mar 2025

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie
MQ
18 Feb 2025

Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen
MQ
22 Jan 2025

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024

LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Zhengfei Chen, G. Chesi, Ngai Wong, Hao Yu
20 Aug 2024

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi
MQ
23 May 2024