Cited By
SpinQuant: LLM quantization with learned rotations (arXiv 2405.16406)
21 February 2025
Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort
Papers citing "SpinQuant: LLM quantization with learned rotations" (50 of 57 papers shown):
Diffusion Model Quantization: A Review. Qian Zeng, Chenggong Hu, Mingli Song, Jie Song. 08 May 2025.
An Empirical Study of Qwen3 Quantization. Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu. 04 May 2025.
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free. Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo. 02 May 2025.
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization. Chen Xu, Yuxuan Yue, Zukang Xu, Xing Hu, Jiangyong Yu, Zhixuan Chen, Sifan Zhou, Zhihang Yuan, Dawei Yang. 02 May 2025.
ICQuant: Index Coding enables Low-bit LLM Quantization. Xinlin Li, Osama A. Hanna, Christina Fragouli, Suhas Diggavi. 01 May 2025.
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs. Hongyu Wang, Shuming Ma, Furu Wei. 25 Apr 2025.
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores. Fengwei Zhou, Jiafei Song, Wenjin Jason Li, Gengjian Xue, Zhikang Zhao, Yichao Lu, Bailin Na. 23 Apr 2025.
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference. Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, S., Brucek Khailany. 19 Apr 2025.
Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs. Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello. 18 Apr 2025.
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving. Yaoyao Ding, Bohan Hou, X. Zhang, Allan Lin, Tianqi Chen, Cody Yu Hao, Yida Wang, Gennady Pekhimenko. 17 Apr 2025.
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving. Avinash Kumar, Shashank Nag, Jason Clemons, L. John, Poulami Das. 14 Apr 2025.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization. Yamato Arai, Yuma Ichikawa. 13 Apr 2025.
Achieving binary weight and activation for LLMs using Post-Training Quantization. Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xuyao Zhang. 07 Apr 2025.
GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration. Yuhang Li, Ruokai Yin, Donghyun Lee, Shiting Xiao, Priyadarshini Panda. 03 Apr 2025.
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm. Yongyi Yang, Jianyang Gao, Wei Hu. 29 Mar 2025.
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models. Hung-Yueh Chiang, Chi-chih Chang, N. Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu. 28 Mar 2025.
QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition. Yuxuan Hu, Xiaodong Chen, C. Li, H. Chen, J. Zhang. 25 Mar 2025.
Improving Quantization with Post-Training Model Expansion. Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas J. Fraser, Michaela Blott. 21 Mar 2025.
SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs. Jaewoo Song, Fangzhen Lin. 07 Mar 2025.
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration. Jinguang Wang, J. Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao. 07 Mar 2025.
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size. Alireza Behtash, Marijan Fofonjka, Ethan Baird, Tyler Mauer, Hossein Moghimifam, David Stout, Joel Dennison. 06 Mar 2025.
KurTail: Kurtosis-based LLM Quantization. Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski, E. Eleftheriou, M. Dazzi. 03 Mar 2025.
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs. Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal. 03 Mar 2025.
Identifying Sensitive Weights via Post-quantization Integral. Yuezhou Hu, Weiyu Huang, Zichen Liang, C. L. P. Chen, Jintao Zhang, J. Zhu, Jianfei Chen. 28 Feb 2025.
LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design. Renjie Wei, Songqiang Xu, Linfeng Zhong, Zebin Yang, Qingyu Guo, Y. Wang, Runsheng Wang, Meng Li. 24 Feb 2025.
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning. Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang. 18 Feb 2025.
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis. J. Zhao, M. Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie. 18 Feb 2025.
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization. Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, ..., Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra. 04 Feb 2025.
Optimizing Large Language Model Training Using FP4 Quantization. Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng. 28 Jan 2025.
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization. Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo. 28 Jan 2025.
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting. Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou. 23 Jan 2025.
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory. Jerry Chee, A. Backurs, Rainie Heck, Li Zhang, Janardhan Kulkarni, Thomas Rothvoss, Sivakanth Gopi. 11 Jan 2025.
MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models. Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Y. Xu, Michele Magno. 16 Dec 2024.
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel. Krish Agarwal, Rishi Astra, Adnan Hoque, M. Srivatsa, R. Ganti, Less Wright, Sijia Chen. 12 Dec 2024.
Energy-Efficient Split Learning for Fine-Tuning Large Language Models in Edge Networks. Zuguang Li, Shaohua Wu, Liang Li, Songge Zhang. 27 Nov 2024.
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem. Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh. 26 Nov 2024.
SAM Decoding: Speculative Decoding via Suffix Automaton. Yuxuan Hu, Ke Wang, Jing Zhang, Fanjin Zhang, C. Li, H. Chen, Jing Zhang. 16 Nov 2024.
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference. Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, Jungwook Choi. 15 Nov 2024.
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization. Weibo Zhao, Yubin Shi, Xinyu Lyu, Wanchen Sui, Shen Li, Yong Li. 12 Nov 2024.
The Super Weight in Large Language Models. Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan. 11 Nov 2024.
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation. Shih-yang Liu, Huck Yang, Nai Chit Fung, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen. 28 Oct 2024.
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction. Yuhang Li, Priyadarshini Panda. 24 Oct 2024.
COMET: Towards Partical W4A4KV4 LLMs Serving. Lian Liu, Haimeng Ren, Long Cheng, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang. 16 Oct 2024.
FlatQuant: Flatness Matters for LLM Quantization. Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, W. Liu, Jun Yao. 12 Oct 2024.
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference. Ke Yi, Zengke Liu, Jianwei Zhang, Chengyuan Li, Tong Zhang, Junyang Lin, Jingren Zhou. 30 Sep 2024.
Accumulator-Aware Post-Training Quantization. Ian Colbert, Fabian Grob, Giuseppe Franco, Jinjie Zhang, Rayan Saab. 25 Sep 2024.
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview. Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao. 18 Sep 2024.
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. Xijie Huang, Zechun Liu, Shih-yang Liu, Kwang-Ting Cheng. 10 Jul 2024.
VcLLM: Video Codecs are Secretly Tensor Codecs. Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills. 29 Jun 2024.
BoA: Attention-aware Post-training Quantization without Backpropagation. Junhan Kim, Ho-Young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon. 19 Jun 2024.