Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
18 April 2023 · arXiv:2304.09145 (v3, latest)
Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Yazhe Niu, Jian Ren, Zhengang Li
MQ
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (46★)
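
The title names the paper's core mechanism: a per-channel shift and scale is applied to the activations and folded equivalently into the adjacent weights and bias, so that outlier channels become easier to quantize while the layer output is unchanged. The Python sketch below is a minimal illustration of that equivalence, assuming a PyTorch-style linear layer; the function name and the simple min/max statistics are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only: the shift/scale statistics below are assumptions,
    # not the released Outlier Suppression+ code.
    import torch

    def fold_shift_and_scale(x, weight, bias):
        """Fold a per-channel shift and scale of activations x into the next
        linear layer (weight, bias) so the layer output stays identical while
        the transformed activations have a centered, tighter range."""
        # Per-channel shift: center each activation channel (absorbed into the bias).
        shift = (x.max(dim=0).values + x.min(dim=0).values) / 2
        # Per-channel scale: shrink outlier channels toward a unit range.
        scale = (x - shift).abs().max(dim=0).values.clamp(min=1e-5)

        x_eq = (x - shift) / scale       # easier-to-quantize activations
        w_eq = weight * scale            # scale folded into the weight columns
        b_eq = bias + weight @ shift     # shift folded into the bias
        return x_eq, w_eq, b_eq

    # Equivalence check: the folded layer reproduces the original output.
    x = torch.randn(8, 16)
    x[:, 3] *= 50.0                      # make one channel an outlier
    w, b = torch.randn(4, 16), torch.randn(4)
    x_eq, w_eq, b_eq = fold_shift_and_scale(x, w, b)
    assert torch.allclose(x_eq @ w_eq.T + b_eq, x @ w.T + b, atol=1e-3)

In the paper itself the shift and scale values are chosen by optimization (the "optimal" in the title) rather than the simple statistics used above; the snippet only shows why the folding is lossless.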

Papers citing "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling"

25 / 25 papers shown
Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation
  Chenyu Wang, Zhanglu Yan, Zhi Zhou, Xu Chen, Weng-Fai Wong
  MQ · 152 · 0 · 0 · 22 Oct 2025
Mixed-Precision Quantization for Language Models: Techniques and Prospects
  M. Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, K. Salama, Fadi J. Kurdahi, A. Eltawil, M. Fouda
  MQ · 227 · 0 · 0 · 19 Oct 2025
Interpreting the Effects of Quantization on LLMs
  Manpreet Singh, Hassan Sajjad
  MQ · MILM · 377 · 3 · 0 · 22 Aug 2025
Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications
  Hamza A. Abushahla, Dara Varam, Ariel J. N. Panopio, Mohamed I. AlHajri
  MQ · 368 · 1 · 0 · 20 Aug 2025
Why Do Some Inputs Break Low-Bit LLM Quantization?
  Ting-Yun Chang, Muru Zhang, Jesse Thomason, Robin Jia
  MQ · 259 · 1 · 0 · 24 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
  Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
  AAML · 440 · 4 · 0 · 01 May 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
  Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
  MQ · MoMe · 199 · 3 · 0 · 07 Mar 2025
Compressing Language Models for Specialized Domains
  Miles Williams, G. Chrysostomou, Vitor Jeronymo, Nikolaos Aletras
  MQ · 300 · 1 · 0 · 25 Feb 2025
Deploying Foundation Model Powered Agent Services: A Survey
  Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
  AI4CE · 471 · 5 · 0 · 18 Dec 2024
The Super Weight in Large Language Models
  Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan
  MQ · MILM · 332 · 32 · 0 · 11 Nov 2024
FlatQuant: Flatness Matters for LLM Quantization
  Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, Wen Liu, Jun Yao
  MQ · 586 · 28 · 0 · 12 Oct 2024
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
  European Conference on Computer Vision (ECCV), 2024
  Shuzhao Xie, Weixiang Zhang, Chen Tang, Yunpeng Bai, Rongwei Lu, Shijia Ge, Zhi Wang
  3DGS · 263 · 33 · 0 · 15 Sep 2024
LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
  Jung Hyun Lee, Jeonghoon Kim, J. Yang, S. Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee
  MQ · 344 · 5 · 0 · 16 Jul 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
  Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
  187 · 2 · 0 · 27 Jun 2024
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
  Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee
  MQ · 230 · 21 · 0 · 17 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
  Jing Liu, Yazhe Niu, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang
  MoE · 132 · 2 · 0 · 13 Jun 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
  Jaewoo Yang, Hayun Kim, Younghoon Kim
  207 · 20 · 0 · 23 May 2024
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
  Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin
  MQ · 272 · 29 · 0 · 10 May 2024
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
  Neural Information Processing Systems (NeurIPS), 2024
  Wanyun Cui, Qianle Wang
  MQ · 185 · 10 · 0 · 03 Apr 2024
Minimize Quantization Output Error with Bias Compensation
  Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li
  MQ · 148 · 3 · 0 · 02 Apr 2024
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
  Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin
  MQ · 258 · 2 · 0 · 11 Mar 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
  Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Sijin Yu, Deyi Xiong
  MQ · 287 · 67 · 0 · 26 Feb 2024
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi. Liu
  MQ · 504 · 33 · 0 · 11 Sep 2023
A Survey on Model Compression for Large Language Models
  Transactions of the Association for Computational Linguistics (TACL), 2023
  Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
  378 · 352 · 0 · 15 Aug 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Conference on Machine Learning and Systems (MLSys), 2023
  Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han
  EDLMQ · 831 · 946 · 0 · 01 Jun 2023