QuIP: 2-Bit Quantization of Large Language Models With Guarantees
arXiv:2307.13304 · 25 July 2023
Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Chris De Sa
MQ
Papers citing "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
50 / 150 papers shown
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, W. Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song
MQ · 11 May 2025

Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ · 05 May 2025

Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
AAML · 01 May 2025

ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li, Osama A. Hanna, Christina Fragouli, Suhas Diggavi
MQ · 01 May 2025

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni
MQ · 28 Apr 2025

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu (Allen) Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Z. Wang, Steven Li
28 Apr 2025

Compute-Optimal LLMs Provably Generalize Better With Scale
Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson
21 Apr 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang
MQ · 20 Apr 2025

Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello
MQ · 18 Apr 2025

Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding, Bohan Hou, X. Zhang, Allan Lin, Tianqi Chen, Cody Yu Hao, Yida Wang, Gennady Pekhimenko
17 Apr 2025

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
MQ · 13 Apr 2025

GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
Yuhang Li, Ruokai Yin, Donghyun Lee, Shiting Xiao, Priyadarshini Panda
MQ · 03 Apr 2025

RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
Yongyi Yang, Jianyang Gao, Wei Hu
MQ · 29 Mar 2025

QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
Yuxuan Hu, Xiaodong Chen, C. Li, H. Chen, J. Zhang
MQ · 25 Mar 2025

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim, Seongmin Hong, RyeoWook Ko, S. Choi, Hunjong Lee, Junsoo Kim, J. Kim, Jongse Park
24 Mar 2025

Improving Quantization with Post-Training Model Expansion
Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas J. Fraser, Michaela Blott
MQ · 21 Mar 2025

PARQ: Piecewise-Affine Regularized Quantization
Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao
MQ · 19 Mar 2025

ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
Baohao Liao, Christian Herold, Seyyed Hadi Hashemi, Stefan Vasilev, Shahram Khadivi, Christof Monz
MQ · 17 Mar 2025

ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
E. Georganas, Dhiraj D. Kalamkar, Alexander Kozlov, A. Heinecke
MQ · 17 Mar 2025

CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang
07 Mar 2025

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, J. Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
MQ, MoMe · 07 Mar 2025

RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal
MQ · 03 Mar 2025

Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu, Weiyu Huang, Zichen Liang, C. L. P. Chen, Jintao Zhang, J. Zhu, Jianfei Chen
MQ · 28 Feb 2025

Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Z. Yang
MQ · 26 Feb 2025

SpinQuant: LLM quantization with learned rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort
MQ · 21 Feb 2025

Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov, Kseniia Studenikina, Bykov Mikhail, E. Burnaev
MQ · 18 Feb 2025

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
J. Zhao, M. Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie
MQ · 18 Feb 2025

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
J. Zhao, Miao Zhang, M. Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang
MQ · 18 Feb 2025

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
J. Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei
17 Feb 2025

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, K. K., Amir Gholami
MQ · 05 Feb 2025

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, ..., Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra
MQ · 04 Feb 2025

Progressive Binarization with Semi-Structured Pruning for LLMs
X. Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang
MQ · 03 Feb 2025

OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou
MQ · 23 Jan 2025

GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
Shashikant Ilager, Lukas Florian Briem, Ivona Brandić
19 Jan 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Guoyu Li, Shengyu Ye, C. L. P. Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry, Mao Yang
MQ · 18 Jan 2025

Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach
Alireza Ghaffari, Sharareh Younesian, Boxing Chen, Vahid Partovi Nia, M. Asgharian
MQ · 17 Jan 2025

DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Jerry Chee, A. Backurs, Rainie Heck, Li Zhang, Janardhan Kulkarni, Thomas Rothvoss, Sivakanth Gopi
MQ · 11 Jan 2025

Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee
MQ · 31 Dec 2024

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng, Songwei Liu, Shu Yang, Fangmin Chen, Xing Mei, Lean Fu
MQ · 23 Dec 2024

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen
AI4CE · 18 Dec 2024

Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh
MQ · 26 Nov 2024

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
24 Nov 2024

Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang, Liqun Ma, H. Li, Mingjie Sun, Zhiqiang Shen
Mamba · 18 Nov 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, G. Constantinides, Mohamed S. Abdelfattah
MQ · 18 Nov 2024

The Super Weight in Large Language Models
Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan
MQ, MILM · 11 Nov 2024

Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, C. Pehlevan, Christopher Ré, Aditi Raghunathan
AIFin, MoMe · 07 Nov 2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, Gao Huang
VLM · 04 Nov 2024

A Comprehensive Study on Quantization Techniques for Large Language Models
Jiedong Lang, Zhehao Guo, Shuyu Huang
MQ · 30 Oct 2024

TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li, Priyadarshini Panda
MQ · 24 Oct 2024

Pyramid Vector Quantization for LLMs
Tycho F. A. van der Ouderaa, Maximilian L. Croci, Agrin Hilmkil, James Hensman
MQ · 22 Oct 2024