2407.10960
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
20 January 2025
Han Guo
William Brandon
Radostin Cholakov
Jonathan Ragan-Kelley
Eric P. Xing
Yoon Kim
MQ
Papers citing
"Fast Matrix Multiplications for Lookup Table-Quantized LLMs"
13 / 13 papers shown
Title
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache
Dayou Du
Shijie Cao
Jianyi Cheng
Ting Cao
M. Yang
MQ
61
0
0
24 Mar 2025
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
33
0
0
31 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Xing Mei
Lean Fu
MQ
38
0
0
23 Dec 2024
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Vladimir Malinovskii
Andrei Panferov
Ivan Ilin
Han Guo
Peter Richtárik
Dan Alistarh
MQ
78
6
0
26 Nov 2024
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Pepijn de Reus
Ana Oprescu
Jelle Zuidema
MQ
67
1
0
15 Nov 2024
An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
Mohsen Dehghankar
Mahdi Erfanian
Abolfazl Asudeh
30
0
0
10 Nov 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
50
13
0
06 Oct 2024
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
Bowen Ping
Shuo Wang
Hanqing Wang
Xu Han
Yuzhuang Xu
Yukun Yan
Yun Chen
Baobao Chang
Zhiyuan Liu
Maosong Sun
MQ
41
4
0
13 Jun 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
77
71
0
07 May 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha
Mayank Mishra
Naigang Wang
Dan Alistarh
Rameswar Panda
Yoon Kim
MQ
54
8
0
04 Apr 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng
Jerry Chee
Qingyao Sun
Volodymyr Kuleshov
Christopher De Sa
MQ
120
91
0
06 Feb 2024
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian
Andrei Panferov
Denis Kuznedelev
Elias Frantar
Artem Babenko
Dan Alistarh
MQ
98
87
0
11 Jan 2024
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou
Anbang Yao
Yiwen Guo
Lin Xu
Yurong Chen
MQ
300
1,046
0
10 Feb 2017