2405.15346
Cited By
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
24 May 2024
Minghui Zou, Ronghui Guo, Sai Zhang, Xiaowang Zhang, Zhiyong Feng
MQ
Papers citing "BiSup: Bidirectional Quantization Error Suppression for Large Language Models"
2 / 2 papers shown
Title
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee
MQ
34
12
0
28 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
MQ
36
30
0
15 Feb 2024