arXiv:2405.06219
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
10 May 2024
Haojie Duanmu
Zhihang Yuan
Xiuhong Li
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
Papers citing "SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models"
3 / 3 papers shown
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
75
0
0
09 May 2025
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
W. Liu
Jun Yao
MQ
57
3
0
12 Oct 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
365
0
13 Mar 2023