2405.04532
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
7 May 2024
Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
Papers citing
"QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving"
2 / 52 papers shown
Title
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
MQ
53
43
0
12 Mar 2024
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim
61
74
0
22 Sep 2022