QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

7 May 2024

Papers citing "QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving"

2 / 52 papers shown

Title | Authors | Tag | Counters | Date
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression | Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang | MQ | 53 43 0 | 12 Mar 2024
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation | Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim | | 61 74 0 | 22 Sep 2022