BiSup: Bidirectional Quantization Error Suppression for Large Language Models

24 May 2024

Papers citing "BiSup: Bidirectional Quantization Error Suppression for Large Language Models"

Title
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee
MQ 34 12 0
28 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
MQ 36 30 0
15 Feb 2024