Beware of Calibration Data for Pruning Large Language Models. International Conference on Learning Representations (ICLR), 2024 |
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 |