EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

28 October 2024

Shih-yang Liu

Nai Chit Fung

Abstract

In this work, we reformulate the model compression problem as a customized compensation problem: given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users (e.g., tasks, compression ratios), resulting in greater flexibility in balancing accuracy and overhead (inference and model size) without being bound to fixed compression formats. However, naively applying SVD to derive residual paths causes suboptimal utilization of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank Approximation (EoRA), a method that directly minimizes compression-induced errors without requiring gradient-based training, achieving fast optimization in minutes using a small amount of calibration data. EoRA projects compression errors into the eigenspace of input activations, leveraging eigenvalues to effectively prioritize the reconstruction of high-importance error components. Moreover, EoRA can be seamlessly integrated with fine-tuning and quantization to further improve effectiveness and efficiency. EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks, such as language generation, commonsense reasoning, and math reasoning (e.g., 31.31%/12.88% and 9.69% improvements on ARC-Easy/ARC-Challenge and MathQA when compensating LLaMA3-8B that is quantized to 4-bit and pruned to 2:4 sparsity). EoRA offers a scalable, training-free solution to compensate for compression errors, making it a powerful tool to deploy LLMs more flexibly. Code is available at this https URL.
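The eigenspace projection described in the abstract can be illustrated with a small NumPy sketch. This is a reading of the abstract only, not the authors' released code: the function names (`eigenspace_low_rank`, `plain_svd_low_rank`), the `eps` clipping, and the choice to scale eigenvectors by the square roots of the eigenvalues are all assumptions made for illustration. The idea shown is that the compression error E = W - W_c is projected onto the eigenvectors of the calibration activation covariance, truncated by SVD in that space, and then mapped back, so the rank budget goes to the error directions the inputs actually excite.

```python
import numpy as np


def eigenspace_low_rank(error, activations, rank, eps=1e-8):
    """Rank-`rank` factors (B, A) with B @ A approximating `error` under an
    activation-weighted norm.

    error:       (d_out, d_in) compression error, W - W_compressed
    activations: (d_in, n_samples) calibration inputs to the layer
    NOTE: hypothetical helper; names and details are assumptions.
    """
    n_samples = activations.shape[1]
    # Covariance of the input activations and its eigendecomposition.
    cov = activations @ activations.T / n_samples
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Clip tiny or negative eigenvalues so the projection stays invertible.
    scale = np.sqrt(np.clip(eigvals, eps, None))
    proj = eigvecs * scale            # Q sqrt(Lambda)
    proj_inv = (eigvecs / scale).T    # sqrt(Lambda)^-1 Q^T
    # Truncated SVD of the projected error, then map back to weight space.
    u, s, vt = np.linalg.svd(error @ proj, full_matrices=False)
    b = u[:, :rank] * s[:rank]
    a = vt[:rank] @ proj_inv
    return b, a


def plain_svd_low_rank(error, rank):
    """Baseline: truncated SVD of the raw error, ignoring activations."""
    u, s, vt = np.linalg.svd(error, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]
```

Under these assumptions the compensated layer would compute `(W_compressed + B @ A) @ x`. Because the squared output error of a residual R satisfies ||R X||_F^2 = n ||R Q sqrt(Lambda)||_F^2, truncating in the projected space minimizes the error on the calibration outputs for a given rank, whereas the plain SVD baseline minimizes only the error in the weights.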


@article{liu2025_2410.21271,
  title={EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation},
  author={Shih-Yang Liu and Maksim Khadkevich and Nai Chit Fung and Charbel Sakr and Chao-Han Huck Yang and Chien-Yi Wang and Saurav Muralidharan and Hongxu Yin and Kwang-Ting Cheng and Jan Kautz and Yu-Chiang Frank Wang and Pavlo Molchanov and Min-Hung Chen},
  journal={arXiv preprint arXiv:2410.21271},
  year={2025}
}
