LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Knowledge distillation (KD) has been a predominant method for compressing Large Language Models (LLMs). In this paper, we first revisit KD and Low-Rank Adaption (LoRA) and demonstrate that they follow the same paradigm. Inspired by this observation, we propose a parameter-efficient knowledge distillation method, LLM-NEO, which integrates LoRA into KD to improve the efficiency of knowledge transfer. After that, we summarize some valuable guidelines for the hyperparameters in LLM-NEO. Experimental results on compressing Llama 2 and Llama 3.2 show that LLM-NEO outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-NEO on variants of LoRA. The code and trained models are available at [Github](this https URL).
View on arXiv