LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models

11 November 2024

Abstract

Knowledge distillation (KD) has been a predominant method for compressing Large Language Models (LLMs). In this paper, we first revisit KD and Low-Rank Adaptation (LoRA) and demonstrate that they follow the same paradigm. Inspired by this observation, we propose a parameter-efficient knowledge distillation method, LLM-NEO, which integrates LoRA into KD to improve the efficiency of knowledge transfer. We further summarize practical guidelines for setting the hyperparameters of LLM-NEO. Experimental results on compressing Llama 2 and Llama 3.2 show that LLM-NEO outperforms various baselines. Further analysis shows that LLM-NEO remains robust across variants of LoRA. The code and trained models are available at [GitHub](this https URL).
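The idea of integrating LoRA into KD can be sketched as follows: the student's weight stays frozen, only a low-rank update is trainable, and the training signal is a distillation loss against the teacher's logits. This is a minimal, hypothetical sketch, not the LLM-NEO implementation. It assumes a standard temperature-scaled KL divergence as the KD objective and a single linear output layer as the "student". The names `LoRALinear`, `kd_loss`, `rank`, `alpha` and `temperature` are illustrative. Only the forward pass and the loss are shown; in training, gradients of this loss would update `A` and `B` alone.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W plus trainable (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight                      # frozen, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero so the update is a no-op at step 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                 # frozen path
        delta = matvec(self.B, matvec(self.A, x))     # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) are trained.
        return self.rank * (len(self.weight[0]) + len(self.weight))


def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                   # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed KD objective: KL(teacher || student) on softened distributions, times T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2


if __name__ == "__main__":
    rng = random.Random(42)
    d_in, vocab = 8, 6
    W = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(vocab)]
    layer = LoRALinear(W, rank=2, alpha=4.0)
    x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
    teacher_logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
    print("KD loss:", kd_loss(layer.forward(x), teacher_logits))
    print("trainable:", layer.trainable_params(), "vs full:", d_in * vocab)  # 28 vs 48
```

Even in this toy setting, the low-rank update trains 28 parameters instead of the 48 in the full weight matrix. The gap widens sharply at LLM scale, which is where the parameter efficiency of combining LoRA with KD comes from.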


@article{yang2025_2411.06839,
  title={ LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models },
  author={ Runming Yang and Taiqiang Wu and Jiahao Wang and Pengfei Hu and Yik-Chung Wu and Ngai Wong and Yujiu Yang },
  journal={arXiv preprint arXiv:2411.06839},
  year={ 2025 }
}
