MiniPLM: Knowledge Distillation for Pre-Training Language Models
arXiv: 2410.17215
22 October 2024
Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang
Papers citing "MiniPLM: Knowledge Distillation for Pre-Training Language Models" (4 papers shown)
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence
Guanghui Wang, Zhiyong Yang, Z. Wang, Shi Wang, Qianqian Xu, Q. Huang
07 May 2025
An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainability
David Herrera-Poyatos, Carlos Peláez-González, Cristina Zuheros, Andrés Herrera-Poyatos, Virilo Tejedor, F. Herrera, Rosana Montes
06 Apr 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong
24 Mar 2025
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024