LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025