Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
arXiv:2403.01479 · 3 March 2024
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee
Papers citing "Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation" (4 of 4 shown)
| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| Scalable Model Merging with Progressive Layer-wise Distillation | Jing Xu, Jiazheng Li, J. Zhang | MoMe, FedML | 0 | 18 Feb 2025 |
| AI Safety in Generative AI Large Language Models: A Survey | Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao | LM&MA | 12 | 06 Jul 2024 |
| Distilling Linguistic Context for Language Model Compression | Geondo Park, Gyeongman Kim, Eunho Yang | | 37 | 17 Sep 2021 |
| GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 6,956 | 20 Apr 2018 |