Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

6 October 2020
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
arXiv: 2010.03034 (abs · PDF · HTML)

Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers"

25 papers shown
Applications of Knowledge Distillation in Remote Sensing: A Survey
Yassine Himeur, N. Aburaed, O. Elharrouss, Iraklis Varlamis, Shadi Atalla, Hussain Al Ahmad. Information Fusion (Inf. Fusion), 2024. 18 Sep 2024.

Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal. 09 Jul 2024.

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.

A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang. 27 Jan 2024.

What is Lost in Knowledge Distillation?
Manas Mohanty, Tanya Roosta, Peyman Passban. 07 Nov 2023.

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 13 Oct 2023.

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang. 18 Sep 2023.

How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 24 May 2023.

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 14 May 2023.

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee. 16 Mar 2023.

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. Findings, 2023. 03 Feb 2023.

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 20 Oct 2022.

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022. 14 Oct 2022.

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka. Interspeech, 2022. 14 Jul 2022.

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022. 25 May 2022.

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart. International Conference on Computational Linguistics (COLING), 2022. 15 Apr 2022.

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi. 16 Oct 2021.

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart. 21 Sep 2021.

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh. 21 Sep 2021.

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.

Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. 10 Jun 2021.

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi. Findings, 2021. 28 May 2021.

Selective Knowledge Distillation for Neural Machine Translation
Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. 27 May 2021.

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. 31 Dec 2020.

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu. AAAI Conference on Artificial Intelligence (AAAI), 2020. 27 Dec 2020.