Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

6 October 2020
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
arXiv: 2010.03034 (abs · PDF · HTML)

Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers"

25 papers shown
Applications of Knowledge Distillation in Remote Sensing: A Survey
Yassine Himeur, N. Aburaed, O. Elharrouss, Iraklis Varlamis, Shadi Atalla, Hussain Al Ahmad. Information Fusion (Inf. Fusion), 2024. 18 Sep 2024.

Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal. 09 Jul 2024.

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.

A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang. 27 Jan 2024.

What is Lost in Knowledge Distillation?
Manas Mohanty, Tanya Roosta, Peyman Passban. 07 Nov 2023.

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 13 Oct 2023.

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang. 18 Sep 2023.

How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 24 May 2023.

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 14 May 2023.

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee. 16 Mar 2023.

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. Findings, 2023. 03 Feb 2023.

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 20 Oct 2022.

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022. 14 Oct 2022.

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka. Interspeech, 2022. 14 Jul 2022.

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022. 25 May 2022.

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart. International Conference on Computational Linguistics (COLING), 2022. 15 Apr 2022.

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi. 16 Oct 2021.

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart. 21 Sep 2021.

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh. 21 Sep 2021.

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.

Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. 10 Jun 2021.

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi. Findings, 2021. 28 May 2021.

Selective Knowledge Distillation for Neural Machine Translation
Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. 27 May 2021.

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. 31 Dec 2020.

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu. AAAI Conference on Artificial Intelligence (AAAI), 2020. 27 Dec 2020.