Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

6 October 2020
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
Topics: MoE
Links: ArXiv (abs), PDF, HTML
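
The cited paper's title names a technique: distilling into a student's intermediate layers by combining teacher layers rather than skipping some of them. The snippet below is a minimal sketch of that general idea, not the paper's exact formulation; it assumes an attention-weighted fusion of all teacher hidden states as the target for each student layer (closer in spirit to the ALP-KD follow-up listed below), and the function name combine_distill_loss, tensor shapes, and usage values are illustrative assumptions.

# Minimal sketch (assumed formulation, not the paper's exact method):
# fuse all teacher intermediate hidden states into one target per
# student layer instead of mapping the student to a skipped subset,
# then match the student layer to the fused target with an MSE loss.
import torch
import torch.nn.functional as F


def combine_distill_loss(student_hidden, teacher_hiddens):
    """student_hidden: [batch, seq, hidden]; teacher_hiddens: list of
    [batch, seq, hidden] tensors, one per teacher layer. Assumes the
    student and teacher share a hidden size; real setups often insert
    a learned projection when the sizes differ."""
    teachers = torch.stack(teacher_hiddens, dim=2)  # [B, T, L, H]
    # Score each teacher layer against the student state (scaled dot product).
    scores = torch.einsum("bth,btlh->btl", student_hidden, teachers)
    weights = F.softmax(scores / student_hidden.size(-1) ** 0.5, dim=-1)
    # Attention-weighted combination of teacher layers as the distillation target.
    combined = torch.einsum("btl,btlh->bth", weights, teachers)
    return F.mse_loss(student_hidden, combined)


# Illustrative usage with random tensors standing in for real hidden states.
if __name__ == "__main__":
    student = torch.randn(2, 8, 64)
    teacher_layers = [torch.randn(2, 8, 64) for _ in range(12)]
    print(combine_distill_loss(student, teacher_layers).item())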

Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers"

25 / 25 papers shown
Applications of Knowledge Distillation in Remote Sensing: A Survey
Information Fusion (Inf. Fusion), 2024
Yassine Himeur, N. Aburaed, O. Elharrouss, Iraklis Varlamis, Shadi Atalla, Hussain Al Ahmad
18 Sep 2024

Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal
09 Jul 2024

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee
03 Mar 2024

A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang
Topics: MQ
27 Jan 2024

What is Lost in Knowledge Distillation?
Manas Mohanty, Tanya Roosta, Peyman Passban
07 Nov 2023

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee
13 Oct 2023

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
Topics: VLM
18 Sep 2023

How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank
24 May 2023

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu
14 May 2023

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee
16 Mar 2023

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Findings, 2023
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun
03 Feb 2023

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier
Topics: VLM, MoE, LRM
20 Oct 2022

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi
14 Oct 2022

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Interspeech, 2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
14 Jul 2022

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi
25 May 2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
International Conference on Computational Linguistics (COLING), 2022
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
Topics: CLL
15 Apr 2022

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
21 Sep 2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
21 Sep 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh
Topics: AAML
13 Sep 2021

Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou
10 Jun 2021

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Findings, 2021
Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
Topics: AAML
28 May 2021

Selective Knowledge Distillation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou
27 May 2021

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
31 Dec 2020

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
AAAI Conference on Artificial Intelligence (AAAI), 2020
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
27 Dec 2020