Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers (arXiv: 2010.03034)
6 October 2020
Yimeng Wu
Peyman Passban
Mehdi Rezagholizadeh
Qun Liu
Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers" (25 papers)
Applications of Knowledge Distillation in Remote Sensing: A Survey
Information Fusion (Inf. Fusion), 2024
Yassine Himeur
N. Aburaed
O. Elharrouss
Iraklis Varlamis
Shadi Atalla
Hussain Al Ahmad
18 Sep 2024
Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Aniruddha Roy
Pretam Ray
Ayush Maheshwari
Sudeshna Sarkar
Pawan Goyal
09 Jul 2024
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin
Seonil Son
Jemin Park
Youngseok Kim
Hyungjong Noh
Yeonsoo Lee
03 Mar 2024
A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
27 Jan 2024
What is Lost in Knowledge Distillation?
Manas Mohanty
Tanya Roosta
Peyman Passban
07 Nov 2023
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Takuma Udagawa
Aashka Trivedi
Michele Merler
Bishwaranjan Bhattacharjee
13 Oct 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
18 Sep 2023
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xinpeng Wang
Leonie Weissweiler
Hinrich Schütze
Barbara Plank
24 May 2023
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Songming Zhang
Yunlong Liang
Shuaibo Wang
Wenjuan Han
Jian Liu
Jinan Xu
14 May 2023
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi
Takuma Udagawa
Michele Merler
Yikang Shen
Yousef El-Kurdi
Bishwaranjan Bhattacharjee
16 Mar 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Findings, 2023
Jongwoo Ko
Seungjoon Park
Minchan Jeong
S. Hong
Euijai Ahn
Duhyeuk Chang
Se-Young Yun
03 Feb 2023
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
20 Oct 2022
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mojtaba Valipour
Mehdi Rezagholizadeh
I. Kobyzev
A. Ghodsi
14 Oct 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Interspeech (Interspeech), 2022
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
14 Jul 2022
Do we need Label Regularization to Fine-tune Pre-trained Language Models?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
I. Kobyzev
A. Jafari
Mehdi Rezagholizadeh
Tianda Li
Alan Do-Omri
Peng Lu
Pascal Poupart
A. Ghodsi
25 May 2022
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
International Conference on Computational Linguistics (COLING), 2022
Md. Akmal Haidar
Mehdi Rezagholizadeh
Abbas Ghaddar
Khalil Bibi
Philippe Langlais
Pascal Poupart
15 Apr 2022
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh
A. Jafari
Puneeth Salad
Pranav Sharma
Ali Saheb Pasand
A. Ghodsi
16 Oct 2021
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar
Nithin Anchuri
Mehdi Rezagholizadeh
Abbas Ghaddar
Philippe Langlais
Pascal Poupart
21 Sep 2021
Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj
Abbas Ghaddar
Ahmad Rashid
Khalil Bibi
Cheng-huan Li
A. Ghodsi
Philippe Langlais
Mehdi Rezagholizadeh
21 Sep 2021
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li
Ahmad Rashid
A. Jafari
Pranav Sharma
A. Ghodsi
Mehdi Rezagholizadeh
13 Sep 2021
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yuanxin Liu
Fandong Meng
Zheng Lin
Weiping Wang
Jie Zhou
10 Jun 2021
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Findings, 2021
Ehsan Kamalloo
Mehdi Rezagholizadeh
Peyman Passban
Ali Ghodsi
28 May 2021
Selective Knowledge Distillation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Fusheng Wang
Jianhao Yan
Fandong Meng
Jie Zhou
27 May 2021
Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Ahmad Rashid
Vasileios Lioutas
Abbas Ghaddar
Mehdi Rezagholizadeh
31 Dec 2020
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
AAAI Conference on Artificial Intelligence (AAAI), 2020
Peyman Passban
Yimeng Wu
Mehdi Rezagholizadeh
Qun Liu
27 Dec 2020