Annealing Knowledge Distillation (arXiv:2104.07163)
A. Jafari, Mehdi Rezagholizadeh, Pranav Sharma, A. Ghodsi
14 April 2021

Papers citing "Annealing Knowledge Distillation"

49 citing papers shown

How to Improve the Robustness of Closed-Source Models on NLI
Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei
26 May 2025

Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum
22 May 2025

Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction
Zongyue Qin, Shichang Zhang, Mingxuan Ju, Tong Zhao, Neil Shah, Yizhou Sun
08 Apr 2025

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better [AAML]
MingWei Zhou, Xiaobing Pei
30 Mar 2025

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner
09 Oct 2024

Personalized Federated Learning for Generative AI-Assisted Semantic Communications
Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang
03 Oct 2024

Enhancing Romanian Offensive Language Detection through Knowledge Distillation, Multi-Task Learning, and Data Augmentation
Vlad-Cristian Matei, Iulian-Marius Taiatu, Razvan-Alexandru Smadu, Dumitru-Clementin Cercel
30 Sep 2024

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
Hossein Rajabzadeh, A. Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
22 Sep 2024

Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation
Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li
20 Jul 2024

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga
07 Jul 2024

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, AP Prathosh
11 May 2024

Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai
19 Apr 2024

CTSM: Combining Trait and State Emotions for Empathetic Response Model
Yufeng Wang, Chao Chen, Zhou Yang, Shuhui Wang, Xiangwen Liao
22 Mar 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information [VLM]
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang
16 Jan 2024

An Efficient Detection and Control System for Underwater Docking using Machine Learning and Realistic Simulation: A Comprehensive Approach
Jalil Chavez-Galaviz, Jianwen Li, Matthew Bergman, Miras Mengdibayev, N. Mahmoudian
02 Nov 2023

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee
13 Oct 2023

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model [ALM, MoE]
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng Zhang, Jie Tang
11 Jun 2023

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method [VLM]
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shuo Zhao, Peng Zhang, Jie Tang
11 Jun 2023

Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation
Joe Stacey, Marek Rei
22 May 2023

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu
14 May 2023

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee
16 Mar 2023

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun
03 Feb 2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu
01 Feb 2023

On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar
30 Jan 2023

CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT
Dan DeGenaro, Jugal Kalita
22 Dec 2022

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
20 Dec 2022

Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework
Junzhuo Li, Xinwei Wu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong
16 Dec 2022

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization [VLM]
A. Jafari, I. Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, A. Ghodsi
12 Dec 2022

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging [MoMe]
Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, Philippe Langlais
12 Dec 2022

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi
14 Oct 2022

CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
Ibtihel Amara, M. Ziaeefard, B. Meyer, W. Gross, J. Clark
15 Sep 2022

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers [ViT, MQ]
Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu
13 Sep 2022

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
Luting Wang, Xiaojie Li, Yue Liao, Jiang, Jianlong Wu, Fei Wang, Chao Qian, Si Liu
12 Jul 2022

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion
Tao He, Ming Liu, Haichao Zhu, Tianwen Jiang, Zihao Zheng, Jingrun Zhang, Sendong Zhao, Bing Qin
04 Jul 2022

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi
25 May 2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation [CLL]
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
15 Apr 2022

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [ELM]
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
23 Dec 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
16 Oct 2021

Kronecker Decomposition for GPT Compression
Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh
15 Oct 2021

Attention-Free Keyword Spotting
Mashrur M. Morshed, Ahmad Omar Ahsan
14 Oct 2021

Language Modelling via Learning to Rank
A. Frydenlund, Gagandeep Singh, Frank Rudzicz
13 Oct 2021

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
21 Sep 2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
21 Sep 2021

iRNN: Integer-only Recurrent Neural Network [MQ]
Eyyub Sari, Vanessa Courville, V. Nia
20 Sep 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding [AAML]
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh
13 Sep 2021

Learning to Teach with Student Feedback [VLM]
Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang
10 Sep 2021

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation [AAML]
Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
12 May 2021

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
31 Dec 2020