Annealing Knowledge Distillation (arXiv:2104.07163)
A. Jafari, Mehdi Rezagholizadeh, Pranav Sharma, A. Ghodsi
14 April 2021

Papers citing "Annealing Knowledge Distillation"

49 citing papers shown

How to Improve the Robustness of Closed-Source Models on NLI
Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei
26 May 2025

Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum
22 May 2025

Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction
Zongyue Qin, Shichang Zhang, Mingxuan Ju, Tong Zhao, Neil Shah, Yizhou Sun
08 Apr 2025

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better [AAML]
MingWei Zhou, Xiaobing Pei
30 Mar 2025

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner
09 Oct 2024

Personalized Federated Learning for Generative AI-Assisted Semantic Communications
Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang
03 Oct 2024

Enhancing Romanian Offensive Language Detection through Knowledge Distillation, Multi-Task Learning, and Data Augmentation
Vlad-Cristian Matei, Iulian-Marius Taiatu, Razvan-Alexandru Smadu, Dumitru-Clementin Cercel
30 Sep 2024

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
Hossein Rajabzadeh, A. Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
22 Sep 2024

Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation
Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li
20 Jul 2024

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga
07 Jul 2024

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, AP Prathosh
11 May 2024

Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai
19 Apr 2024

CTSM: Combining Trait and State Emotions for Empathetic Response Model
Yufeng Wang, Chao Chen, Zhou Yang, Shuhui Wang, Xiangwen Liao
22 Mar 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information [VLM]
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang
16 Jan 2024

An Efficient Detection and Control System for Underwater Docking using Machine Learning and Realistic Simulation: A Comprehensive Approach
Jalil Chavez-Galaviz, Jianwen Li, Matthew Bergman, Miras Mengdibayev, N. Mahmoudian
02 Nov 2023

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee
13 Oct 2023

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model [ALM, MoE]
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng Zhang, Jie Tang
11 Jun 2023

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method [VLM]
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shuo Zhao, Peng Zhang, Jie Tang
11 Jun 2023

Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation
Joe Stacey, Marek Rei
22 May 2023

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu
14 May 2023

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee
16 Mar 2023

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun
03 Feb 2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu
01 Feb 2023

On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar
30 Jan 2023

CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT
Dan DeGenaro, Jugal Kalita
22 Dec 2022

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
20 Dec 2022

Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework
Junzhuo Li, Xinwei Wu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong
16 Dec 2022

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization [VLM]
A. Jafari, I. Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, A. Ghodsi
12 Dec 2022

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging [MoMe]
Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, Philippe Langlais
12 Dec 2022

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi
14 Oct 2022

CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
Ibtihel Amara, M. Ziaeefard, B. Meyer, W. Gross, J. Clark
15 Sep 2022

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers [ViT, MQ]
Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu
13 Sep 2022

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
Luting Wang, Xiaojie Li, Yue Liao, Jiang, Jianlong Wu, Fei Wang, Chao Qian, Si Liu
12 Jul 2022

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion
Tao He, Ming Liu, Haichao Zhu, Tianwen Jiang, Zihao Zheng, Jingrun Zhang, Sendong Zhao, Bing Qin
04 Jul 2022

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi
25 May 2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation [CLL]
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
15 Apr 2022

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [ELM]
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
23 Dec 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
16 Oct 2021

Kronecker Decomposition for GPT Compression
Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh
15 Oct 2021

Attention-Free Keyword Spotting
Mashrur M. Morshed, Ahmad Omar Ahsan
14 Oct 2021

Language Modelling via Learning to Rank
A. Frydenlund, Gagandeep Singh, Frank Rudzicz
13 Oct 2021

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
21 Sep 2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
21 Sep 2021

iRNN: Integer-only Recurrent Neural Network [MQ]
Eyyub Sari, Vanessa Courville, V. Nia
20 Sep 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding [AAML]
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh
13 Sep 2021

Learning to Teach with Student Feedback [VLM]
Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang
10 Sep 2021

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation [AAML]
Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
12 May 2021

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
31 Dec 2020