Understanding and Improving Knowledge Distillation

10 February 2020
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Papers citing "Understanding and Improving Knowledge Distillation"

50 / 85 papers shown

When Data Falls Short: Grokking Below the Critical Threshold
Vaibhav Singh, Eugene Belilovsky, Rahaf Aljundi
06 Nov 2025

Revisiting Knowledge Distillation: The Hidden Role of Dataset Size
Giulia Lanzillotta, Felix Sarnthein, Gil Kur, Thomas Hofmann, Bobby He
17 Oct 2025

Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning
Rongrong Xie, Yizhou Xu, Guido Sanguinetti
15 Oct 2025

iCD: An Implicit Clustering Distillation Method for Structural Information Mining
Xiang Xue, Yatu Ji, Qing-dao-er-ji Ren, Bao Shi, Min Lu, Nier Wu, Xufei Zhuang, Haiteng Xu, Gan-qi-qi-ge Cha
16 Sep 2025

Scaling and Distilling Transformer Models for sEMG
Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani
29 Jul 2025

Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data (IEEE Trans. Big Data 2024)
Yuting He, Yiqiang Chen, Xiaodong Yang, H. Yu, Yi-Hua Huang, Yang Gu
20 Apr 2025

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Pohsun Feng, ..., Yujiao Shi, Qian Niu, Cheng Fei, Keyu Chen, Ming Liu
18 Apr 2025

A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition
Dewan Tauhid Rahman, Yeahia Sarker, Antar Mazumder, Md. Shamim Anower
24 Feb 2025

CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park, Soo-Mook Moon
08 Jan 2025

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
01 Nov 2024

Knowledge Distillation in Federated Learning: a Survey on Long Lasting Challenges and New Solutions
Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, Philip S. Yu
16 Jun 2024

Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch
Wen-Shu Fan, Xin-Chun Li, Bowen Tao
21 May 2024

Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies
Seyed Soroush Karimi Madahi, Gargya Gokhale, Marie-Sophie Verwee, Bert Claessens, Chris Develder
29 Apr 2024

Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System
Lei Zheng, Ning Li, Weinan Zhang, Yong Yu
24 Apr 2024

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu, Xin Zhou, Q. Hu, Yu Wang, Qinghua Hu
22 Apr 2024

Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai
19 Apr 2024

Distilling Adversarial Robustness Using Heterogeneous Teachers
Jieren Deng, A. Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, Derek Aguiar
23 Feb 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information (ICLR 2024)
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang
16 Jan 2024

Unraveling Key Factors of Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo
14 Dec 2023

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains (ICLR 2023)
Qingyue Zhao, Banghua Zhu
11 Oct 2023

Can pre-trained models assist in dataset distillation?
Yao Lu, Xuguang Chen, Yuchen Zhang, Jianyang Gu, Tianle Zhang, Yifan Zhang, Xiaoniu Yang, Qi Xuan, Kai Wang, Yang You
05 Oct 2023

Data Upcycling Knowledge Distillation for Image Super-Resolution
Yun-feng Zhang, Wei Li, Simiao Li, Hanting Chen, Zhaopeng Tu, Wenjun Wang, Bingyi Jing, Hai-lin Wang, Jie Hu
25 Sep 2023

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
18 Sep 2023

Interpretability-Aware Vision Transformer
Yao Qiang, Chengyin Li, Prashant Khanduri, D. Zhu
14 Sep 2023

Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang
27 Aug 2023

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search
Hai-Jian Ke, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao
02 Aug 2023

DOT: A Distillation-Oriented Trainer (ICCV 2023)
Borui Zhao, Quan Cui, Renjie Song, Jiajun Liang
17 Jul 2023

On the Impact of Knowledge Distillation for Model Interpretability (ICML 2023)
Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon
25 May 2023

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation (ACL 2023)
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu
14 May 2023

Learning From Biased Soft Labels (NeurIPS 2023)
Hua Yuan, Ning Xu, Yuge Shi, Xin Geng, Yong Rui
16 Feb 2023

Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels (NeurIPS 2023)
Zifu Wang, Xuefei Ning, Matthew B. Blaschko
11 Feb 2023

On student-teacher deviations in distillation: does it pay to disobey? (NeurIPS 2023)
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar
30 Jan 2023

Supervision Complexity and its Role in Knowledge Distillation (ICLR 2023)
Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Surinder Kumar
28 Jan 2023

LEAD: Liberal Feature-based Distillation for Dense Retrieval (WSDM 2022)
Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jing Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan
10 Dec 2022

Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study (WACV 2022)
Hongjun Choi, Eunyeong Jeon, Ankita Shukla, Pavan Turaga
08 Nov 2022

Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model (EMNLP 2022)
Dongkyu Lee, Zhiliang Tian, Ying Zhao, Ka Chun Cheung, Ningyu Zhang
22 Oct 2022

Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation (EMNLP 2022)
Dongkyu Lee, Ka Chun Cheung, Ningyu Zhang
22 Oct 2022

Meta-Learning with Self-Improving Momentum Target (NeurIPS 2022)
Jihoon Tack, Jongjin Park, Hankook Lee, Jaeho Lee, Jinwoo Shin
11 Oct 2022

Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again (NeurIPS 2022)
Xin-Chun Li, Wenxuan Fan, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, De-Chuan Zhan
10 Oct 2022

Using Knowledge Distillation to improve interpretable models in a retail banking context
Maxime Biehler, Mohamed Guermazi, Célim Starck
30 Sep 2022

Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification (TPAMI 2022)
Quanshi Zhang, Feng He, Yilan Chen, Zhefan Rao
18 Aug 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation
Lin Li, Long Chen, Hanrong Shi, Wenxiao Wang, Jian Shao, Yi Yang, Jun Xiao
07 Aug 2022

TinyViT: Fast Pretraining Distillation for Small Vision Transformers (ECCV 2022)
Kan Wu, Jinnian Zhang, Houwen Peng, Xiyang Dai, Bin Xiao, Jianlong Fu, Lu Yuan
21 Jul 2022

Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing? (ICML 2022)
Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung
29 Jun 2022

Toward Student-Oriented Teacher Network Training For Knowledge Distillation (ICLR 2022)
Chengyu Dong, Liyuan Liu, Jingbo Shang
14 Jun 2022

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation (ICLR 2022)
Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao
13 Jun 2022

Selective Cross-Task Distillation
Su Lu, Han-Jia Ye, De-Chuan Zhan
25 Apr 2022

Localization Distillation for Object Detection (TPAMI 2022)
Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Jun Wang, W. Zuo, Ming-Ming Cheng
12 Apr 2022

Better Supervisory Signals by Observing Learning Paths (ICLR 2022)
Yi Ren, Shangmin Guo, Danica J. Sutherland
04 Mar 2022

Bridging the Gap Between Patient-specific and Patient-independent Seizure Prediction via Knowledge Distillation (J. Neural Eng. 2022)
Di Wu, Jie Yang, Mohamad Sawan
25 Feb 2022