Understanding and Improving Knowledge Distillation

10 February 2020
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Papers citing "Understanding and Improving Knowledge Distillation"

50 / 85 papers shown

When Data Falls Short: Grokking Below the Critical Threshold
Vaibhav Singh, Eugene Belilovsky, Rahaf Aljundi
06 Nov 2025

Revisiting Knowledge Distillation: The Hidden Role of Dataset Size
Giulia Lanzillotta, Felix Sarnthein, Gil Kur, Thomas Hofmann, Bobby He
17 Oct 2025

Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning
Rongrong Xie, Yizhou Xu, Guido Sanguinetti
15 Oct 2025

iCD: An Implicit Clustering Distillation Method for Structural Information Mining
Xiang Xue, Yatu Ji, Qing-dao-er-ji Ren, Bao Shi, Min Lu, Nier Wu, Xufei Zhuang, Haiteng Xu, Gan-qi-qi-ge Cha
16 Sep 2025

Scaling and Distilling Transformer Models for sEMG
Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani
29 Jul 2025

Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data (IEEE Trans. Big Data 2024)
Yuting He, Yiqiang Chen, Xiaodong Yang, H. Yu, Yi-Hua Huang, Yang Gu
20 Apr 2025

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Pohsun Feng, ..., Yujiao Shi, Qian Niu, Cheng Fei, Keyu Chen, Ming Liu
18 Apr 2025

A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition
Dewan Tauhid Rahman, Yeahia Sarker, Antar Mazumder, Md. Shamim Anower
24 Feb 2025

CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park, Soo-Mook Moon
08 Jan 2025

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
01 Nov 2024

Knowledge Distillation in Federated Learning: a Survey on Long Lasting Challenges and New Solutions
Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, Philip S. Yu
16 Jun 2024

Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch
Wen-Shu Fan, Xin-Chun Li, Bowen Tao
21 May 2024

Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies
Seyed Soroush Karimi Madahi, Gargya Gokhale, Marie-Sophie Verwee, Bert Claessens, Chris Develder
29 Apr 2024

Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System
Lei Zheng, Ning Li, Weinan Zhang, Yong Yu
24 Apr 2024

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu, Xin Zhou, Q. Hu, Yu Wang, Qinghua Hu
22 Apr 2024

Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai
19 Apr 2024

Distilling Adversarial Robustness Using Heterogeneous Teachers
Jieren Deng, A. Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, Derek Aguiar
23 Feb 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information (ICLR 2024)
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang
16 Jan 2024

Unraveling Key Factors of Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo
14 Dec 2023

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains (ICLR 2023)
Qingyue Zhao, Banghua Zhu
11 Oct 2023

Can pre-trained models assist in dataset distillation?
Yao Lu, Xuguang Chen, Yuchen Zhang, Jianyang Gu, Tianle Zhang, Yifan Zhang, Xiaoniu Yang, Qi Xuan, Kai Wang, Yang You
05 Oct 2023

Data Upcycling Knowledge Distillation for Image Super-Resolution
Yun-feng Zhang, Wei Li, Simiao Li, Hanting Chen, Zhaopeng Tu, Wenjun Wang, Bingyi Jing, Hai-lin Wang, Jie Hu
25 Sep 2023

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
18 Sep 2023

Interpretability-Aware Vision Transformer
Yao Qiang, Chengyin Li, Prashant Khanduri, D. Zhu
14 Sep 2023

Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang
27 Aug 2023

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search
Hai-Jian Ke, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao
02 Aug 2023

DOT: A Distillation-Oriented Trainer (ICCV 2023)
Borui Zhao, Quan Cui, Renjie Song, Jiajun Liang
17 Jul 2023

On the Impact of Knowledge Distillation for Model Interpretability (ICML 2023)
Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon
25 May 2023

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation (ACL 2023)
Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu
14 May 2023

Learning From Biased Soft Labels (NeurIPS 2023)
Hua Yuan, Ning Xu, Yuge Shi, Xin Geng, Yong Rui
16 Feb 2023

Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels (NeurIPS 2023)
Zifu Wang, Xuefei Ning, Matthew B. Blaschko
11 Feb 2023

On student-teacher deviations in distillation: does it pay to disobey? (NeurIPS 2023)
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar
30 Jan 2023

Supervision Complexity and its Role in Knowledge Distillation (ICLR 2023)
Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Surinder Kumar
28 Jan 2023

LEAD: Liberal Feature-based Distillation for Dense Retrieval (WSDM 2022)
Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jing Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan
10 Dec 2022

Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study (WACV 2022)
Hongjun Choi, Eunyeong Jeon, Ankita Shukla, Pavan Turaga
08 Nov 2022

Hard Gate Knowledge Distillation -- Leverage Calibration for Robust and Reliable Language Model (EMNLP 2022)
Dongkyu Lee, Zhiliang Tian, Ying Zhao, Ka Chun Cheung, Ningyu Zhang
22 Oct 2022

Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation (EMNLP 2022)
Dongkyu Lee, Ka Chun Cheung, Ningyu Zhang
22 Oct 2022

Meta-Learning with Self-Improving Momentum Target (NeurIPS 2022)
Jihoon Tack, Jongjin Park, Hankook Lee, Jaeho Lee, Jinwoo Shin
11 Oct 2022

Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again (NeurIPS 2022)
Xin-Chun Li, Wenxuan Fan, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, De-Chuan Zhan
10 Oct 2022

Using Knowledge Distillation to improve interpretable models in a retail banking context
Maxime Biehler, Mohamed Guermazi, Célim Starck
30 Sep 2022

Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification (TPAMI 2022)
Quanshi Zhang, Feng He, Yilan Chen, Zhefan Rao
18 Aug 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation
Lin Li, Long Chen, Hanrong Shi, Wenxiao Wang, Jian Shao, Yi Yang, Jun Xiao
07 Aug 2022

TinyViT: Fast Pretraining Distillation for Small Vision Transformers (ECCV 2022)
Kan Wu, Jinnian Zhang, Houwen Peng, Xiyang Dai, Bin Xiao, Jianlong Fu, Lu Yuan
21 Jul 2022

Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing? (ICML 2022)
Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung
29 Jun 2022

Toward Student-Oriented Teacher Network Training For Knowledge Distillation (ICLR 2022)
Chengyu Dong, Liyuan Liu, Jingbo Shang
14 Jun 2022

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation (ICLR 2022)
Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao
13 Jun 2022

Selective Cross-Task Distillation
Su Lu, Han-Jia Ye, De-Chuan Zhan
25 Apr 2022

Localization Distillation for Object Detection (TPAMI 2022)
Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Jun Wang, W. Zuo, Ming-Ming Cheng
12 Apr 2022

Better Supervisory Signals by Observing Learning Paths (ICLR 2022)
Yi Ren, Shangmin Guo, Danica J. Sutherland
04 Mar 2022

Bridging the Gap Between Patient-specific and Patient-independent Seizure Prediction via Knowledge Distillation (J. Neural Eng. 2022)
Di Wu, Jie Yang, Mohamad Sawan
25 Feb 2022