ResearchTrend.AI
Does Knowledge Distillation Really Work?


10 June 2021
Samuel Stanton
Pavel Izmailov
Polina Kirichenko
Alexander A. Alemi
A. Wilson
    FedML

Papers citing "Does Knowledge Distillation Really Work?"

36 / 36 papers shown
RM-R1: Reward Modeling as Reasoning
X. Chen
Gaotang Li
Z. Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
150
0
0
05 May 2025
Soft-Label Caching and Sharpening for Communication-Efficient Federated Distillation
Kitsuya Azuma
Takayuki Nishio
Yuichi Kitagawa
Wakako Nakano
Takahito Tanimura
FedML
70
0
0
28 Apr 2025
Provable Weak-to-Strong Generalization via Benign Overfitting
David X. Wu
A. Sahai
65
6
0
06 Oct 2024
Linear Projections of Teacher Embeddings for Few-Class Distillation
Noel Loo
Fotis Iliopoulos
Wei Hu
Erik Vee
25
0
0
30 Sep 2024
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Chenghao Fan
Zhenyi Lu
Wei Wei
Jie Tian
Xiaoye Qu
Dangyang Chen
Yu Cheng
MoMe
48
5
0
17 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
43
2
0
12 Jun 2024
Theoretical Analysis of Weak-to-Strong Generalization
Hunter Lang
David Sontag
Aravindan Vijayaraghavan
25
19
0
25 May 2024
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang
Peng Sun
Tao Lin
50
2
0
23 May 2024
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu
Xin Zhou
Pengfei Zhu
Yu Wang
Qinghua Hu
VLM
56
1
0
22 Apr 2024
Revisiting Knowledge Distillation under Distribution Shift
Songming Zhang
Ziyu Lyu
Xiaofeng Chen
29
1
0
25 Dec 2023
Dynamic Corrective Self-Distillation for Better Fine-Tuning of Pretrained Models
Ibtihel Amara
Vinija Jain
Aman Chadha
32
0
0
12 Dec 2023
Knowledge Distillation for Anomaly Detection
Adrian Alan Pol
E. Govorkova
Sonja Grönroos
N. Chernyavskaya
Philip C. Harris
M. Pierini
I. Ojalvo
P. Elmer
17
1
0
09 Oct 2023
Towards Comparable Knowledge Distillation in Semantic Image Segmentation
Onno Niemann
Christopher Vox
Thorben Werner
VLM
19
1
0
07 Sep 2023
Teacher-Student Architecture for Knowledge Distillation: A Survey
Chengming Hu
Xuan Li
Danyang Liu
Haolun Wu
Xi Chen
Ju Wang
Xue Liu
21
16
0
08 Aug 2023
Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML
M. Deutel
G. Kontes
Christopher Mutschler
Jürgen Teich
43
0
0
23 May 2023
Student-friendly Knowledge Distillation
Mengyang Yuan
Bo Lang
Fengnan Quan
18
17
0
18 May 2023
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren
Zi-Qi Zhong
Xingjian Shi
Yi Zhu
Chun Yuan
Mu Li
21
7
0
16 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
52
64
0
10 May 2023
Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation
Rongzhi Zhang
Jiaming Shen
Tianqi Liu
Jia-Ling Liu
Michael Bendersky
Marc Najork
Chao Zhang
45
18
0
08 May 2023
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
Sha Ning
Longtian Qiu
Yongfei Liu
Xuming He
VLM
21
41
0
28 Mar 2023
Knowledge Distillation from Single to Multi Labels: an Empirical Study
Youcai Zhang
Yuzhuo Qin
Heng-Ye Liu
Yanhao Zhang
Yaqian Li
X. Gu
VLM
51
2
0
15 Mar 2023
Distilling Calibrated Student from an Uncalibrated Teacher
Ishan Mishra
Sethu Vamsi Krishna
Deepak Mishra
FedML
32
2
0
22 Feb 2023
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang
Yi Lu
Yongyu Mu
Yimin Hu
Tong Xiao
Jingbo Zhu
32
8
0
01 Feb 2023
Supervision Complexity and its Role in Knowledge Distillation
Hrayr Harutyunyan
A. S. Rawat
A. Menon
Seungyeon Kim
Surinder Kumar
22
12
0
28 Jan 2023
BD-KD: Balancing the Divergences for Online Knowledge Distillation
Ibtihel Amara
N. Sepahvand
B. Meyer
W. Gross
J. Clark
24
2
0
25 Dec 2022
Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket
Nianhui Guo
Joseph Bethge
Christoph Meinel
Haojin Yang
MQ
28
19
0
23 Nov 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
26
60
0
17 Nov 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy
Jingru Li
Sheng Zhou
Liangcheng Li
Haishuai Wang
Zhi Yu
Jiajun Bu
21
14
0
29 Aug 2022
PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
Qihuang Zhong
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
VLM
CLL
29
41
0
22 Aug 2022
Semi-Supervised Learning of Optical Flow by Flow Supervisor
Woobin Im
Sebin Lee
Sung-eui Yoon
21
11
0
21 Jul 2022
PrUE: Distilling Knowledge from Sparse Teacher Networks
Shaopu Wang
Xiaojun Chen
Mengzhen Kou
Jinqiao Shi
8
2
0
03 Jul 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
37
46
0
28 May 2022
Fortuitous Forgetting in Connectionist Networks
Hattie Zhou
Ankit Vani
Hugo Larochelle
Aaron Courville
CLL
6
42
0
01 Feb 2022
Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation
Sumanth Chennupati
Mohammad Mahdi Kamani
Zhongwei Cheng
Lin Chen
19
4
0
19 Oct 2021
Prune Your Model Before Distill It
Jinhyuk Park
Albert No
VLM
38
27
0
30 Sep 2021