Knowledge distillation: A good teacher is patient and consistent

9 June 2021
Lucas Beyer
Xiaohua Zhai
Amelie Royer
L. Markeeva
Rohan Anil
Alexander Kolesnikov
    VLM

Papers citing "Knowledge distillation: A good teacher is patient and consistent"

50 / 203 papers shown
Towards an On-device Agent for Text Rewriting
Yun Zhu
Yinxiao Liu
Felix Stahlberg
Shankar Kumar
Yu-hui Chen
Liangchen Luo
Lei Shu
Renjie Liu
Jindong Chen
Lei Meng
LLMAG
29
6
0
22 Aug 2023
Revisiting Vision Transformer from the View of Path Ensemble
Shuning Chang
Pichao Wang
Haowen Luo
Fan Wang
Mike Zheng Shou
ViT
27
3
0
12 Aug 2023
Teacher-Student Architecture for Knowledge Distillation: A Survey
Chengming Hu
Xuan Li
Danyang Liu
Haolun Wu
Xi Chen
Ju Wang
Xue Liu
21
16
0
08 Aug 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
21
27
0
24 Jul 2023
A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation
Jinjing Zhu
Yuan Luo
Xueye Zheng
Hao Wang
Lin Wang
17
33
0
24 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
20
23
0
14 Jul 2023
Deep Transfer Learning for Intelligent Vehicle Perception: a Survey
Xinyi Liu
Jinlong Li
Jin Ma
Huiming Sun
Zhigang Xu
Tianyu Zhang
Hongkai Yu
48
22
0
26 Jun 2023
Heterogeneous Continual Learning
Divyam Madaan
Hongxu Yin
Wonmin Byeon
Jan Kautz
Pavlo Molchanov
CLL
29
5
0
14 Jun 2023
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
Junyuan Hong
Yi Zeng
Shuyang Yu
Lingjuan Lyu
R. Jia
Jiayu Zhou
AAML
11
8
0
04 Jun 2023
Are Large Kernels Better Teachers than Transformers for ConvNets?
Tianjin Huang
Lu Yin
Zhenyu (Allen) Zhang
Lijuan Shen
Meng Fang
Mykola Pechenizkiy
Zhangyang Wang
Shiwei Liu
30
13
0
30 May 2023
Improving Knowledge Distillation via Regularizing Feature Norm and Direction
Yuzhu Wang
Lechao Cheng
Manni Duan
Yongheng Wang
Zunlei Feng
Shu Kong
29
19
0
26 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale
Zhiwei Hao
Jianyuan Guo
Kai Han
Han Hu
Chang Xu
Yunhe Wang
30
16
0
25 May 2023
PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions
Anthony Chen
Panupong Pasupat
Sameer Singh
Hongrae Lee
Kelvin Guu
29
40
0
24 May 2023
HARD: Hard Augmentations for Robust Distillation
Arne F. Nix
Max F. Burg
Fabian H. Sinz
AAML
31
1
0
24 May 2023
AdvFunMatch: When Consistent Teaching Meets Adversarial Robustness
Ziuhi Wu
Haichang Gao
Bingqian Zhou
Ping Wang
AAML
16
0
0
24 May 2023
Text-To-Concept (and Back) via Cross-Model Alignment
Mazda Moayeri
Keivan Rezaei
Maziar Sanjabi
S. Feizi
CLIP
31
39
0
10 May 2023
A Survey on the Robustness of Computer Vision Models against Common Corruptions
Shunxin Wang
Raymond N. J. Veldhuis
Christoph Brune
N. Strisciuglio
OOD
VLM
25
11
0
10 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
23
0
0
08 May 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
211
499
0
03 May 2023
DeepAqua: Self-Supervised Semantic Segmentation of Wetland Surface Water Extent with SAR Images using Knowledge Distillation
Francisco J. Peña
Clara Hubinger
A. H. Payberah
F. Jaramillo
23
0
0
02 May 2023
Expand-and-Cluster: Parameter Recovery of Neural Networks
Flavio Martinelli
Berfin Simsek
W. Gerstner
Johanni Brea
24
4
0
25 Apr 2023
Multi-Class Unlearning for Image Classification via Weight Filtering
Samuele Poppi
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
MU
22
5
0
04 Apr 2023
ERM++: An Improved Baseline for Domain Generalization
Piotr Teterwak
Kuniaki Saito
Theodoros Tsiligkaridis
Kate Saenko
Bryan A. Plummer
OOD
36
9
0
04 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
18
20
0
31 Mar 2023
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Pilhyeon Lee
Taeoh Kim
Minho Shim
Dongyoon Wee
H. Byun
24
11
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
54
325
0
29 Mar 2023
Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning
Antonio Carta
Andrea Cossu
Vincenzo Lomonaco
D. Bacciu
Joost van de Weijer
FedML
13
5
0
28 Mar 2023
DisWOT: Student Architecture Search for Distillation WithOut Training
Peijie Dong
Lujun Li
Zimian Wei
35
56
0
28 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
38
259
0
20 Mar 2023
Understanding the Role of the Projector in Knowledge Distillation
Roy Miles
K. Mikolajczyk
19
21
0
20 Mar 2023
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
Fartash Faghri
Hadi Pouransari
Sachin Mehta
Mehrdad Farajtabar
Ali Farhadi
Mohammad Rastegari
Oncel Tuzel
35
9
0
15 Mar 2023
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
Yunhao Cao
Peiqin Sun
Shuchang Zhou
14
4
0
13 Mar 2023
The Lie-Group Bayesian Learning Rule
E. M. Kıral
Thomas Möllenhoff
Mohammad Emtiyaz Khan
BDL
15
2
0
08 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging
J. Heo
Seyedarmin Azizi
A. Fayyazi
Massoud Pedram
36
3
0
04 Mar 2023
Boosting Adversarial Transferability using Dynamic Cues
Muzammal Naseer
Ahmad A Mahmood
Salman Khan
F. Khan
AAML
20
5
0
23 Feb 2023
Random Teachers are Good Teachers
Felix Sarnthein
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
19
4
0
23 Feb 2023
Detecting software vulnerabilities using Language Models
Marwan Omar
24
11
0
23 Feb 2023
Distilling Calibrated Student from an Uncalibrated Teacher
Ishan Mishra
Sethu Vamsi Krishna
Deepak Mishra
FedML
32
2
0
22 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
61
569
0
10 Feb 2023
Knowledge Distillation in Vision Transformers: A Critical Review
Gousia Habib
Tausifa Jan Saleem
Brejesh Lall
21
15
0
04 Feb 2023
Understanding Self-Distillation in the Presence of Label Noise
Rudrajit Das
Sujay Sanghavi
33
13
0
30 Jan 2023
On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan
A. Menon
Srinadh Bhojanapalli
H. Mobahi
Surinder Kumar
41
9
0
30 Jan 2023
Supervision Complexity and its Role in Knowledge Distillation
Hrayr Harutyunyan
A. S. Rawat
A. Menon
Seungyeon Kim
Surinder Kumar
22
12
0
28 Jan 2023
Improving Text-based Early Prediction by Distillation from Privileged Time-Series Text
Jinghui Liu
Daniel Capurro
Anthony N. Nguyen
Karin Verspoor
AI4TS
19
3
0
26 Jan 2023
A Simple Recipe for Competitive Low-compute Self supervised Vision Models
Quentin Duval
Ishan Misra
Nicolas Ballas
29
9
0
23 Jan 2023
Effective Decision Boundary Learning for Class Incremental Learning
Chaoyue Ding
Kunchi Li
Jun Wan
Shan Yu
CLL
12
1
0
12 Jan 2023
Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching
Byoungjip Kim
Sun Choi
Dasol Hwang
Moontae Lee
Honglak Lee
25
10
0
07 Jan 2023
NeRN -- Learning Neural Representations for Neural Networks
Maor Ashkenazi
Zohar Rimon
Ron Vainshtein
Shir Levi
Elad Richardson
Pinchas Mintz
Eran Treister
3DH
22
13
0
27 Dec 2022
Joint Embedding of 2D and 3D Networks for Medical Image Anomaly Detection
In-Joo Kang
Jinah Park
3DH
11
1
0
21 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
40
89
0
15 Dec 2022