Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
arXiv:2012.09816 · 17 December 2020 · FedML

Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning"

Showing 50 of 215 citing papers.

Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Muhammad Haseeb Aslam, Clara Martinez, M. Pedersoli, Alessandro Lameiras Koerich, Ali Etemad, Eric Granger · 19 Apr 2025

Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao · 11 Apr 2025 · OOD, MLT

Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie-jin Yang · 02 Apr 2025 · LRM

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better
MingWei Zhou, Xiaobing Pei · 30 Mar 2025 · AAML

MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation
Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yuanhang Wang, Lizhe Qi · 09 Mar 2025 · AAML

TimeDistill: Efficient Long-Term Time Series Forecasting with MLP via Cross-Architecture Distillation
Juntong Ni, Z. Liu, Shiyu Wang, Ming Jin, Wei-dong Jin · 24 Feb 2025 · AI4TS

Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
Yongqi Ding, Lin Zuo, Mengmeng Jing, Pei He, Hanpu Deng · 20 Feb 2025

CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey · 17 Feb 2025

sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
Jingyuan Chen, Yuan Yao, Mie Anderson, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Jiebo Luo · 28 Jan 2025

Multi-Branch Mutual-Distillation Transformer for EEG-Based Seizure Subtype Classification
Ruimin Peng, Zhenbang Du, Changming Zhao, Jingwei Luo, Wenzhong Liu, Xinxing Chen, Dongrui Wu · 04 Dec 2024 · MedIm

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Zuozhu Liu, Shurun Tan, Er-ping Li, Aili Wang · 03 Nov 2024

TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling
Yury Gorishniy, Akim Kotelnikov, Artem Babenko · 31 Oct 2024 · LMTD, MoE

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
Baekrok Shin, Junsoo Oh, Hanseul Cho, Chulhee Yun · 30 Oct 2024 · AI4CE

Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov · 29 Oct 2024 · AI4CE

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar · 24 Oct 2024

Simplicity Bias via Global Convergence of Sharpness Minimization
Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka · 21 Oct 2024

Composing Novel Classes: A Concept-Driven Approach to Generalized Category Discovery
Chuyu Zhang, Peiyan Gu, Xueyang Yu, Xuming He · 17 Oct 2024

Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
Jingyang Li, Jiachun Pan, Vincent Y. F. Tan, Kim-Chuan Toh, Pan Zhou · 15 Oct 2024 · AAML, MLT

Mentor-KD: Making Small Language Models Better Multi-step Reasoners
Hojae Lee, Junho Kim, SangKeun Lee · 11 Oct 2024 · LRM

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
Binghui Li, Yuanzhi Li · 11 Oct 2024 · OOD

Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
Bikang Pan, Wei Huang, Ye-ling Shi · 29 Sep 2024 · FedML, VLM

Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid, T. Morocutti, Francesco Foscarin, Jan Schluter, Paul Primus, Gerhard Widmer · 14 Sep 2024 · ViT

Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross · 21 Aug 2024

Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition
Muhammad Haseeb Aslam, M. Pedersoli, Alessandro Lameiras Koerich, Eric Granger · 16 Aug 2024

Learn To Learn More Precisely
Runxi Cheng, Yongxian Wei, Xianglong He, Wanyun Zhu, Songsong Huang, Fei Richard Yu, Fei Ma, Chun Yuan · 08 Aug 2024

Tackling Noisy Clients in Federated Learning with End-to-end Label Correction
Xuefeng Jiang, Sheng Sun, Jia Li, Jingjing Xue, Runhan Li, Zhiyuan Wu, Gang Xu, Yuwei Wang, Min Liu · 08 Aug 2024 · FedML

How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan, Linfeng Ye, Ahmed H. Salamah · 25 Jul 2024

Understanding the Gains from Repeated Self-Distillation
Divyansh Pareek, Simon S. Du, Sewoong Oh · 05 Jul 2024

Self-Cooperation Knowledge Distillation for Novel Class Discovery
Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi · 02 Jul 2024

Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen · 24 Jun 2024

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe · 11 Jun 2024

Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen · 05 Jun 2024 · OOD, MLT, OODD

What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Hongkang Li, Meng Wang, Tengfei Ma, Sijia Liu, Zaixi Zhang, Pin-Yu Chen · 04 Jun 2024 · MLT, AI4CE

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann · 29 May 2024

Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Runqian Wang, Soumya Ghosh, David D. Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky · 27 May 2024

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Mohammed Nowaz Rabbani Chowdhury, Meng Wang, K. E. Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers · 26 May 2024 · MoE

xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao · 22 May 2024

E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification
Xin Zhang, Daochen Zha, Qiaoyu Tan · 06 May 2024

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu · 22 Apr 2024 · VLM

Breaking the Memory Wall for Heterogeneous Federated Learning with Progressive Training
Yebo Wu, Li Li, Chunlin Tian, Chengzhong Xu · 20 Apr 2024 · FedML

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
Yi-Chong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu · 19 Apr 2024 · FedML

Optimized Dynamic Mode Decomposition for Reconstruction and Forecasting of Atmospheric Chemistry Data
Meghana Velegar, Christoph Keller, J. Nathan Kutz · 13 Apr 2024

Post-Hoc Reversal: Are We Selecting Models Prematurely?
Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Chase Lipton · 11 Apr 2024

Scaling Motion Forecasting Models with Ensemble Distillation
Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou · 05 Apr 2024

A Comprehensive Review of Knowledge Distillation in Computer Vision
Sheikh Musa Kaleem, Tufail Rouf, Gousia Habib, Tausifa Jan Saleem, Brejesh Lall · 01 Apr 2024 · VLM

Diverse Feature Learning by Self-distillation and Reset
Sejik Park · 29 Mar 2024 · CLL

DeNetDM: Debiasing by Network Depth Modulation
Silpa Vadakkeeveetil Sreelatha, Adarsh Kappiyath, Anjan Dutta · 28 Mar 2024

On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao, Yong Lin, Difan Zou, Tong Zhang · 26 Mar 2024 · OODD, OOD

Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui, Vy Vo, Tung Pham, Dinh Q. Phung, Trung Le · 19 Mar 2024 · FedML, UQCV

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, S. K. Maharana, Yunhui Guo · 15 Mar 2024 · AAML