ResearchTrend.AI

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

International Conference on Learning Representations (ICLR), 2020
17 December 2020
Zeyuan Allen-Zhu
Yuanzhi Li
    FedML

Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning"

50 / 241 papers shown
Towards Understanding Generalization in DP-GD: A Case Study in Training Two-Layer CNNs
Zhongjie Shi
Puyu Wang
Chenyang Zhang
Yuan Cao
117
2
0
27 Nov 2025
Understanding Private Learning From Feature Perspective
Meng Ding
Mingxi Lei
Shaopeng Fu
Shaowei Wang
Di Wang
Jinhui Xu
MLT
196
2
0
22 Nov 2025
Balancing Multi-modal Sensor Learning via Multi-objective Optimization
Heshan Devaka Fernando
Parikshit Ram
Yi Zhou
Soham Dan
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
282
1
0
10 Nov 2025
FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts
Weihao Bo
Yanpeng Sun
Y. Wang
X. Zhang
Zechao Li
FedML VLM
329
0
0
01 Nov 2025
Parameter Averaging in Link Prediction
Rupesh Sapkota
Caglar Demir
Arnab Sharma
A. Ngomo
MoMe FedML
323
0
0
29 Oct 2025
Transforming volcanic monitoring: A dataset and benchmark for onboard volcano activity detection
Darshana Priyasad
Tharindu Fernando
Maryam Haghighat
Harshala Gammulle
Clinton Fookes
144
0
0
27 Oct 2025
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
S. Yu
Dongjun Nam
Dina Katabi
Jeany Son
170
0
0
26 Oct 2025
Learning Task-Agnostic Representations through Multi-Teacher Distillation
Philippe Formont
Maxime Darrin
Banafsheh Karimian
Jackie Chi Kit Cheung
Eric Granger
Ismail Ben Ayed
Mohammadhadi Shateri
Pablo Piantanida
222
2
0
21 Oct 2025
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Wei Huang
Andi Han
Yujin Song
Yilan Chen
Denny Wu
Difan Zou
Taiji Suzuki
NoLa MLT
257
3
0
20 Oct 2025
BPL: Bias-adaptive Preference Distillation Learning for Recommender System
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2025
SeongKu Kang
Jianxun Lian
Dongha Lee
Wonbin Kweon
Sanghwan Jang
Jaehyun Lee
Jindong Wang
Xing Xie
Hwanjo Yu
151
0
0
17 Oct 2025
Revisiting Knowledge Distillation: The Hidden Role of Dataset Size
Giulia Lanzillotta
Felix Sarnthein
Gil Kur
Thomas Hofmann
Bobby He
171
0
0
17 Oct 2025
A Functional Perspective on Knowledge Distillation in Neural Networks
Israel Mason-Williams
Gabryel Mason-Williams
Helen Yannakoudakis
205
0
0
14 Oct 2025
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh
Wei Huang
Taiji Suzuki
261
1
0
14 Oct 2025
SpikeMatch: Semi-Supervised Learning with Temporal Dynamics of Spiking Neural Networks
Jini Yang
Beomseok Oh
Seungryong Kim
S. Kim
149
0
0
26 Sep 2025
Enriching Knowledge Distillation with Intra-Class Contrastive Learning
Hua Yuan
Ning Xu
Xin Geng
Yong Rui
185
0
0
26 Sep 2025
Uncertainty-Aware Retinal Vessel Segmentation via Ensemble Distillation
Jeremiah Fadugba
P. Manescu
Bolanle Oladejo
D. Fernández-Reyes
Philipp Berens
UQCV OOD FedML
242
0
0
15 Sep 2025
Uncovering Scaling Laws for Large Language Models via Inverse Problems
Arun Verma
Zhaoxuan Wu
Zijian Zhou
Xiaoqiang Lin
Zhiliang Chen
...
Zitong Zhao
Xinyi Xu
Apivich Hemachandra
See-Kiong Ng
Bryan Kian Hsiang Low
LRM
208
0
0
09 Sep 2025
Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?
Viacheslav Barkov
Jonas Schmidinger
Robin Gebbers
Martin Atzmueller
134
2
0
13 Aug 2025
Perch 2.0: The Bittern Lesson for Bioacoustics
B. V. Merrienboer
Vincent Dumoulin
Jenny Hamer
Lauren Harrell
Andrea Burns
Tom Denton
MDE
304
13
0
06 Aug 2025
SDD: Self-Degraded Defense against Malicious Fine-tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
ZiXuan Chen
Weikai Lu
Xin Lin
Ziqian Zeng
AAML
229
8
0
27 Jul 2025
Enhancing RAG Efficiency with Adaptive Context Compression
Shuyu Guo
Shuo Zhang
Zhaochun Ren
329
0
0
24 Jul 2025
Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
Yuqing Wang
Shangding Gu
301
0
0
30 Jun 2025
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
Yifan Sun
Yushan Liang
Zhen Zhang
Jiaye Teng
407
0
0
29 Jun 2025
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao
Xingyuan Pan
Hanning Zhang
Chenlu Ye
Boyao Wang
Tong Zhang
342
2
0
02 Jun 2025
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
Dang Nguyen
Jiping Li
Jinghao Zheng
Baharan Mirzasoleiman
DiffM
317
1
0
27 May 2025
On the Role of Label Noise in the Feature Learning Process
Andi Han
Wei Huang
Zhanpeng Zhou
Gang Niu
Wuyang Chen
Junchi Yan
Akiko Takeda
Taiji Suzuki
NoLa MLT
571
5
0
25 May 2025
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Muhammad Haseeb Aslam
Clara Martinez
M. Pedersoli
Alessandro Lameiras Koerich
Ali Etemad
Mohammadhadi Shateri
370
0
0
19 Apr 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang
Peifeng Gao
Difan Zou
Yuan Cao
OOD MLT
488
0
0
11 Apr 2025
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann
Jie Yang
LRM
592
2
0
02 Apr 2025
Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better
MingWei Zhou
Xiaobing Pei
AAML
956
0
0
30 Mar 2025
MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation
Yuzheng Wang
Zhaoyu Chen
Jinjie Wei
Yuanhang Wang
Lizhe Qi
AAML
416
0
0
09 Mar 2025
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
International Conference on Learning Representations (ICLR), 2025
Lin Zuo
Yongqi Ding
Mengmeng Jing
Pei He
Hanpu Deng
299
15
0
20 Feb 2025
TimeDistill: Efficient Long-Term Time Series Forecasting with MLP via Cross-Architecture Distillation
Juntong Ni
Ziqiang Liu
Shiyu Wang
Ming Jin
Wei Jin
AI4TS
353
16
0
20 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
International Conference on Learning Representations (ICLR), 2024
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
476
18
0
17 Feb 2025
sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
International Conference on Digital Health (ICDH), 2023
Jingyuan Chen
Xingtai Lv
Mie Anderson
Natalie Hauglund
Celia Kjaerby
Verena Untiet
Maiken Nedergaard
Jiebo Luo
425
3
0
28 Jan 2025
Multi-Branch Mutual-Distillation Transformer for EEG-Based Seizure Subtype Classification
IEEE Transactions on Neural Systems and Rehabilitation Engineering (IEEE TNSRE), 2024
Ruimin Peng
Zhenbang Du
Changming Zhao
Jingwei Luo
Wenzhong Liu
Xinxing Chen
Dongrui Wu
MedIm
331
20
0
04 Dec 2024
Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Chengting Yu
Fengzhao Zhang
Ruizhe Chen
Zuozhu Liu
Shurun Tan
Er-ping Li
Aili Wang
409
7
0
03 Nov 2024
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling
International Conference on Learning Representations (ICLR), 2024
Yury Gorishniy
Akim Kotelnikov
Artem Babenko
LMTD MoE
1.0K
71
0
31 Oct 2024
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
Neural Information Processing Systems (NeurIPS), 2024
Baekrok Shin
Junsoo Oh
Hanseul Cho
Chulhee Yun
AI4CE
364
3
0
30 Oct 2024
Where Do Large Learning Rates Lead Us?
Neural Information Processing Systems (NeurIPS), 2024
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
AI4CE
378
6
0
29 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
552
14
0
24 Oct 2024
Simplicity Bias via Global Convergence of Sharpness Minimization
International Conference on Machine Learning (ICML), 2024
Khashayar Gatmiry
Zhiyuan Li
Sashank J. Reddi
Stefanie Jegelka
349
3
0
21 Oct 2024
Composing Novel Classes: A Concept-Driven Approach to Generalized Category Discovery
Chuyu Zhang
Peiyan Gu
Xueyang Yu
Xuming He
725
1
0
17 Oct 2024
Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
International Conference on Learning Representations (ICLR), 2024
Jingyang Li
Jiachun Pan
Vincent Y. F. Tan
Kim-Chuan Toh
Pan Zhou
AAML MLT
572
4
0
15 Oct 2024
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Hojae Lee
Junho Kim
SangKeun Lee
LRM
371
17
0
11 Oct 2024
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
International Conference on Learning Representations (ICLR), 2024
Binghui Li
Yuanzhi Li
OOD
440
11
0
11 Oct 2024
Features are fate: a theory of transfer learning in high-dimensional regression
Javan Tahir
Surya Ganguli
Grant M. Rotskoff
536
7
0
10 Oct 2024
Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
Neural Information Processing Systems (NeurIPS), 2024
Bikang Pan
Wei Huang
Ye-ling Shi
FedML VLM
368
19
0
29 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schlüter
Paul Primus
Gerhard Widmer
ViT
351
12
0
14 Sep 2024
Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi
Cheng Qian
Jian Ni
Pratyush Singh
Reza Fazeli
Gengyu Wang
Zhongzheng Shu
Eric Wayne
Juergen Bross
293
0
0
21 Aug 2024