Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

International Conference on Learning Representations (ICLR), 2020

17 December 2020

Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning"

50 / 241 papers shown

Towards Understanding Generalization in DP-GD: A Case Study in Training Two-Layer CNNs

117

27 Nov 2025

Understanding Private Learning From Feature Perspective

196

22 Nov 2025

Balancing Multi-modal Sensor Learning via Multi-objective Optimization

Heshan Devaka Fernando

282

10 Nov 2025

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

329

01 Nov 2025

Parameter Averaging in Link Prediction

323

29 Oct 2025

Transforming volcanic monitoring: A dataset and benchmark for onboard volcano activity detection

144

27 Oct 2025

Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity

170

26 Oct 2025

Learning Task-Agnostic Representations through Multi-Teacher Distillation

Philippe Formont

Maxime Darrin

Banafsheh Karimian

Jackie Chi Kit Cheung

222

21 Oct 2025

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

257

20 Oct 2025

BPL: Bias-adaptive Preference Distillation Learning for Recommender System

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2025

151

17 Oct 2025

Revisiting Knowledge Distillation: The Hidden Role of Dataset Size

171

17 Oct 2025

A Functional Perspective on Knowledge Distillation in Neural Networks

Israel Mason-Williams

Gabryel Mason-Williams

Helen Yannakoudakis

205

14 Oct 2025

Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning

Junsoo Oh

Wei Huang

Taiji Suzuki

261

14 Oct 2025

SpikeMatch: Semi-Supervised Learning with Temporal Dynamics of Spiking Neural Networks

149

26 Sep 2025

Enriching Knowledge Distillation with Intra-Class Contrastive Learning

185

26 Sep 2025

Uncertainty-Aware Retinal Vessel Segmentation via Ensemble Distillation

242

15 Sep 2025

Uncovering Scaling Laws for Large Language Models via Inverse Problems

...

Bryan Kian Hsiang Low

LRM

208

09 Sep 2025

Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?

134

13 Aug 2025

Perch 2.0: The Bittern Lesson for Bioacoustics

304

06 Aug 2025

SDD: Self-Degraded Defense against Malicious Fine-tuning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

229

27 Jul 2025

Enhancing RAG Efficiency with Adaptive Context Compression

Shuyu Guo

Shuo Zhang

Zhaochun Ren

329

24 Jul 2025

Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime

Yuqing Wang

Shangding Gu

301

30 Jun 2025

Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap

407

29 Jun 2025

Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods

342

02 Jun 2025

Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models

Dang Nguyen

Jiping Li

Jinghao Zheng

Baharan Mirzasoleiman

DiffM

317

27 May 2025

On the Role of Label Noise in the Feature Learning Process

571

25 May 2025

Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation

Muhammad Haseeb Aslam

Clara Martinez

M. Pedersoli

Alessandro Lameiras Koerich

Ali Etemad

Mohammadhadi Shateri

370

19 Apr 2025

Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

488

11 Apr 2025

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

Philip Lippmann

Jie Yang

LRM

592

02 Apr 2025

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better

MingWei Zhou

Xiaobing Pei

AAML

956

30 Mar 2025

MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation

416

09 Mar 2025

Rethinking Spiking Neural Networks from an Ensemble Learning Perspective

International Conference on Learning Representations (ICLR), 2025

299

20 Feb 2025

TimeDistill: Efficient Long-Term Time Series Forecasting with MLP via Cross-Architecture Distillation

353

20 Feb 2025

CR-CTC: Consistency regularization on CTC for improved speech recognition

International Conference on Learning Representations (ICLR), 2024

476

17 Feb 2025

sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging

International Conference on Digital Health (ICDH), 2023

425

28 Jan 2025

Multi-Branch Mutual-Distillation Transformer for EEG-Based Seizure Subtype Classification

IEEE Transactions on Neural Systems and Rehabilitation Engineering (IEEE TNSRE), 2024

331

04 Dec 2024

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment

IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024

409

03 Nov 2024

TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling

International Conference on Learning Representations (ICLR), 2024

1.0K

31 Oct 2024

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity

Neural Information Processing Systems (NeurIPS), 2024

364

30 Oct 2024

Where Do Large Learning Rates Lead Us?

Neural Information Processing Systems (NeurIPS), 2024

378

29 Oct 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

A. S. Rawat

Veeranjaneyulu Sadhanala

...

Sanjiv Kumar

552

24 Oct 2024

Simplicity Bias via Global Convergence of Sharpness Minimization

International Conference on Machine Learning (ICML), 2024

349

21 Oct 2024

Composing Novel Classes: A Concept-Driven Approach to Generalized Category Discovery

725

17 Oct 2024

Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning

International Conference on Learning Representations (ICLR), 2024

572

15 Oct 2024

Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Hojae Lee

Junho Kim

SangKeun Lee

LRM

371

11 Oct 2024

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

International Conference on Learning Representations (ICLR), 2024

Binghui Li

Yuanzhi Li

OOD

440

11 Oct 2024

Features are fate: a theory of transfer learning in high-dimensional regression

Javan Tahir

Surya Ganguli

Grant M. Rotskoff

536

10 Oct 2024

Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method

Neural Information Processing Systems (NeurIPS), 2024

368

29 Sep 2024

Effective Pre-Training of Audio Transformers for Sound Event Detection

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

351

14 Sep 2024

Practical token pruning for foundation models in few-shot conversational virtual assistant systems

293

21 Aug 2024