Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
arXiv:2012.09816 · 17 December 2020 · FedML

Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning" (50 of 215 shown)
FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation
Sheng Wan, Dashan Gao, Hanlin Gu, Daning Hu
FedML · 09 May 2023

Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation
Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jia-Ling Liu, Michael Bendersky, Marc Najork, Chao Zhang
08 May 2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, Hang Zhao
02 May 2023

Certifying Ensembles: A General Certification Theory with S-Lipschitzness
Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip H. S. Torr, Adel Bibi
UQCV · 25 Apr 2023

Expand-and-Cluster: Parameter Recovery of Neural Networks
Flavio Martinelli, Berfin Simsek, W. Gerstner, Johanni Brea
25 Apr 2023

Bayesian Optimization Meets Self-Distillation
HyunJae Lee, Heon Song, Hyeonsoo Lee, Gi-hyeon Lee, Suyeong Park, Donggeun Yoo
UQCV, BDL · 25 Apr 2023

Self-Distillation for Gaussian Process Regression and Classification
Kenneth Borup, L. Andersen
05 Apr 2023

Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation
Simone Angarano, Mauro Martini, Alessandro Navone, Marcello Chiaberge
03 Apr 2023

Per-Example Gradient Regularization Improves Learning Signals from Noisy Data
Xuran Meng, Yuan Cao, Difan Zou
31 Mar 2023

Towards Understanding the Effect of Pretraining Label Granularity
Guanzhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo
29 Mar 2023

Knowledge Distillation for Efficient Sequences of Training Runs
Xingyu Liu, A. Leonardi, Lu Yu, Chris Gilmer-Hill, Matthew L. Leavitt, Jonathan Frankle
11 Mar 2023

Benign Overfitting for Two-layer ReLU Convolutional Neural Networks
Yiwen Kou, Zi-Yuan Chen, Yuanzhou Chen, Quanquan Gu
MLT · 07 Mar 2023

Combating Exacerbated Heterogeneity for Robust Models in Federated Learning
Jianing Zhu, Jiangchao Yao, Tongliang Liu, Quanming Yao, Jianliang Xu, Bo Han
FedML · 01 Mar 2023

Random Teachers are Good Teachers
Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
23 Feb 2023

Progressive Ensemble Distillation: Building Ensembles for Efficient Inference
D. Dennis, Abhishek Shetty, A. Sevekari, K. Koishida, Virginia Smith
FedML · 20 Feb 2023

Learning From Biased Soft Labels
Hua Yuan, Ning Xu, Yuge Shi, Xin Geng, Yong Rui
FedML · 16 Feb 2023

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen
ViT, MLT · 12 Feb 2023

What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li
07 Feb 2023

Knowledge Distillation on Graphs: A Survey
Yijun Tian, Shichao Pei, Xiangliang Zhang, Chuxu Zhang, Nitesh V. Chawla
01 Feb 2023

On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar
30 Jan 2023

Towards Inference Efficient Deep Ensemble Learning
Ziyue Li, Kan Ren, Yifan Yang, Xinyang Jiang, Yuqing Yang, Dongsheng Li
BDL · 29 Jan 2023

Supervision Complexity and its Role in Knowledge Distillation
Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Surinder Kumar
28 Jan 2023

The Power of Linear Combinations: Learning with Random Convolutions
Paul Gavrikov, J. Keuper
26 Jan 2023

Pruning Before Training May Improve Generalization, Provably
Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang
MLT · 01 Jan 2023

Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation
Xin Huang, Sung-Yu Chen, Chun-Shu Wei
06 Dec 2022

Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq R. Joty, Luo Si, Lidong Bing
16 Nov 2022

Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation
Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo-Wen Zhang, Baopu Li
15 Nov 2022

Robust Few-shot Learning Without Using any Adversarial Samples
Gaurav Kumar Nayak, Ruchit Rawal, Inder Khatri, Anirban Chakraborty
AAML · 03 Nov 2022

Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt
VLM · 01 Nov 2022

BEBERT: Efficient and Robust Binary Ensemble BERT
Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang
MQ · 28 Oct 2022

Characterizing Datapoints via Second-Split Forgetting
Pratyush Maini, Saurabh Garg, Zachary Chase Lipton, J. Zico Kolter
26 Oct 2022

Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup
Muthuraman Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge
MLT · 24 Oct 2022

Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices
Navidreza Asadi, M. Goudarzi
OODD, VLM · 15 Oct 2022

Vision Transformers provably learn spatial structure
Samy Jelassi, Michael E. Sander, Yuan-Fang Li
ViT, MLT · 13 Oct 2022

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum
10 Oct 2022

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective
Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar
10 Oct 2022

Dissecting adaptive methods in GANs
Samy Jelassi, David Dobre, A. Mensch, Yuanzhi Li, Gauthier Gidel
09 Oct 2022

Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks
Xiang Wang, Annie Wang, Mo Zhou, Rong Ge
MoMe · 03 Oct 2022

Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram
A. Elola, E. Aramendi, J. Oliveira, F. Renna, M. Coimbra, Matthew A. Reyna, Reza Sameni, Gari D. Clifford, Ali Bahrami Rad
27 Sep 2022

On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil, S. Gadanho, Danya Huang, Nijith Jacob, Zhuoshu Li, ..., Cristina Pop, Kevin Regan, G. Shamir, Rakesh Shivanna, Qiqi Yan
3DV · 12 Sep 2022

FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao, Ngai-man Cheung
BDL · 23 Aug 2022

Towards Understanding Mixture of Experts in Deep Learning
Zixiang Chen, Yihe Deng, Yue-bo Wu, Quanquan Gu, Yuan-Fang Li
MLT, MoE · 04 Aug 2022

Efficient One Pass Self-distillation with Zipf's Label Smoothing
Jiajun Liang, Linze Li, Z. Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan
26 Jul 2022

Towards understanding how momentum improves generalization in deep learning
Samy Jelassi, Yuanzhi Li
ODL, MLT, AI4CE · 13 Jul 2022

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning
Damien Teney, Maxime Peyrard, Ehsan Abbasnejad
06 Jul 2022

Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity
Jianyi Yang, Shaolei Ren
02 Jul 2022

Ensembling over Classifiers: a Bias-Variance Perspective
Neha Gupta, Jamie Smith, Ben Adlam, Zelda E. Mariet
FedML, UQCV, FaML · 21 Jun 2022

Revisiting Self-Distillation
M. Pham, Minsu Cho, Ameya Joshi, C. Hegde
17 Jun 2022

Toward Student-Oriented Teacher Network Training For Knowledge Distillation
Chengyu Dong, Liyuan Liu, Jingbo Shang
14 Jun 2022

Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
Jia-Yu Pan, Pan Zhou, Shuicheng Yan
SSL · 08 Jun 2022
08 Jun 2022