Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
arXiv:2012.09816, 17 December 2020 [FedML]
Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning" (50 of 215 shown)
- FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation. Sheng Wan, Dashan Gao, Hanlin Gu, Daning Hu. [FedML] 09 May 2023.
- Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation. Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jia-Ling Liu, Michael Bendersky, Marc Najork, Chao Zhang. 08 May 2023.
- On Uni-Modal Feature Learning in Supervised Multi-Modal Learning. Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, Hang Zhao. 02 May 2023.
- Certifying Ensembles: A General Certification Theory with S-Lipschitzness. Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip H. S. Torr, Adel Bibi. [UQCV] 25 Apr 2023.
- Expand-and-Cluster: Parameter Recovery of Neural Networks. Flavio Martinelli, Berfin Simsek, W. Gerstner, Johanni Brea. 25 Apr 2023.
- Bayesian Optimization Meets Self-Distillation. HyunJae Lee, Heon Song, Hyeonsoo Lee, Gi-hyeon Lee, Suyeong Park, Donggeun Yoo. [UQCV, BDL] 25 Apr 2023.
- Self-Distillation for Gaussian Process Regression and Classification. Kenneth Borup, L. Andersen. 05 Apr 2023.
- Domain Generalization for Crop Segmentation with Standardized Ensemble Knowledge Distillation. Simone Angarano, Mauro Martini, Alessandro Navone, Marcello Chiaberge. 03 Apr 2023.
- Per-Example Gradient Regularization Improves Learning Signals from Noisy Data. Xuran Meng, Yuan Cao, Difan Zou. 31 Mar 2023.
- Towards Understanding the Effect of Pretraining Label Granularity. Guanzhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo. 29 Mar 2023.
- Knowledge Distillation for Efficient Sequences of Training Runs. Xingyu Liu, A. Leonardi, Lu Yu, Chris Gilmer-Hill, Matthew L. Leavitt, Jonathan Frankle. 11 Mar 2023.
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks. Yiwen Kou, Zi-Yuan Chen, Yuanzhou Chen, Quanquan Gu. [MLT] 07 Mar 2023.
- Combating Exacerbated Heterogeneity for Robust Models in Federated Learning. Jianing Zhu, Jiangchao Yao, Tongliang Liu, Quanming Yao, Jianliang Xu, Bo Han. [FedML] 01 Mar 2023.
- Random Teachers are Good Teachers. Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann. 23 Feb 2023.
- Progressive Ensemble Distillation: Building Ensembles for Efficient Inference. D. Dennis, Abhishek Shetty, A. Sevekari, K. Koishida, Virginia Smith. [FedML] 20 Feb 2023.
- Learning From Biased Soft Labels. Hua Yuan, Ning Xu, Yuge Shi, Xin Geng, Yong Rui. [FedML] 16 Feb 2023.
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity. Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen. [ViT, MLT] 12 Feb 2023.
- What Matters In The Structured Pruning of Generative Language Models? Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li. 07 Feb 2023.
- Knowledge Distillation on Graphs: A Survey. Yijun Tian, Shichao Pei, Xiangliang Zhang, Chuxu Zhang, Nitesh V. Chawla. 01 Feb 2023.
- On student-teacher deviations in distillation: does it pay to disobey? Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Surinder Kumar. 30 Jan 2023.
- Towards Inference Efficient Deep Ensemble Learning. Ziyue Li, Kan Ren, Yifan Yang, Xinyang Jiang, Yuqing Yang, Dongsheng Li. [BDL] 29 Jan 2023.
- Supervision Complexity and its Role in Knowledge Distillation. Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Surinder Kumar. 28 Jan 2023.
- The Power of Linear Combinations: Learning with Random Convolutions. Paul Gavrikov, J. Keuper. 26 Jan 2023.
- Pruning Before Training May Improve Generalization, Provably. Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang. [MLT] 01 Jan 2023.
- Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation. Xin Huang, Sung-Yu Chen, Chun-Shu Wei. 06 Dec 2022.
- Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations. Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq R. Joty, Luo Si, Lidong Bing. 16 Nov 2022.
- Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation. Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo-Wen Zhang, Baopu Li. 15 Nov 2022.
- Robust Few-shot Learning Without Using any Adversarial Samples. Gaurav Kumar Nayak, Ruchit Rawal, Inder Khatri, Anirban Chakraborty. [AAML] 03 Nov 2022.
- Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation. Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt. [VLM] 01 Nov 2022.
- BEBERT: Efficient and Robust Binary Ensemble BERT. Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang. [MQ] 28 Oct 2022.
- Characterizing Datapoints via Second-Split Forgetting. Pratyush Maini, Saurabh Garg, Zachary Chase Lipton, J. Zico Kolter. 26 Oct 2022.
- Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup. Muthuraman Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge. [MLT] 24 Oct 2022.
- Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices. Navidreza Asadi, M. Goudarzi. [OODD, VLM] 15 Oct 2022.
- Vision Transformers provably learn spatial structure. Samy Jelassi, Michael E. Sander, Yuan-Fang Li. [ViT, MLT] 13 Oct 2022.
- Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling. Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum. 10 Oct 2022.
- The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective. Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar. 10 Oct 2022.
- Dissecting adaptive methods in GANs. Samy Jelassi, David Dobre, A. Mensch, Yuanzhi Li, Gauthier Gidel. 09 Oct 2022.
- Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks. Xiang Wang, Annie Wang, Mo Zhou, Rong Ge. [MoMe] 03 Oct 2022.
- Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram. A. Elola, E. Aramendi, J. Oliveira, F. Renna, M. Coimbra, Matthew A. Reyna, Reza Sameni, Gari D. Clifford, Ali Bahrami Rad. 27 Sep 2022.
- On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models. Rohan Anil, S. Gadanho, Danya Huang, Nijith Jacob, Zhuoshu Li, ..., Cristina Pop, Kevin Regan, G. Shamir, Rakesh Shivanna, Qiqi Yan. [3DV] 12 Sep 2022.
- FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification. Yunqing Zhao, Ngai-man Cheung. [BDL] 23 Aug 2022.
- Towards Understanding Mixture of Experts in Deep Learning. Zixiang Chen, Yihe Deng, Yue-bo Wu, Quanquan Gu, Yuan-Fang Li. [MLT, MoE] 04 Aug 2022.
- Efficient One Pass Self-distillation with Zipf's Label Smoothing. Jiajun Liang, Linze Li, Z. Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan. 26 Jul 2022.
- Towards understanding how momentum improves generalization in deep learning. Samy Jelassi, Yuanzhi Li. [ODL, MLT, AI4CE] 13 Jul 2022.
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning. Damien Teney, Maxime Peyrard, Ehsan Abbasnejad. 06 Jul 2022.
- Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity. Jianyi Yang, Shaolei Ren. 02 Jul 2022.
- Ensembling over Classifiers: a Bias-Variance Perspective. Neha Gupta, Jamie Smith, Ben Adlam, Zelda E. Mariet. [FedML, UQCV, FaML] 21 Jun 2022.
- Revisiting Self-Distillation. M. Pham, Minsu Cho, Ameya Joshi, C. Hegde. 17 Jun 2022.
- Toward Student-Oriented Teacher Network Training For Knowledge Distillation. Chengyu Dong, Liyuan Liu, Jingbo Shang. 14 Jun 2022.
- Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks. Jia-Yu Pan, Pan Zhou, Shuicheng Yan. [SSL] 08 Jun 2022.